CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

convokit.download is broken when downloading non-Corpus objects #216

Open jpwchang opened 4 months ago

jpwchang commented 4 months ago

Theoretically, convokit.download supports downloading items other than corpora. For instance, the "official" way to obtain the trained motifs for the Parliament dataset (for use in, say, reproducing QuestionTypology results) is to run convokit.download('parliament-motifs'). If you actually try to do this, however, you will get an error, because convokit.download tries to load an index.json for the target object (which exists for corpora, but not for non-Corpus objects).

Steps to reproduce

Simply try the following:

import convokit
convokit.download("parliament-motifs")

The download completes successfully, but the function then errors out because it attempts to load index.json, which does not exist for this object.
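
For illustration, here is a minimal sketch of the kind of guard that would avoid this failure: read index.json only when the file is actually present. This is not ConvoKit's code. The helper name load_index_if_present and the directory layout are assumptions made up for this example, and the real fix may look quite different.

```python
import json
import os
import tempfile


def load_index_if_present(object_dir):
    """Hypothetical helper: return the parsed index.json, or None if absent."""
    index_path = os.path.join(object_dir, "index.json")
    if not os.path.isfile(index_path):
        # Assumption: non-Corpus objects (e.g. trained motifs) have no index.
        return None
    with open(index_path, "r", encoding="utf-8") as f:
        return json.load(f)


# Simulate a downloaded corpus (with index.json) and a non-Corpus object
# (without one) using temporary directories, so no network access is needed.
with tempfile.TemporaryDirectory() as corpus_dir, \
        tempfile.TemporaryDirectory() as motifs_dir:
    with open(os.path.join(corpus_dir, "index.json"), "w", encoding="utf-8") as f:
        json.dump({"version": 1}, f)

    corpus_index = load_index_if_present(corpus_dir)
    motifs_index = load_index_if_present(motifs_dir)
```

With a check like this, the caller can treat a None result as "not a corpus" and skip any index handling instead of raising an error.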

Additional information

I plan to address this issue in my upcoming pull request for the Forecaster rewrite, since this bug must be fixed in order to successfully support downloading pretrained Forecaster models.