CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

Is there a way to change the cache directory? #75

Closed by abisee 4 years ago

abisee commented 4 years ago

I've searched the docs but couldn't find an answer. Is there a way to set the download cache to something other than ~/.convokit? Thanks!

calebchiam commented 4 years ago

Hi @abisee, you can set the data_dir parameter in download() to configure the output location. (Check out this part of the docs for more details.)
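For illustration, here is a minimal sketch of what that might look like. The helper name fetch_corpus and the directory in the usage comment are made up for this example, and 'example-corpus' is only a placeholder name; the sketch assumes download() returns the local path of the corpus, which can then be passed to Corpus(filename=...).

```python
import os


def fetch_corpus(name, data_dir):
    """Download `name` into `data_dir` and load it as a Corpus.

    `fetch_corpus` is a hypothetical helper, not part of ConvoKit.
    """
    # Imported lazily so this module can be imported without ConvoKit installed.
    from convokit import Corpus, download

    # Expand "~" so that paths such as "~/datasets/convokit" work as expected.
    target = os.path.expanduser(data_dir)

    # Assumption: download() returns the local path of the downloaded corpus.
    corpus_path = download(name, data_dir=target)
    return Corpus(filename=corpus_path)


# Example usage (placeholder corpus name and directory):
# corpus = fetch_corpus("example-corpus", "~/datasets/convokit")
```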

abisee commented 4 years ago

Great, thanks!

Is there a way to set it once (e.g. setting an environment variable), so I don't need to specify data_dir every time I use the download() function?

calebchiam commented 4 years ago

Unfortunately not. Note, though, that once you've downloaded a corpus (say, example-corpus) to a specified data_dir, i.e. download('example-corpus', data_dir=[loc]), future calls to download('example-corpus') will automatically use the copy of example-corpus stored at loc.

However, if you wish to permanently change the download directory, that's not something we support. Typically, we'd imagine most users downloading corpora to a local directory and then loading corpora directly from said local directory (without using the download function). In other words, using download() is typically a one-time thing, for when you're first downloading the data to be worked with on your machine.

Would your workflow benefit significantly from allowing the default download directory to be permanently changed? We could include something like this in a future release if we see a common use case for it.
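In the meantime, a user-side wrapper is one possible workaround. The sketch below is not a ConvoKit feature: the environment variable CONVOKIT_DATA_DIR and the function download_with_env are hypothetical names, and ConvoKit itself does not read that variable. The wrapper simply forwards data_dir to download() when the variable is set and otherwise falls back to the library default.

```python
import os

# Hypothetical variable name; ConvoKit does not read it on its own.
ENV_VAR = "CONVOKIT_DATA_DIR"


def download_with_env(name, download_fn=None):
    """Call download(), taking data_dir from ENV_VAR when it is set.

    `download_fn` can be swapped out for testing; by default it is
    ConvoKit's own download function.
    """
    if download_fn is None:
        # Imported lazily so the wrapper can be defined without ConvoKit.
        from convokit import download as download_fn

    data_dir = os.environ.get(ENV_VAR)
    if data_dir:
        # Variable is set: send the download to the configured directory.
        return download_fn(name, data_dir=os.path.expanduser(data_dir))

    # Variable is unset or empty: keep ConvoKit's default location.
    return download_fn(name)
```

With something like export CONVOKIT_DATA_DIR=/path/to/cache in a shell profile, calling download_with_env('example-corpus') would then avoid repeating data_dir at every call site.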