BabakHemmatian / Gay_Marriage_Corpus_Study

LDA and RNN for Reddit comments

Download Reddit comments "as-needed" #6

Closed by sabjoslo 6 years ago

sabjoslo commented 6 years ago

Add support to:

sabjoslo commented 6 years ago

@BabakHemmatian c6cab8 should fix this. Parse_Rel_RC_Comments and Parser now take the arguments download_raw and clean_raw. If download_raw is True, the raw data is downloaded; if clean_raw is True, the raw data is cleaned up once parsing has finished. Important: the biggest data files (~7.4 GB) take about an hour and a half to download.
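
For anyone reading along, here is a rough sketch of how the two flags could interact. It is not the code from the commit: parse_with_flags, fake_fetch and count_lines are hypothetical names made up for illustration, the "download" just writes a small local file, and treating "clean up" as deleting the raw file is an assumption. Only download_raw and clean_raw come from the comment above.

```python
import os


def parse_with_flags(raw_path, fetch, parse, download_raw=True, clean_raw=True):
    """Hypothetical helper illustrating the two flags (not the repo's Parser).

    fetch(raw_path) is expected to create the raw file at raw_path;
    parse(raw_path) reads it and returns some result.
    """
    if not os.path.exists(raw_path):
        if not download_raw:
            # Without permission to download, a missing file is an error.
            raise FileNotFoundError(raw_path)
        fetch(raw_path)  # download only when the file is not already on disk

    result = parse(raw_path)

    if clean_raw:
        # Assumption: "cleaning" means removing the raw file after parsing.
        os.remove(raw_path)
    return result


def fake_fetch(raw_path):
    """Stand-in for a real download: writes two lines to raw_path."""
    with open(raw_path, "w") as f:
        f.write("comment one\ncomment two\n")


def count_lines(raw_path):
    """Stand-in for real parsing: counts the lines in the raw file."""
    with open(raw_path) as f:
        return sum(1 for _ in f)
```

Called as parse_with_flags(path, fake_fetch, count_lines), this fetches the file if it is missing, parses it, and removes it afterwards. Passing clean_raw=False keeps the raw file around, which avoids repeating a long download on the next run.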

A couple other things:

I did all this on a separate branch. If it works for you/there's nothing else you think should be changed, I'll merge it with the master branch.