Closed sabjoslo closed 6 years ago
@BabakHemmatian c6cab8 should fix this. Now `Parse_Rel_RC_Comments` and `Parser` take arguments `download_raw` and `clean_raw`, which, if `True`, will (respectively) download the raw data and clean it up once it's finished parsing it. Important: The biggest data files (~7.4 GB) have an ETA of about an hour and a half to download.
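To illustrate the idea, here is a minimal sketch of how two boolean flags like these might gate the download and cleanup steps. It is not the code from c6cab8: `parse_file`, the `fetch` callable, and the line-counting "parse" step are hypothetical stand-ins, and it assumes that "clean up" means deleting the raw file after parsing.

```python
import os
import tempfile


def parse_file(path, filename, download_raw=False, clean_raw=False, fetch=None):
    """Hypothetical stand-in for the parser (not the project's real function)."""
    full_path = os.path.join(path, filename)

    if download_raw:
        # fetch is a caller-supplied callable that writes the raw file to full_path
        fetch(full_path)

    # Placeholder for the real parsing work: just count the lines
    with open(full_path) as f:
        n_lines = sum(1 for _ in f)

    if clean_raw:
        # Assumption: cleaning up means removing the raw file once parsing is done
        os.remove(full_path)

    return n_lines


def fake_fetch(dest):
    """Stands in for the real (slow) download by writing three dummy lines."""
    with open(dest, "w") as f:
        f.write("a\nb\nc\n")


workdir = tempfile.mkdtemp()
count = parse_file(workdir, "RC_sample.txt",
                   download_raw=True, clean_raw=True, fetch=fake_fetch)
leftover = os.path.exists(os.path.join(workdir, "RC_sample.txt"))
```

With both flags left at `False`, a function shaped like this would expect the raw file to already be on disk and would leave it there afterwards.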
A couple other things:
- `filename` argument to `BZ2File` to `path+filename`. Right now, it seems like it assumes that you're working in the PWD, and I don't know if that was intentional on your part.
- `Parse_Rel_RC_Comments` explaining the new arguments. Feel free to change anything I got wrong there.

I did all this on a separate branch. If it works for you/there's nothing else you think should be changed, I'll merge it with the master branch.
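As a rough sketch of the `BZ2File` point above: passing a full path rather than a bare filename means the file is found regardless of the current working directory. The snippet uses `os.path.join` instead of plain `path+filename` concatenation, so it does not depend on `path` ending with a separator. The directory, filename, and sample contents are made up for the example.

```python
import bz2
import os
import tempfile

# Hypothetical data directory and filename, for illustration only
path = tempfile.mkdtemp()
filename = "RC_sample.bz2"
full_path = os.path.join(path, filename)

# Write a tiny compressed file so the example is self-contained
with bz2.BZ2File(full_path, "w") as fout:
    fout.write(b'{"body": "hello"}\n{"body": "world"}\n')

# Open by full path; this works no matter what the PWD is
with bz2.BZ2File(full_path, "r") as fin:
    lines = [line.decode("utf-8").rstrip("\n") for line in fin]
```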
Add support to: