Closed by jolui206 1 year ago
Existing input file was read, if this is not correct you should delete the pushshift_working folder and run this script again
This is the important line here. The script saves its progress between runs, so if it crashes you can start back up in the middle. That line is telling you that it loaded a saved progress file, which might be from an old run, in which case the files it's trying to pick up aren't there any more.
The actual error just means that no input files were found, so the script failed when it tried to calculate the progress.
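If it helps, here is a rough sketch of the two checks described above: clearing the saved progress and confirming that there are actually compressed files to read. This is not part of combine_folder_multiprocess.py; the helper names `reset_saved_progress` and `list_dump_files` are made up for illustration, and it assumes the `pushshift_working` folder named in the log message sits in the directory you run the script from.

```python
import shutil
from pathlib import Path


def reset_saved_progress(working_dir="pushshift_working"):
    """Delete the saved-progress folder if it exists.

    Assumption: the folder name comes from the log message and is
    relative to the current directory. Returns True if it was removed.
    """
    path = Path(working_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


def list_dump_files(input_dir=".", pattern="*.zst"):
    """Return the sorted names of compressed dump files directly inside input_dir."""
    return sorted(p.name for p in Path(input_dir).glob(pattern))
```

Calling `list_dump_files()` on the folder you pass to the script should return a non-empty list. If it does, call `reset_saved_progress()` and run the script again so it starts from scratch instead of resuming the old run.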
However, if you just want the wallstreetbets subreddit, you can save yourself the trouble and get it from here: https://www.reddit.com/r/pushshift/comments/11ef9if/separate_dump_files_for_the_top_20k_subreddits/
Hi, this is the error message I get when I try running the script combine_folder_multiprocess.py. Both the script and the data files (e.g. RC_2020_01.zst) are in the same folder. Is there something that I have not configured correctly? It doesn't seem to be loading the compressed files. Thanks