The repository still contained some files with uppercase names, while the data generator writes the same files with lowercase names. On machines with case-insensitive filesystems, this left entries in the git index that were very difficult to remove. Hopefully this fixes it.
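One way to check whether a checkout still has this problem is to look for tracked paths that differ only by letter case. The sketch below is a hypothetical helper, not part of this PR; the name `find_case_collisions` and the example paths are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_case_collisions(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group paths that differ only by letter case (hypothetical helper).

    On a case-insensitive filesystem, every path in a group refers to the
    same file on disk, even though git tracks them as separate entries.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        # casefold() is a stricter form of lower() meant for caseless matching
        groups[path.casefold()].append(path)
    # Keep only keys that more than one distinct spelling maps to
    return {
        key: sorted(set(names))
        for key, names in groups.items()
        if len(set(names)) > 1
    }
```

To run it against the current index, one option is to pass it the output of `git ls-files`, for example `subprocess.run(["git", "ls-files"], capture_output=True, text=True).stdout.splitlines()`.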
Also added tqdm progress bars and made a small performance improvement to the Reddit data generation.
Putting this in its own PR because it appears to change a lot of files (most of which are duplicates).