DaylightingSociety / SocMap

Social Mapping Framework for Twitter
https://socmap.daylightingsociety.org/
BSD 3-Clause "New" or "Revised" License

Add Tweet Compression #5

Closed milo-trujillo closed 6 years ago

milo-trujillo commented 6 years ago

Tweets are huge JSON blobs with lots of repetition that compress quite nicely with gzip. There's already a command-line argument for enabling gzip, so compression is included as a boolean in the options dictionary.

If compression is enabled, we should add ".gz" to the end of all tweet filenames and compress the JSON blobs before saving them to disk. Similarly, when reading tweets we should expect a ".gz" at the end of the filename and decompress the file before loading the JSON blobs.
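A rough sketch of what this could look like, using Python's standard gzip and json modules. The function names (tweet_path, save_tweets, load_tweets) and the way the compression flag is passed in are hypothetical, chosen for illustration; they are not taken from the SocMap code.

```python
import gzip
import json


def tweet_path(base_path, compress):
    """Append ".gz" to the filename when compression is enabled."""
    return base_path + ".gz" if compress else base_path


def save_tweets(tweets, base_path, compress):
    """Write tweets as JSON, gzip-compressed if compress is True.

    Returns the path that was actually written.
    """
    path = tweet_path(base_path, compress)
    # gzip.open in text mode mirrors the built-in open(), so the same
    # json.dump call works for both the compressed and plain cases.
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as f:
        json.dump(tweets, f)
    return path


def load_tweets(base_path, compress):
    """Read tweets back, expecting a ".gz" suffix if compress is True."""
    path = tweet_path(base_path, compress)
    opener = gzip.open if compress else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

With something like this, the rest of the code would only pass the boolean from the options dictionary through to both functions, and the ".gz" suffix handling stays in one place.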

milo-trujillo commented 6 years ago

Complete and pushed to master