JeremyGrosser / tablesnap

Uses inotify to monitor Cassandra SSTables and upload them to S3
BSD 2-Clause "Simplified" License

Improve speed of tablechop #54

Closed: russss closed this 7 years ago

russss commented 9 years ago

This parallelises tablechop's fetching of the .json files to reduce the amount of time it takes to run. It adds a dependency on eventlet.
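As a rough illustration of the idea (not the code in this PR), concurrent fetching of the listing files could look something like the sketch below. It swaps in the standard library's `concurrent.futures` thread pool rather than eventlet so that it is self-contained, and `fetch_listings`, `fetch` and `max_workers` are hypothetical names chosen for the example, not tablechop's actual API.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def fetch_listings(keys, fetch, max_workers=16):
    """Fetch and parse many JSON listing objects concurrently.

    `fetch` is a hypothetical callable that takes a key and returns the
    object's body as a string (for example, a thin wrapper around an S3 GET).
    """
    keys = list(keys)  # iterated twice below, so materialise it once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so bodies line up with keys.
        bodies = pool.map(fetch, keys)
        return {key: json.loads(body) for key, body in zip(keys, bodies)}


if __name__ == "__main__":
    # Stand-in for a bucket: key -> JSON body (made-up example data).
    fake_bucket = {
        "host:/data/a-listdir.json": '{"/data": ["a-Data.db"]}',
        "host:/data/b-listdir.json": '{"/data": ["b-Data.db"]}',
    }
    listings = fetch_listings(fake_bucket, fake_bucket.__getitem__)
    print(sorted(listings))
```

Because each fetch spends most of its time waiting on the network, overlapping the requests is where the time saving would come from; an eventlet green pool achieves the same overlap with cooperative green threads instead of OS threads.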

This also removes the keep_existing option in favour of a different approach, since it wasn't really clear to me what the intention of keep_existing was. Instead of keeping all files which exist on disk, plus all json files which reference them, this update will always keep the last backup.
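One way to picture the "always keep the last backup" rule is the sketch below. It is only an assumption about how such a rule might be expressed, not the logic in this PR: `select_keys_to_delete`, the shape of `backups` (listing key mapped to a timestamp and the set of files it references) and `cutoff` are all hypothetical.

```python
def select_keys_to_delete(backups, cutoff):
    """Return the keys that are safe to delete.

    `backups` maps a listing key to a (timestamp, set_of_file_keys) pair.
    A backup is kept if it is at least as new as `cutoff`, and the newest
    backup is always kept even when it is older than `cutoff`.
    """
    if not backups:
        return set()

    # The most recent backup survives regardless of its age.
    newest = max(backups, key=lambda key: backups[key][0])

    keep, delete = set(), set()
    for key, (timestamp, files) in backups.items():
        target = keep if key == newest or timestamp >= cutoff else delete
        target.add(key)
        target.update(files)

    # A file shared with a kept backup must not be deleted.
    return delete - keep


if __name__ == "__main__":
    backups = {
        "old.json": (100, {"a", "b"}),
        "mid.json": (200, {"b", "c"}),
        "new.json": (300, {"c", "d"}),
    }
    # Every backup is older than the cutoff, yet new.json and its files stay.
    print(sorted(select_keys_to_delete(backups, cutoff=1000)))
```

The final set difference matters because sstables are typically referenced by more than one listing, so an expired listing can still point at a file that the retained backup needs.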

russss commented 7 years ago

@JeremyGrosser I've rebased this on top of the current master - would you be able to take a look at it?

We've been running this code successfully for a few years now, and it definitely adds quite a significant speedup to tablechop when you've got tens of thousands of sstables. Now that I'm familiar with the tablesnap code, I'm tempted to make a few more PRs, but it would be cool to see this one merged first.