JeremyGrosser / tablesnap

Uses inotify to monitor Cassandra SSTables and upload them to S3
BSD 2-Clause "Simplified" License

Only upload index files once per SSTable #53

Closed: russss closed 9 years ago

russss commented 9 years ago

Cassandra SSTables consist of six files on disk. When a new SSTable is created, tablesnap uploads a .json index file alongside each of those six files, so six index files per SSTable. This is fairly pointless, and it significantly increases the number of files that tablechop has to fetch and parse.

On one of our machines, which has been running tablesnap for a week, tablechop currently takes more than 12 hours to run, and most of that time is spent fetching index files (we use LeveledCompactionStrategy, which probably exacerbates this):

tablechop [2015-08-03 15:03:18,270] INFO 79397 keys total
tablechop [2015-08-03 15:03:18,270] DEBUG 39702 json listdir keys
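A rough back-of-the-envelope estimate from the two log lines above, assuming that every non-JSON key is one of the six components of a complete SSTable and that each component currently has its own JSON file. Both assumptions are simplifications; the numbers are only meant to show the order of magnitude.

```python
# Figures taken from the tablechop log above.
total_keys = 79397
json_keys = 39702

# Assumption: all remaining keys are SSTable components, six per SSTable.
component_keys = total_keys - json_keys
components_per_sstable = 6

# With one JSON file per SSTable instead of one per component:
json_after = component_keys // components_per_sstable
total_after = component_keys + json_after

print(f"JSON share of keys now: {json_keys / total_keys:.1%}")
print(f"JSON files after the change: ~{json_after}")
print(f"Keys tablechop would list after the change: ~{total_after}")
```

Under these assumptions roughly half of the keys are JSON files today, and the number of JSON files tablechop has to fetch would drop to about a sixth.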

This patch generates a new index file only for each Data.db file.
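The rule can be sketched as follows. This is a hypothetical illustration, not the patch itself: `should_write_listing`, `plan_uploads` and the `-listdir.json` suffix are names assumed for the example, and the SSTable file names are illustrative. Every component, including Cassandra's own Index.db, is still uploaded; only tablesnap's JSON file is limited to one per SSTable.

```python
def should_write_listing(filename):
    """Hypothetical rule: write a JSON index only for the Data.db component."""
    return filename.endswith("-Data.db")


def plan_uploads(new_files):
    """Return (component uploads, JSON index uploads) for newly written files.

    The '-listdir.json' suffix is an assumed naming scheme for illustration.
    """
    components = list(new_files)  # every SSTable component is still uploaded
    listings = [f + "-listdir.json" for f in new_files if should_write_listing(f)]
    return components, listings


# Illustrative six-component SSTable (file names assumed for the example).
sstable = [
    "ks-cf-ka-1-CompressionInfo.db",
    "ks-cf-ka-1-Data.db",
    "ks-cf-ka-1-Filter.db",
    "ks-cf-ka-1-Index.db",
    "ks-cf-ka-1-Statistics.db",
    "ks-cf-ka-1-Summary.db",
]

components, listings = plan_uploads(sstable)
print(len(components), len(listings))  # 6 1
```

Keying off Data.db is a convenient choice because every SSTable has exactly one Data.db component, so the rule yields exactly one JSON file per SSTable.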

JeremyGrosser commented 9 years ago

Could you make this an option? I think we originally decided to back up the indexes because they can take a long time to rebuild in some scenarios.

russss commented 9 years ago

I'm not referring to Cassandra's Index.db files. By "index" in this case I mean tablesnap's .json files, which are currently uploaded six times per SSTable.