JeremyGrosser / tablesnap

Uses inotify to monitor Cassandra SSTables and upload them to S3
BSD 2-Clause "Simplified" License

Tablesnap doesn't upload *-Summary.db files by default #42

russss closed this issue 7 years ago

russss commented 10 years ago

Cassandra 2.x adds another file per sstable, ending in "-Summary.db", to the data directory. I don't believe it's a fatal error if these aren't restored, since they can be rebuilt, but it adds overhead to startup because all the indexes have to be scanned.

This file isn't uploaded by tablesnap by default because it's written into the directory in place rather than being moved into it. This is easily fixable by setting the command line option to listen for IN_CLOSE_WRITE as well, so I wonder whether it would be worth listening for that by default.
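To illustrate the difference, here is a minimal sketch that simulates the event filtering without touching inotify itself. The two mask values are the Linux inotify constants; the `would_upload` helper, the event list, and the file names are hypothetical and are not tablesnap code.

```python
# Linux inotify event bits (values from <sys/inotify.h>).
IN_CLOSE_WRITE = 0x00000008  # a file opened for writing was closed
IN_MOVED_TO = 0x00000080     # a file was moved into the watched directory


def would_upload(events, mask):
    """Return the file names whose event bit is included in the watch mask."""
    return [name for event, name in events if event & mask]


# Hypothetical event stream for one sstable (assumption for illustration):
# Data and Index are moved into the directory, the Summary is written in place.
events = [
    (IN_MOVED_TO, "ks-cf-jb-1-Data.db"),
    (IN_MOVED_TO, "ks-cf-jb-1-Index.db"),
    (IN_CLOSE_WRITE, "ks-cf-jb-1-Summary.db"),
]

moved_only = would_upload(events, IN_MOVED_TO)
both = would_upload(events, IN_MOVED_TO | IN_CLOSE_WRITE)

print(moved_only)  # the Summary file is absent here
print(both)        # all three files are present here
```

A watcher that only asks for IN_MOVED_TO never sees the Summary file in this simulation, which matches the behaviour described above.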

jwojcik-zz commented 8 years ago

Which files aren't necessary to do a restore? It seems that the Summary files aren't necessary - they can be recreated via the index file.

Would we also need the Statistics files? Which of the below are mandatory?

The types of files are: Data, Index, Filter, Summary, CompressionInfo, Statistics.

devmage commented 8 years ago

The file may or may not be necessary; however, it is still listed in the listdir.json that tablesnap creates. That JSON, as I understand it, is intended as a manifest of all the files that existed in the table directory at the last update.

So, either the manifest is incorrect and should ignore Summary.db files, or the upload listener isn't functioning properly by default.
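One way to check for this mismatch is to compare the manifest against what was actually uploaded. The sketch below assumes listdir.json maps a directory path to a list of file names; that layout, the `missing_from_backup` helper, and the sample data are all assumptions for illustration, so verify them against a real manifest before relying on this.

```python
import json


def missing_from_backup(listdir_json, uploaded_names):
    """Return manifest entries that have no matching uploaded file.

    Assumes the manifest looks like {"/path/to/table": ["file1", "file2"]}.
    """
    manifest = json.loads(listdir_json)
    uploaded = set(uploaded_names)
    missing = []
    for directory, filenames in manifest.items():
        for name in filenames:
            if name not in uploaded:
                missing.append(directory + "/" + name)
    return sorted(missing)


# Hypothetical manifest and upload listing.
sample_manifest = json.dumps({
    "/var/lib/cassandra/data/ks/cf": [
        "ks-cf-jb-1-Data.db",
        "ks-cf-jb-1-Index.db",
        "ks-cf-jb-1-Summary.db",
    ]
})
sample_uploaded = ["ks-cf-jb-1-Data.db", "ks-cf-jb-1-Index.db"]

gaps = missing_from_backup(sample_manifest, sample_uploaded)
print(gaps)
```

Any path this returns is a file the manifest promises but the backup does not contain, which is the inconsistency described above.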