JeremyGrosser / tablesnap

Uses inotify to monitor Cassandra SSTables and upload them to S3
BSD 2-Clause "Simplified" License

Tableslurp performance expectation #94

Open pdehlke opened 6 years ago

pdehlke commented 6 years ago

I regularly tell people, after 25 years in data centers, that nobody cares about backups. The only thing people actually care about is restores.

To that end, I have been testing tablesnap with a small keyspace. The current test dataset is ~28 GB and has a fairly decent rate of churn; I have been running tablesnap for two weeks and have tablechop set to prune data older than 7 days. The S3 bucket currently holds about 145 GB.

I invoked tableslurp like so:

tableslurp -k <Key> -s <secret> --aws-region us-west-2 -r -n ip-10-14-193-47 my-cassandra-backups-us-west-2 /data/cassandra/data/test_keyspace ./cassandra/data/test_keyspace

After letting that run for 6 hours, tableslurp had created directories for 5 of my 179 tables and had not yet downloaded any files. I killed that process, restarted it with -t 50 to give me 50 threads, and went to bed.

Twelve hours later, tableslurp had created directories for 33 of the 179 tables and had still downloaded exactly zero files.
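For a sense of scale, here is a rough extrapolation of the two runs above. It is only a sketch: it assumes the directory-creation rate stays constant for the rest of the run, which nothing above confirms, and it says nothing about how long the actual file downloads would take. The function name is made up for illustration.

```python
# Rough linear extrapolation of the directory-creation rates reported above.
# Assumption: the rate observed so far holds for the remaining tables.
TOTAL_TABLES = 179


def hours_to_finish(tables_done, hours_elapsed, total=TOTAL_TABLES):
    """Estimate total hours needed to create directories for all tables."""
    rate = tables_done / hours_elapsed  # tables per hour
    return total / rate


first_run = hours_to_finish(5, 6)      # first run, no -t flag
threaded_run = hours_to_finish(33, 12)  # second run, -t 50

print(f"first run: ~{first_run:.0f} h, -t 50: ~{threaded_run:.0f} h")
# prints "first run: ~215 h, -t 50: ~65 h"
```

At those rates, directory creation alone would take roughly 9 days for the first run and a little under 3 days with 50 threads, before a single SSTable is restored.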

Is this expected behavior? If not, what have I got wrong?