JeremyGrosser / tablesnap

Uses inotify to monitor Cassandra SSTables and upload them to S3
BSD 2-Clause "Simplified" License

Make tableslurp able to restore recursively #86

Closed juiceblender closed 7 years ago

juiceblender commented 7 years ago

Hello!

This commit makes tableslurp able to restore recursively based on the origin provided. I've tried to keep it backwards compatible, so the original behaviour is preserved for anyone already relying on it. I'm not very experienced with Python or its idioms, so comments are welcome.

Basically, it introduces a --recursive flag. For every table found under the origin, it creates a DownloadHandler, lets that run to completion, then moves on to the next table. An example:

tableslurp -n 172.31.9.86 --recursive --aws-region ap-northeast-2 lerhtest /cassandra/data backuptest/
tableslurp [2017-10-17 02:04:20,356] INFO Building fileset
tableslurp [2017-10-17 02:04:20,481] INFO Found restoretest/ledger-c2e4a3a0b21911e789d4671ea19485f1 with 25 files
tableslurp [2017-10-17 02:04:20,516] INFO Found restoretest/perks-c375e720b21911e789d4671ea19485f1 with 25 files
tableslurp [2017-10-17 02:04:20,616] INFO Found system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca with 17 files
tableslurp [2017-10-17 02:04:20,654] INFO Found system/local-7ad54392bcdd35a684174e047860b377 with 17 files
tableslurp [2017-10-17 02:04:20,691] INFO Found system/paxos-b7b7f0c2fd0a34108c053ef614bb7c2d with 9 files
tableslurp [2017-10-17 02:04:20,740] INFO Found system/peers-37f71aca7dc2383ba70672528af04d4f with 35 files
tableslurp [2017-10-17 02:04:20,807] INFO Found system/size_estimates-618f817b005f3678b8a453f3930b8e86 with 17 files
tableslurp [2017-10-17 02:04:20,863] INFO Found system/sstable_activity-5a1ff267ace03f128563cfae6103c65e with 9 files
tableslurp [2017-10-17 02:04:20,907] INFO Found system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895 with 9 files
tableslurp [2017-10-17 02:04:20,953] INFO Found system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f with 25 files
tableslurp [2017-10-17 02:04:20,985] INFO Found system_schema/dropped_columns-5e7583b5f3f43af19a39b7e1d6f5f11f with 9 files
tableslurp [2017-10-17 02:04:21,029] INFO Found system_schema/functions-96489b7980be3e14a70166a0b9159450 with 9 files
tableslurp [2017-10-17 02:04:21,059] INFO Found system_schema/indexes-0feb57ac311f382fba6d9024d305702f with 9 files
tableslurp [2017-10-17 02:04:21,106] INFO Found system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6 with 25 files
tableslurp [2017-10-17 02:04:21,162] INFO Found system_schema/tables-afddfb9dbc1e30688056eed6c302ba09 with 25 files
tableslurp [2017-10-17 02:04:21,196] INFO Found system_schema/triggers-4df70b666b05325195a132b54005fd48 with 9 files
tableslurp [2017-10-17 02:04:21,238] INFO Found system_schema/types-5a8b1ca866023f77a0459273d308917a with 9 files
tableslurp [2017-10-17 02:04:21,280] INFO Found system_schema/views-9786ac1cdd583201a7cdad556410c985 with 9 files
tableslurp [2017-10-17 02:04:21,349] INFO Will now try to test writing to the target dir backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48
tableslurp [2017-10-17 02:04:21,349] INFO Will write to backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48
tableslurp [2017-10-17 02:04:21,349] INFO Running
tableslurp [2017-10-17 02:04:21,349] INFO Pushing file mc-5-big-Digest.crc32 onto queue
tableslurp [2017-10-17 02:04:21,349] INFO Pushing file mc-5-big-Data.db onto queue
tableslurp [2017-10-17 02:04:21,349] INFO Pushing file mc-5-big-TOC.txt onto queue
tableslurp [2017-10-17 02:04:21,349] INFO Pushing file mc-5-big-Summary.db onto queue
tableslurp [2017-10-17 02:04:21,349] INFO Pushing file mc-5-big-Filter.db onto queue
tableslurp [2017-10-17 02:04:21,349] INFO Pushing file mc-5-big-Statistics.db onto queue
tableslurp [2017-10-17 02:04:21,350] INFO Pushing file mc-5-big-CompressionInfo.db onto queue
tableslurp [2017-10-17 02:04:21,350] INFO Pushing file mc-5-big-Index.db onto queue
tableslurp [2017-10-17 02:04:21,350] INFO Pushing file backups onto queue
tableslurp [2017-10-17 02:04:21,350] INFO Thread #0 processing items
tableslurp [2017-10-17 02:04:21,351] INFO Thread #1 processing items
tableslurp [2017-10-17 02:04:21,351] INFO Thread #2 processing items
tableslurp [2017-10-17 02:04:21,353] INFO Thread #3 processing items
tableslurp [2017-10-17 02:04:21,393] INFO Downloading 172.31.9.86:/cassandra/data/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Digest.crc32 from lerhconsultest to backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Digest.crc32
tableslurp [2017-10-17 02:04:21,406] INFO Downloading 172.31.9.86:/cassandra/data/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-TOC.txt from lerhconsultest to backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-TOC.txt
tableslurp [2017-10-17 02:04:21,418] INFO Downloading 172.31.9.86:/cassandra/data/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Summary.db from lerhconsultest to backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Summary.db
tableslurp [2017-10-17 02:04:21,420] INFO Downloading 172.31.9.86:/cassandra/data/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Data.db from lerhconsultest to backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Data.db
tableslurp [2017-10-17 02:04:21,456] INFO Downloading 172.31.9.86:/cassandra/data/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Filter.db from lerhconsultest to backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Filter.db
tableslurp [2017-10-17 02:04:21,464] INFO Downloading 172.31.9.86:/cassandra/data/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Statistics.db from lerhconsultest to backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Statistics.db
tableslurp [2017-10-17 02:04:21,468] INFO Downloading 172.31.9.86:/cassandra/data/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-CompressionInfo.db from lerhconsultest to backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-CompressionInfo.db
tableslurp [2017-10-17 02:04:21,478] INFO Downloading 172.31.9.86:/cassandra/data/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Index.db from lerhconsultest to backuptest/system_schema/triggers-4df70b666b05325195a132b54005fd48/mc-5-big-Index.db
tableslurp [2017-10-17 02:04:21,485] INFO Thread #0 finished processing
tableslurp [2017-10-17 02:04:21,502] INFO Thread #3 finished processing
tableslurp [2017-10-17 02:04:21,503] INFO Thread #2 finished processing
tableslurp [2017-10-17 02:04:21,503] INFO Thread #1 finished processing
tableslurp [2017-10-17 02:04:21,503] INFO My job is done.
tableslurp [2017-10-17 02:04:21,552] INFO Will now try to test writing to the target dir backuptest/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6
tableslurp [2017-10-17 02:04:21,553] INFO Will write to backuptest/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6
tableslurp [2017-10-17 02:04:21,553] INFO Running
tableslurp [2017-10-17 02:04:21,553] INFO Pushing file mc-9-big-TOC.txt onto queue
tableslurp [2017-10-17 02:04:21,553] INFO Pushing file mc-10-big-TOC.txt onto queue
tableslurp [2017-10-17 02:04:21,553] INFO Pushing file mc-11-big-Data.db onto queue
tableslurp [2017-10-17 02:04:21,553] INFO Pushing file mc-10-big-Index.db onto queue
...and so on.

The end result is:

ubuntu@ip-172-31-9-86:~$ ls backuptest/
restoretest/   system/        system_schema/
ubuntu@ip-172-31-9-86:~$ ls backuptest/restoretest/
ledger-c2e4a3a0b21911e789d4671ea19485f1/ perks-c375e720b21911e789d4671ea19485f1/
ubuntu@ip-172-31-9-86:~$ ls backuptest/restoretest/ledger-c2e4a3a0b21911e789d4671ea19485f1/mc-
mc-1-big-CompressionInfo.db  mc-1-big-Index.db            mc-2-big-CompressionInfo.db  mc-2-big-Index.db            mc-3-big-CompressionInfo.db  mc-3-big-Index.db
mc-1-big-Data.db             mc-1-big-Statistics.db       mc-2-big-Data.db             mc-2-big-Statistics.db       mc-3-big-Data.db             mc-3-big-Statistics.db
mc-1-big-Digest.crc32        mc-1-big-Summary.db          mc-2-big-Digest.crc32        mc-2-big-Summary.db          mc-3-big-Digest.crc32        mc-3-big-Summary.db
mc-1-big-Filter.db           mc-1-big-TOC.txt             mc-2-big-Filter.db           mc-2-big-TOC.txt             mc-3-big-Filter.db           mc-3-big-TOC.txt
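
For reviewers, the recursive flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: `group_keys_by_table`, `restore_recursively` and the `download_table` callback are hypothetical names, and the key layout `<origin>/<keyspace>/<table>/<file>` is an assumption (nested entries such as `backups` are skipped here). The callback stands in for creating a DownloadHandler and letting it run to completion.

```python
import os
from collections import defaultdict


def group_keys_by_table(key_names, origin):
    """Group backup key names by '<keyspace>/<table>' relative to origin.

    Assumes keys look like '<origin>/<keyspace>/<table>/<file>'; anything
    that does not match that shape is skipped in this sketch.
    """
    prefix = origin.rstrip("/") + "/"
    tables = defaultdict(list)
    for name in key_names:
        if not name.startswith(prefix):
            continue
        parts = name[len(prefix):].split("/")
        if len(parts) != 3:
            continue
        keyspace, table, filename = parts
        tables[keyspace + "/" + table].append(filename)
    return dict(tables)


def restore_recursively(key_names, origin, target, download_table):
    """Restore the tables one at a time and return the target dirs used.

    download_table(table_origin, table_target, files) is a hypothetical
    stand-in for building a DownloadHandler and running it to completion
    before the next table is started.
    """
    restored = []
    grouped = group_keys_by_table(key_names, origin)
    for table, files in sorted(grouped.items()):
        table_origin = origin.rstrip("/") + "/" + table
        # Mirror the keyspace/table layout under the target directory.
        table_target = os.path.join(target, table)
        download_table(table_origin, table_target, sorted(files))
        restored.append(table_target)
    return restored
```

Each table gets its own subdirectory under the target, which matches the backuptest/<keyspace>/<table> layout shown in the listing above.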

The current way of doing it is still preserved:

ubuntu@ip-172-31-9-86:~$ tableslurp -n 172.31.9.86 --aws-region ap-northeast-2 lerhconsultest /cassandra/data/restoretest/perks-c375e720b21911e789d4671ea19485f1 backuptest/
tableslurp [2017-10-17 02:10:16,819] INFO Building fileset
tableslurp [2017-10-17 02:10:16,861] INFO Fileset contains 25 files to download
tableslurp [2017-10-17 02:10:16,914] INFO Will now try to test writing to the target dir backuptest/
tableslurp [2017-10-17 02:10:16,914] INFO Will write to backuptest/
tableslurp [2017-10-17 02:10:16,915] INFO Running
tableslurp [2017-10-17 02:10:16,915] INFO Pushing file mc-1-big-Summary.db onto queue
tableslurp [2017-10-17 02:10:16,915] INFO Pushing file mc-2-big-Data.db onto queue
tableslurp [2017-10-17 02:10:16,915] INFO Pushing file mc-3-big-Filter.db onto queue
tableslurp [2017-10-17 02:10:16,915] INFO Pushing file mc-2-big-Digest.crc32 onto queue
tableslurp [2017-10-17 02:10:16,915] INFO Pushing file mc-3-big-Summary.db onto queue
tableslurp [2017-10-17 02:10:16,915] INFO Pushing file mc-1-big-CompressionInfo.db onto queue
tableslurp [2017-10-17 02:10:16,915] INFO Pushing file mc-2-big-Index.db onto queue...
.....
tableslurp [2017-10-17 02:10:17,267] INFO Downloading 172.31.9.86:/cassandra/data/restoretest/perks-c375e720b21911e789d4671ea19485f1/mc-1-big-Index.db from lerhconsultest to backuptest/mc-1-big-Index.db
tableslurp [2017-10-17 02:10:17,278] INFO Thread #2 finished processing
tableslurp [2017-10-17 02:10:17,282] INFO Downloading 172.31.9.86:/cassandra/data/restoretest/perks-c375e720b21911e789d4671ea19485f1/mc-3-big-Data.db from lerhconsultest to backuptest/mc-3-big-Data.db
tableslurp [2017-10-17 02:10:17,288] INFO Downloading 172.31.9.86:/cassandra/data/restoretest/perks-c375e720b21911e789d4671ea19485f1/mc-3-big-TOC.txt from lerhconsultest to backuptest/mc-3-big-TOC.txt
tableslurp [2017-10-17 02:10:17,295] INFO Thread #3 finished processing
tableslurp [2017-10-17 02:10:17,305] INFO Thread #1 finished processing
tableslurp [2017-10-17 02:10:17,322] INFO Thread #0 finished processing
tableslurp [2017-10-17 02:10:17,322] INFO My job is done.

Giving:

backuptest/
mc-1-big-CompressionInfo.db  mc-1-big-Statistics.db       mc-2-big-Digest.crc32        mc-2-big-TOC.txt             mc-3-big-Index.db            
mc-1-big-Data.db             mc-1-big-Summary.db          mc-2-big-Filter.db           mc-3-big-CompressionInfo.db  mc-3-big-Statistics.db      
mc-1-big-Digest.crc32        mc-1-big-TOC.txt             mc-2-big-Index.db            mc-3-big-Data.db             mc-3-big-Summary.db
mc-1-big-Filter.db           mc-2-big-CompressionInfo.db  mc-2-big-Statistics.db       mc-3-big-Digest.crc32        mc-3-big-TOC.txt
mc-1-big-Index.db            mc-2-big-Data.db             mc-2-big-Summary.db          mc-3-big-Filter.db           

The current way of doing it is a little bit finicky, as mentioned in https://github.com/JeremyGrosser/tablesnap/issues/57. I've also added a check to make --recursive and --file mutually exclusive.
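
One way such a mutual-exclusion check could be expressed with argparse is sketched below. This is an assumption-laden illustration rather than the check in the PR: `build_parser` is a hypothetical helper, the file option is spelled `--file` here as an assumption, and the positional names simply mirror the bucket, origin and target arguments used in the commands above.

```python
import argparse


def build_parser():
    """Build a cut-down parser; only the options relevant here are shown."""
    parser = argparse.ArgumentParser(prog="tableslurp-sketch")

    # argparse rejects any command line that sets both options in the group.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--recursive", action="store_true",
                       help="restore every table found under the origin")
    group.add_argument("--file", default=None,
                       help="restore the file-set a given file belongs to")

    parser.add_argument("bucket")
    parser.add_argument("origin")
    parser.add_argument("target")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout, passing both options makes argparse print a usage error and exit with status 2, so no explicit check is needed in the main code path.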

Thanks!

juiceblender commented 7 years ago

I've retested and updated the PR based on your comments. Let me know if there are any more changes you think would be good :)

rcolidba commented 7 years ago

This feature has been needed for a long time, thanks for contributing it! I also appreciate the approach of leaving the default behavior the same and enabling recursive via a simple flag.

juiceblender commented 7 years ago

Hi,

I've made a few more changes so that it can also restore CLogs, as referenced in https://github.com/JeremyGrosser/tablesnap/issues/88.

Hopefully it's much cleaner now. Please also let me know if you would like me to split this into separate branches. I've tested these changes on my test EC2 instances, so it should be good.