JeremyGrosser / tablesnap

Uses inotify to monitor Cassandra SSTables and upload them to S3
BSD 2-Clause "Simplified" License

tableslurp fails when keyspaces prefix one another #10

Closed tnine closed 12 years ago

tnine commented 12 years ago

I'm getting an error when downloading my backups with tableslurp. I used tablesnap to back up two directories:

/mnt/cassandra/data/Usergrid

and

/mnt/cassandra/data/Usergrid_Applications

I then attempt to restore the 'Usergrid' keyspace with this command:

python tableslurp -k -s -n usergrid-dev-2012-09-25 usergrid-dev-sstables /mnt/cassandra/data/Usergrid ~/Downloads/data/restore/Usergrid

I receive this stack trace:

Traceback (most recent call last):
  File "tableslurp", line 286, in <module>
    sys.exit(main())
  File "tableslurp", line 282, in main
    dh = DownloadHandler(args)
  File "tableslurp", line 87, in __init__
    (owner, group) = self._build_file_set(args.file)
  File "tableslurp", line 132, in _build_file_set
    self.fileset = json_data[self.origin]
KeyError: '/mnt/cassandra/data/Usergrid'

Upon further inspection of the json_data object, I see that the fileset returned is not the one specified ('Usergrid') but rather the longer 'Usergrid_Applications'.

Below is a snippet from the logging output of the json_data object:

tableslurp [2012-09-26 13:28:33,065] INFO json data {u'/mnt/cassandra/data/Usergrid_Applications': [u'Entity_Id_Sets-hd-8-Digest.sha1', ...]}

As you can see, the key is '/mnt/cassandra/data/Usergrid_Applications', not '/mnt/cassandra/data/Usergrid' as expected.

tnine commented 12 years ago

Note that it is returned correctly in the keys object on line 119. It appears to be the sort/pop that's causing the issue.
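A minimal sketch of how a sort/pop can pick the wrong keyspace, assuming the listing keys are selected by a plain string-prefix match and then ordered by modification time. The key names, timestamps, and the `newest_listing` helper are hypothetical stand-ins, not the tableslurp code or real bucket contents:

```python
from datetime import datetime

# Hypothetical (name, last_modified) pairs standing in for S3 keys.
keys = [
    ("host:/mnt/cassandra/data/Usergrid/a-listdir.json",
     datetime(2012, 9, 25, 10, 0)),
    ("host:/mnt/cassandra/data/Usergrid_Applications/b-listdir.json",
     datetime(2012, 9, 25, 11, 0)),
]


def newest_listing(keys, prefix):
    """Return the name of the most recently modified listing under prefix."""
    # A bare string-prefix match: 'Usergrid' also matches 'Usergrid_Applications'.
    matches = [k for k in keys
               if k[0].startswith(prefix) and k[0].endswith("-listdir.json")]
    # Sort oldest to newest, then take the last element.
    matches.sort(key=lambda k: k[1])
    return matches.pop()[0] if matches else None


# Without a trailing '/', both keyspaces match and the newer one wins.
wrong = newest_listing(keys, "host:/mnt/cassandra/data/Usergrid")
# With a trailing '/', only the requested keyspace matches.
right = newest_listing(keys, "host:/mnt/cassandra/data/Usergrid/")
```

Here `wrong` is the Usergrid_Applications listing, simply because it was modified last, while `right` is the Usergrid listing.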

thekad commented 12 years ago

What if you use:

python tableslurp -k -s -n usergrid-dev-2012-09-25 usergrid-dev-sstables /mnt/cassandra/data/Usergrid/ ~/Downloads/data/restore/Usergrid

Does that work?

tnine commented 12 years ago

It filters the keys properly, but it causes a key lookup failure in the map. The map doesn't contain the last '/' char, and the key used to search it does.

tableslurp [2012-09-26 14:34:48,795] INFO json data {u'/mnt/cassandra/data/Usergrid': [u'Properties-hd-4-Index.db', u'Applications-hd-14-Filter.db', u'Tokens-hd-31-Filter.db', u'Applications-hd-14-Index.db', u'Properties-hd-4-Data.db', u'Tokens-hd-31-Digest.sha1', u'Applications-hd-14-Statistics.db', u'Applications-hd-14-Data.db', u'Properties-hd-4-Digest.sha1', u'Properties-hd-4-Statistics.db', u'Tokens-hd-31-Statistics.db', u'Applications-hd-14-Digest.sha1', u'Properties-hd-4-Filter.db', u'Tokens-hd-31-Index.db', u'Tokens-hd-31-Data.db']}
Traceback (most recent call last):
  File "tableslurp", line 288, in <module>
    sys.exit(main())
  File "tableslurp", line 284, in main
    dh = DownloadHandler(args)
  File "tableslurp", line 87, in __init__
    (owner, group) = self._build_file_set(args.file)
  File "tableslurp", line 134, in _build_file_set
    self.fileset = json_data[self.origin]
KeyError: '/mnt/cassandra/data/Usergrid/'

thekad commented 12 years ago

Yup. So the problem is on line ~108:

#       Otherwise try to fetch the most recent one
        else:
            keys = [_ for _ in bucket.get_all_keys(prefix=self.prefix) if\
                _.name.endswith('-listdir.json')]
            if keys:
                keys.sort(key=lambda l: parser.parse(l.last_modified))
                key = keys.pop()

As you can see, the prefix does not include the trailing '/'. In theory, adding it there would be the easiest change:

            keys = [_ for _ in bucket.get_all_keys(prefix='%s/' % (self.prefix,)) if\
                _.name.endswith('-listdir.json')]

Does that sound sane?

tnine commented 12 years ago

It does to me. Given that the path will always be a directory, I think a suffix of '/' should be appended to the directory the user passes as an arg unless it's present.
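A rough sketch of that normalization, assuming the JSON map is keyed by the directory without a trailing '/' (as in the log output above) while the bucket listing wants the prefix with one. The `normalize_origin` helper and the sample `json_data` are hypothetical, not part of tableslurp:

```python
def normalize_origin(path):
    """Return (lookup_key, listing_prefix) for a user-supplied directory."""
    # The map key has no trailing slash, so strip any the user typed.
    lookup_key = path.rstrip("/")
    # The listing prefix ends in exactly one slash, so that 'Usergrid/'
    # cannot match 'Usergrid_Applications/'.
    listing_prefix = lookup_key + "/"
    return lookup_key, listing_prefix


# Hypothetical map, shaped like the json data logged earlier in the thread.
json_data = {"/mnt/cassandra/data/Usergrid": ["Properties-hd-4-Index.db"]}

# Both spellings of the argument resolve to the same key and prefix.
results = [normalize_origin(arg) for arg in (
    "/mnt/cassandra/data/Usergrid",
    "/mnt/cassandra/data/Usergrid/",
)]
filesets = [json_data[lookup_key] for lookup_key, _ in results]
```

Keeping the two forms separate means the user can pass the directory either way: the prefix filter stays strict and the map lookup no longer raises a KeyError.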