JeremyGrosser / tablesnap

Uses inotify to monitor Cassandra SSTables and upload them to S3
BSD 2-Clause "Simplified" License
181 stars 86 forks source link

tablesnap crashes with md5 mismatch error from S3 #91

Closed mzbyszynski closed 6 years ago

mzbyszynski commented 7 years ago

Hi There!

I have tablesnap running on ubuntu with Cassandra 3.7, and it is crashing frequently with the below error (not always on the same file).

Is this normal?

Thank you for your help!

Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]: Traceback (most recent call last):
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/bin/tablesnap", line 133, in worker
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     self.upload_sstable(bucket, keyname, f)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/bin/tablesnap", line 394, in upload_sstable
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     encrypt_key=self.with_sse)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 1374, in set_contents_from_filename
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     encrypt_key=encrypt_key)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 1305, in set_contents_from_file
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     chunked_transfer=chunked_transfer, size=size)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 762, in send_file
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     chunked_transfer=chunked_transfer, size=size)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 963, in _send_file_internal
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     query_args=query_args
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/connection.py", line 671, in make_request
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     retry_handler=retry_handler
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/connection.py", line 1071, in make_request
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     retry_handler=retry_handler)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/connection.py", line 940, in _mexe
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     request.body, request.headers)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:   File "/home/ubuntu/.local/lib/python2.7/site-packages/boto/s3/key.py", line 896, in sender
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]:     response.status, response.reason, body)
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]: S3ResponseError: S3ResponseError: 400 Bad Request
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]: <?xml version="1.0" encoding="UTF-8"?>
Nov 16 19:18:08 ip-10-0-0-29 tablesnap[14931]: <Error><Code>BadDigest</Code><Message>The Content-MD5 you specified did not match what we received.<
tightly-clutched commented 7 years ago

Seeing the same issue on DSE 4.8.10 (cassandra 2.1.15) and tablesnap 1.2.1

2017-11-15 01:34:07,096 CRITICAL Failed uploading /data/lib/cassandra/data/dev/locking/dev-locking-tmp-ka-383139-Data.db. Aborting.
Traceback (most recent call last):
  File "/usr/local/bin/tablesnap", line 133, in worker
    self.upload_sstable(bucket, keyname, f)
  File "/usr/local/bin/tablesnap", line 394, in upload_sstable
    encrypt_key=self.with_sse)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 1362, in set_contents_from_filename
    encrypt_key=encrypt_key)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 1293, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 750, in send_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 951, in _send_file_internal
    query_args=query_args
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 668, in make_request
    retry_handler=retry_handler
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1071, in make_request
    retry_handler=retry_handler)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 940, in _mexe
    request.body, request.headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 884, in sender
    response.status, response.reason, body)
S3ResponseError: S3ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>BadDigest</Code><Message>The Content-MD5 you specified did not match what we received.</Message><ExpectedDigest>b762d0312863a16818d256ce3ad44dda</ExpectedDigest><CalculatedDigest>a8tB0+rMlQyxcHJLIbcP6A==</CalculatedDigest><RequestId>29890293DF4E2B84</RequestId><HostId>/rF6FmY2Q8biPxkIYFxbIw6hQagwwSQ5UWbqi8J1Eat33DBQ6Qell4QdLcIjmX192SbaFzuAex8=</HostId></Error>

Killed

By the way, thanks for a great addition to the cassandra ecosystem!

mzbyszynski commented 6 years ago

@tightly-clutched I dug into this some more and it seems like this is being caused by the pyinotify library that tablesnap uses to identify when to copy files to S3. When you start tablesnap with the --auto-add argument, it passes that to pyinotify, and because of some side effect in pyinotify, that automatically subscribes the watcher to the IN_CREATE, event, which makes tablesnap start copying the files to S3 as soon as they are created.

So I think the fix is tablesnap needs to ignore IN_CREATE events unless you are specifically trying to watch for them (via the --listen-events argument). I'm going to try to submit a PR for this :man_shrugging: