JeremyGrosser / tablesnap

Uses inotify to monitor Cassandra SSTables and upload them to S3
BSD 2-Clause "Simplified" License

Tablechop logging option, try/except in tablesnap #36

Closed ethanmiller closed 10 years ago

ethanmiller commented 11 years ago

1206e40 just provides an option to log every file tablechop removes from S3, which is nice for auditing its work.

06bca99 adds one more try/except for a missing file. I logged all pyinotify events, compared them to the tablesnap logs, and saw this sequence:

In our case, file B never made it to S3. At some point puppet would restart tablesnap, but during those gaps I was seeing a lot of missed backups. I believe that while tablesnap was consulting S3 via key_exists(), compaction was removing the file, causing the subsequent open() to fail.

Overall, in a 12-node cluster I was seeing the "Failed uploading X" error 5 to 10 times per day. With this try/except I haven't seen any so far, and I am not missing any backups to date.
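
For readers following along, the race described above can be sketched roughly as follows. This is not the code from 06bca99; it is a minimal illustration, assuming a hypothetical helper `upload_if_present` and a caller-supplied `upload` callable standing in for the real S3 upload. The idea is only that a file removed by compaction between the S3 existence check and the local open() is treated as a skip rather than a failed upload.

```python
import errno
import logging

log = logging.getLogger("tablesnap-sketch")  # hypothetical logger name


def upload_if_present(path, upload):
    """Hypothetical helper: call upload(fileobj) unless the file has vanished.

    Returns True if the file was handed to `upload`, False if it was
    already gone (for example, removed by compaction) when we tried to open it.
    """
    try:
        fileobj = open(path, "rb")
    except (IOError, OSError) as exc:
        if exc.errno == errno.ENOENT:
            # The file disappeared between the event and the open();
            # skip it instead of reporting a failed upload.
            log.info("Skipping %s: removed before upload", path)
            return False
        raise  # any other I/O error is still a real failure
    with fileobj:
        upload(fileobj)
    return True
```

Only the missing-file case is swallowed here; permission errors and other I/O failures still propagate, so genuine upload problems would remain visible in the logs.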

osabina commented 11 years ago

Believe I am being bitten by this same race condition, FYI.