JeremyGrosser / tablesnap

Uses inotify to monitor Cassandra SSTables and upload them to S3
BSD 2-Clause "Simplified" License

How computing MD5 at start could be harmful #34

Closed by thekad 11 years ago

thekad commented 11 years ago

Picture the following scenario: you typically run tablesnap under supervisor. S3 fails (it has happened), or tablesnap otherwise dies for some reason. All your tablesnaps fail at more or less the same time, and then supervisor restarts all of them at the same time.

Because every restarted instance computes MD5 at start, the end result would be increased I/O utilization on all nodes of the cluster at the same time, which is not good.

Now, I'm not saying it is always a bad idea to compute MD5 at start. If you're running it manually, you can always do the initial load with MD5 computation and then restart tablesnap without --md5-on-start (or configure it in supervisor that way).
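To make the trade-off concrete, here is a minimal sketch of how an opt-in flag could gate the startup check. It is not tablesnap's actual code: `file_md5`, `needs_upload`, and `parse_args` are hypothetical names used for illustration, and only the `--md5-on-start` flag name comes from the description above. The point is that hashing a file means reading all of it, so doing this for every existing SSTable on every node at once is what drives the I/O spike.

```python
import argparse
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Every byte of the file is read from disk; this is the costly part.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_upload(path, remote_md5, md5_on_start):
    """Decide whether an existing local file should be uploaded at startup.

    Hypothetical helper: remote_md5 is None when the file is not in the
    bucket yet, otherwise it is the checksum recorded for the remote copy.
    """
    if remote_md5 is None:
        return True
    if not md5_on_start:
        # Trust the existing remote copy and skip the full-file read.
        return False
    return file_md5(path) != remote_md5


def parse_args(argv=None):
    """Parse only the flag discussed here; off unless explicitly requested."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--md5-on-start", action="store_true", default=False)
    return parser.parse_args(argv)
```

With the flag off, a file that already exists remotely is skipped without touching the disk, so a cluster-wide restart under supervisor stays cheap. With the flag on, each such file is read in full before the decision is made.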

A couple more changes may have sneaked in. They are mostly for compatibility with more packaging forms, and I tested them and made sure they work properly.