f5devcentral / f5go

The F5 Go Redirector
MIT License

The serialized data can easily be corrupted if the program is abnormally terminated #4

Closed cwbooth5 closed 8 years ago

cwbooth5 commented 8 years ago

While testing resiliency to power loss on the system, I noticed that the godb.pickle serialized data can get clobbered pretty badly, and in very strange ways. In this particular failure, a 'pkill' on the process was all it took. In about half of my tries, I could corrupt the data. After that the program wouldn't start back up, and I lost lots of edits and new lists that had been added while it was up. This leads me to two possible improvements.

  1. Go to a proper database, with atomic operations and cleaner startup/shutdown.
  2. Get a proper database backup interval in place, whatever data source we end up using.

This is something to think about, for sure. The current approach works in most cases, but as more and more people use this, it isn't as fault-tolerant as we'd like.
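As a rough illustration of the second point, here is a minimal sketch of a snapshot-and-prune backup for whatever file holds the data. The function name, the backup directory, and the `keep` count are all hypothetical, not something the project has; a timer or cron job would be expected to call it on the chosen interval.

```python
import os
import shutil
import time


def backup_snapshot(path, backup_dir, keep=5):
    """Copy `path` into `backup_dir` under a timestamped name, keeping the newest `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    os.makedirs(backup_dir, exist_ok=True)
    base = os.path.basename(path)
    # Nanosecond timestamps have a fixed digit count, so sorting the
    # names alphabetically also sorts them chronologically.
    dest = os.path.join(backup_dir, f"{base}.{time.time_ns()}")
    shutil.copy2(path, dest)
    backups = sorted(n for n in os.listdir(backup_dir) if n.startswith(base + "."))
    for old in backups[:-keep]:
        os.remove(os.path.join(backup_dir, old))  # prune the oldest snapshots
    return dest
```

A snapshot taken while the file is mid-write could itself be corrupt, so this only helps if the write to the live file is safe or the backup runs while no save is in progress.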

wwsean08 commented 8 years ago

I was thinking about this problem, and another possible solution would be Redis, which has persistence but is lighter than a full database. Lists are what we store, and you can't (directly) store lists in SQLite. The data would be held by a separate process. With Redis's append-only file enabled and its default appendfsync everysec policy, writes are synced to disk about once a second, which would mean minimal data loss. And because the data is managed by another process, even killing the Go service shouldn't lose anything.
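For comparison, the usual way around SQLite's lack of a list type is one row per list item, with each save wrapped in a transaction. A minimal sketch follows; the table and column names are made up for illustration and are not the redirector's actual data model.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) list_links table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS list_links ("
        " list_name TEXT NOT NULL,"
        " position INTEGER NOT NULL,"
        " url TEXT NOT NULL,"
        " PRIMARY KEY (list_name, position))"
    )
    return conn


def save_list(conn, name, urls):
    """Replace the stored list `name` with `urls`, all or nothing."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so a failed save leaves the previous list in place.
    with conn:
        conn.execute("DELETE FROM list_links WHERE list_name = ?", (name,))
        conn.executemany(
            "INSERT INTO list_links (list_name, position, url) VALUES (?, ?, ?)",
            [(name, i, url) for i, url in enumerate(urls)],
        )


def load_list(conn, name):
    """Return the URLs of list `name` in their stored order."""
    rows = conn.execute(
        "SELECT url FROM list_links WHERE list_name = ? ORDER BY position",
        (name,),
    )
    return [url for (url,) in rows]
```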

Edit: Mostly just writing down my thoughts

cwbooth5 commented 8 years ago

I forked the repo and took a shot at Redis. I got it 90% there, but my schema isn't bulletproof. I might throw my changes up on my repo so we can try them out. I've only worked with Redis in the past, so that was my go-to. It wasn't a total slam dunk, though, because the way these classes fill up in the redirector isn't quite something you can convey with the data structures Redis provides. I used a lot of ZSETs and hashes.

wwsean08 commented 8 years ago

For now, as a temporary workaround before moving to Redis or a DB, I'm going to rewrite the serialization so that it writes to a temporary file and then copies that into place over the original file. This should greatly reduce the corruption risk.
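A minimal sketch of that idea, with one swap worth naming: it renames the temporary file over the original with os.replace rather than copying it, since a rename within one filesystem is atomic and a copy is not. The function name is hypothetical and this is not necessarily how the actual fix was written.

```python
import os
import pickle
import tempfile


def save_atomically(obj, path):
    """Pickle `obj` to `path` so a reader sees either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so the final rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".pickle")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp, protocol=pickle.HIGHEST_PROTOCOL)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        # The original file has not been touched; just remove the leftover.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process is killed partway through the dump, the worst case is a stray temporary file, and godb.pickle still holds the last complete save.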

saulpw commented 8 years ago

I suggest closing this bug (given the successful workaround) and opening a separate issue/task for choosing the right way to store the data in a proper ACID database.

wwsean08 commented 8 years ago

Fair enough, that makes sense. Closed.