Netflix / dynomite

A generic dynamo implementation for different k-v storage engines
Apache License 2.0
4.2k stars 533 forks source link

Allow multi-buffer payload decryption #620

Closed smukil closed 6 years ago

smukil commented 6 years ago

In many cases, queries or responses that spanned multiple mbufs would not get decrypted correctly causing cross-DC traffic to be dropped silently which resulted in a failure to replicate data in different DCs.

After a lot of debugging, this was found to be due to a typo introduced in a previous commit: https://github.com/Netflix/dynomite/commit/91e8fe38ea66b79b5241f02169fa04e2ad9a50ea

This typo is now fixed and multi-buffer decryption works seamlessly.

shailesh33 commented 6 years ago

I am sorry! :(

I read the comment in the code and it has indeed complicated cases. How did you debug this? Maybe a log debug around such cases help where we silently drop replication messages?

smukil commented 6 years ago

I am sorry! :(

I read the comment in the code and it has indeed complicated cases. How did you debug this? Maybe a log debug around such cases help where we silently drop replication messages?

No problem at all, typos are bound to happen sometime or the other. I think the best way to catch any form of human error is to add more automated testing. I will add a test case in an upcoming patch to test multi-buffer payloads that are communicated across DCs. As we improve our automated testing suite, bugs like this can hopefully be caught before merging them itself.

To debug this, I added quite a few changes to our test framework to try and repro this in a controlled setting, and then I used GDB to figure out what was going on.