flightstats / hub

fault tolerant, highly available service for data storage and distribution
http://www.flightstats.com
MIT License
103 stars 35 forks source link

Fix up distributed lock runner test such that it is a bit more reliable #1182

Closed lkemmerer closed 5 years ago

lkemmerer commented 5 years ago

These tests were being cranky again, so I did some rework around what we're asserting on. Instead of a simple list of locks, I'm using a hashmap with rough timestamps of when a lock was acquired and released. That way I can do things like ensure that two locks were held simultaneously or that they happened in one at a time. It also removes assertions on ordering where that's not the primary thing being tested. It makes the code a little more complicated, so let me know if there are readability improvements. Thanks to our JUnit5 upgrade, I was able to use @RepeatedTest(1000) to see how stable the new tests were. I got 100% passing tests where one of them had been failing about 40% of the time before. I'm leaving extensive logging in there just in case.

lkemmerer commented 5 years ago

@dkleckler Most of the failures happened in the test that makes sure that different locks can happen at the same time. Getting the timing right such that:

  1. Lock 1 started
  2. Lock 2 started and finished
  3. Lock 1 finished
  4. Lock 3 started and finished

was extremely finicky for reasons that Manjula and I couldn't figure out. This relies less on ordering for that and more on making sure there's an overlap or not an overlap in whatever order things end up running... And since I made that change for one test and the assertions were more explicit, I changed all the tests.