Closed matthew-white closed 1 month ago
Something that may come in handy is to use pg_sleep()
to enlarge time windows, making it easier to set up concurrent operations to hit the race condition.
Ooh interesting! Are there any special strategies for inserting the pg_sleep()
where it needs to go? If a function runs a series of queries, and you want the pg_sleep()
to happen in the middle, is there an easy way to do that?
You can alter the function to have it run a select pg_sleep(60);
query in the middle if that's what you mean?
If you can't touch the function or its SQL, and the SQL mutates (insert/update/delete), then another angle would be to implant a trigger that does the pg_sleep()
before or after.
This PR adds a test for getodk/central#705. It runs the race condition described in the issue many times, asserting that it is successful each time.
Why is this the best possible solution? Were any other approaches considered?
The test in #1033 takes a more explanatory approach, beginning transactions manually and controlling the timing of queries. I think that test is helpful, but it didn't seem useful to just copy it for this new issue. It's also not easy to control the timing of
Entities._processSubmissionEvent()
, which I think would be needed in order to write such a test.The test that I wrote runs the race condition described in the issue. Even without locking, things occasionally work fine: the offline update doesn't end up stuck in the backlog. That's the case <10% of the time on my local machine. Given that things sometimes work, I run the race condition 50 times. I think it's nice to have a test that demonstrates the race condition as described in the issue and that fails if locking is removed.
I tried to do more to control the timing of
Entities._processSubmissionEvent()
, thinking that that would allow me to run the race condition only once. Specifically, this is what I had in mind:Entities._computeBaseVersion()
is run.However, it turned out to be harder than expected to spy on
Entities._computeBaseVersion()
. Each time a new container is created, it looks like the query modules are recreated (injector()
in lib/model/container.js). Further, a new container is created when a parent transaction is begun. I was able to spy onEntities._computeBaseVersion()
in the container supplied to the test, but that didn't end up being the sameEntities._computeBaseVersion()
run byEntities._processSubmissionEvent()
, as processing an event begins a new transaction.Before submitting this PR, please make sure you have:
make test
and confirmed all checks still pass OR confirm CircleCI build passes