getodk / central

ODK Central is a server that is easy to use, very fast, and stuffed with features that make data collection easier. Contribute and make the world a better place! ✨🗄✨
https://docs.getodk.org/central-intro/
Apache License 2.0
127 stars 155 forks source link

Entity submission held in backlog when it should have been processed #705

Closed ktuite closed 1 month ago

ktuite commented 2 months ago

QA noticed this scenario with an offline create + update + update, where the 3rd update wasn't applied. Even though the submissions came in in the right order, I found the 3rd submission in the backlog:

Entity https://staging.getodk.cloud/#/projects/101/entity-lists/movies/entities/7f362b7f-3780-44f0-bbb3-97d9770abcdf

Submission 1: (applied) row 18 in other form - 5:08:09 subId 3618 https://staging.getodk.cloud/v1/projects/101/forms/movie%20entities/submissions/uuid%3Aed096cd0-202c-431a-8cea-624a8ceb19b0.xml <entity dataset="movies" id="7f362b7f-3780-44f0-bbb3-97d9770abcdf" create="1" trunkVersion="" branchId="" baseVersion="">

Submission 2: (applied) row 35 - 5:08:09 - subId 3620 https://staging.getodk.cloud/v1/projects/101/forms/movies_update/submissions/uuid%3Abb8bbdb8-2f4d-4b30-9c3b-3a62c4e93a2f.xml <entity dataset="movies" id="7f362b7f-3780-44f0-bbb3-97d9770abcdf" update="1" baseVersion="1" trunkVersion="" branchId="10138667-8a81-4721-aad5-f3316de1bc26">

Submission 3: (not applied) row 37 - 5:08:10 - subId 3622 https://staging.getodk.cloud/v1/projects/101/forms/movies_update/submissions/uuid%3Ae0b9644e-0905-4ea0-98e9-0388b5a4ecbb.xml <entity dataset="movies" id="7f362b7f-3780-44f0-bbb3-97d9770abcdf" update="1" baseVersion="2" trunkVersion="" branchId="10138667-8a81-4721-aad5-f3316de1bc26">

Submission 3 is in the backlog. It likely got processed after the other two submissions, and it should have been able to find the right base version, so why did it get held?

ktuite commented 2 months ago

Looking at the timestamps on the audit log, I think two update submissions were processed by different worker threads at the same time. The first update started, and then the second update started, but the first hadn't finished so the second couldn't find the appropriate base version yet.

I'm studying Sadiq's earlier PR about locking entities for update to improve how/where/when I'm locking entities for update (and how to write some tests for this). https://github.com/getodk/central-backend/pull/1033

matthew-white commented 2 months ago

QA noticed this scenario with an offline create + update + update, where the 3rd update wasn't applied. Even though the submissions came in in the right order, I found the 3rd submission in the backlog

I'm able to reproduce this scenario in testing. I'm also able to reproduce another, related scenario. If two workers concurrently process the first two submissions (the offline create + the first offline update), then the second submission can be sent to the backlog, where it will not be processed until it is force-processed. My first attempt at a solution addressed the scenario described above, but not the related scenario. However, my PR getodk/central-backend#1195 should address both scenarios.

I'm going to go ahead and merge that PR. I'm still working on writing a test that doesn't feel like its own race condition, but I should be able to finish that tomorrow.

matthew-white commented 1 month ago

I'm still working on writing a test

There's now a test here: getodk/central-backend#1202. I'm going to go ahead and close the issue.