Open bema-aei opened 3 months ago
Note, you want to use permalinks when referring to code, so the link stays valid over time (i.e. in context with the issue). See the ellipsis in the upper-right corner of the file view.
What race condition is this comment referring to
Does the actual commit help? Maybe even try and tap into Bruce's memory?
Thanks, it does.
The interesting thing here is that the commit doesn't actually change the
wu.assimilate_state = ASSIMILATE_READY;
line; it is left untouched. This means the code before that commit did both: it set assimilate_state to trigger assimilation, and it triggered an immediate transition. That would surely create a race condition.
Bruce was correct to avoid this, but I think he fixed it at the wrong end. IMHO immediate transition should be triggered here, and not assimilation directly.
Occasionally Einstein@Home encounters the following situation:
Also, when setting assimilate_state, the validator doesn't care about results of the workunit that are still "in progress". Triggering assimilation (and subsequently file deletion) of a workunit immediately after finding a canonical result means that these "late" results cannot be (successfully) validated, assimilated and credited, even if they are valid and arrive on time, i.e. before their deadline.
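To make the "late result" problem concrete, here is a minimal sketch of the condition I have in mind. The types and function names (`Result`, `has_pending_results`, `safe_to_assimilate`) are hypothetical simplifications for illustration, not the actual BOINC DB structs or server code:

```cpp
#include <ctime>
#include <vector>

// Hypothetical, simplified model; NOT the real BOINC structs.
enum class ServerState { InProgress, Over };

struct Result {
    ServerState server_state;     // still out on a host, or finished
    std::time_t report_deadline;  // latest time the result may be reported
};

// True if at least one result is still in progress and its deadline has
// not passed yet, i.e. it could still arrive "on time".
bool has_pending_results(const std::vector<Result>& results, std::time_t now) {
    for (const Result& r : results) {
        if (r.server_state == ServerState::InProgress && r.report_deadline > now) {
            return true;
        }
    }
    return false;
}

// Assumed policy: assimilation (and the file deletion that follows) should
// wait until a canonical result exists AND no result can still arrive on time.
bool safe_to_assimilate(bool have_canonical,
                        const std::vector<Result>& results,
                        std::time_t now) {
    return have_canonical && !has_pending_results(results, now);
}
```

With one result over and one still in progress with a deadline in the future, `safe_to_assimilate` returns false even though a canonical result was found. That is the case the validator currently does not check, as far as I can tell.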
I was about to modify the validator so that it does not set assimilate_state directly, but just lets the transitioner know to take a look at that workunit (immediately). The transitioner does handle such late results correctly, and it doesn't have such a delay between reading the workunits and writing modifications as the validator has (at least occasionally).
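Roughly, the change I was considering looks like the sketch below. This is not a patch against validator.cpp: the `Workunit` struct is a stripped-down stand-in, the enum values are assumed, and both function names are made up for illustration. Only `assimilate_state`, `ASSIMILATE_READY` and the idea of an immediate transition come from the code under discussion.

```cpp
#include <ctime>

// Hypothetical stand-ins; names mirror the BOINC ones, values are assumed.
enum AssimilateState { ASSIMILATE_INIT, ASSIMILATE_READY, ASSIMILATE_DONE };

struct Workunit {
    AssimilateState assimilate_state = ASSIMILATE_INIT;
    std::time_t transition_time = 0;  // when the transitioner should look next
    bool have_canonical = false;
};

// Behaviour as described above: the validator flags the workunit for
// assimilation itself as soon as a canonical result is found.
void on_canonical_found_direct(Workunit& wu) {
    wu.have_canonical = true;
    wu.assimilate_state = ASSIMILATE_READY;
}

// Proposed alternative: leave assimilate_state alone and only ask for an
// immediate transition, so the transitioner decides when to assimilate.
void on_canonical_found_deferred(Workunit& wu, std::time_t now) {
    wu.have_canonical = true;
    wu.transition_time = now;
}
```

The point of the second variant is that only one daemon ever writes assimilate_state, so the validator's read/write delay can no longer clash with a decision the transitioner makes in the meantime.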
But then I came across that comment in the validator:
https://github.com/BOINC/boinc/blob/master/sched/validator.cpp#L677-L679
What race condition is this comment referring to, and how would we resolve the problems above then?