Closed fanpu closed 3 years ago
While I'm sure this will prevent the exception, it seems like you're misunderstanding your state machine. jobQueue.makeDead can be called from 4 places:
In the first 3 cases, the job cannot be an unassigned job, because a) that's where it comes from in manage, and b) worker can only get a job that manage gave it.
delJob has no callers (I assume there's some idea of having a more complete REST interface, but it's not hooked up except in tests). IMO, The correct thing there would be to reject deleting an assigned job, and not calling makeDead (e.g. inlining the parts of makeDead that make sense for the delJob case)
Thanks for your insights! I'll admit that I am still far from familiar with the Tango codebase, and am still learning more about it as I go.
IMO, The correct thing there would be to reject deleting an assigned job, and not calling makeDead (e.g. inlining the parts of makeDead that make sense for the delJob case)
If a job was assigned, wouldn't we still need to remove it from the liveJobs dictionary? I might be misunderstanding something here, but I thought that it will still be necessary to do clean-up in the event that an assigned job encounters an exception.
Addresses https://github.com/autolab/Tango/pull/196#issuecomment-789041313
There is a bug in jobQueue that causes it to raise an exception when attempting to remove a live job from the
unassignedJob
queue when it has already been removed previously/This was because there was a wrong assumption that if a job is live, it must still be in the unassigned queue.
A live job can either be:
The makeDead function is supposed to shift any live job to the dead job dictionary. So on line 312 we wrongly assumed that a live job has to be in the unassigned queue (it could be running) and hence will not be found in the unassigned queue.
This PR catches the ValueError that .remove may raise, and logs a message.