autolab / Tango

Standalone RESTful autograding service
http://www.autolabproject.com/
Apache License 2.0
48 stars 60 forks source link

[hotfix] Check element exists before removing for Python thread-safe queue #228

Closed 20wildmanj closed 1 year ago

20wildmanj commented 1 year ago

Description

Fixes issue found in Tango deployments that use the default Python thread-safe queue instead of TangoRemoteQueue, where it was possible for remove to be called to remove an element already removed from a queue, causing an unhandled exception.

image

This hotfix merely will check to see if the element exists in the queue before calling the remove method, which is consistent with TangoRemoteQueue, which does not raise an exception in the same scenario as well.

Testing

cg2v commented 1 year ago

While this will certainly prevent exceptions, is it actually the right fix? This looks like another variant of #207

20wildmanj commented 1 year ago

Yes we have looked at #207, though because my exposure to Tango has only been through hotfixes the past week I'm not 100% in terms of understanding the states of Tango. There's definitely a better solution, but making it happen right now may be difficult in terms of time.

cg2v commented 1 year ago

I'm also concerned because the exception I encountered was in assignJob, which only has one callsite. How was jobQueue.assignJob called on a job that was not in the unassigned queue?

michellexliu commented 1 year ago

The jobQueue.assignJob replaced job.makeAssigned() in __manage in #227 to fix a bug where the vm was never getting set, leading to issues in our autolab-docker installation.

cg2v commented 1 year ago

Upon further review of the code

20wildmanj commented 1 year ago

Thanks for looking into this, I've removed the removal from the unassigned queue in assignJob.

20wildmanj commented 1 year ago

Closing this PR for now, will return to this issue with a better fix later.