Closed trws closed 4 weeks ago
Sure, I can add that in no problem. I have a test that should do that, but I haven't managed to make that test fail even with the old code. This ensures there's no code path where it could get from sched.cancel
to sched-fluxion-resource.cancel
, but I haven't found the race or whatever it is that makes that actually occur. Either way I think this is a test we want, adding shortly.
Test pushed.
Anyone want a last look here before MWP? I think the added test tickles the issue but another set of eyes would be great.
@trws I have a nit on the new commit message, but apart from that this is good to go.
problem: The "remove" method was used for both handling the
free
andcancel
RPCs. Under some code paths, remove de-allocates resources, so using it for a pending cancellation, even if it seems to be protected by checks, is unnecessarily dangerous. Given the recent double-booking issue and a lack so far of other theories on how we get there, this is a possible culprit.solution: Re-factor queue_policy_base_t remove into remove and remove_pending, the latter of which only operates on the pending states and has no code-paths that can result in a resource.cancel RPC being sent. The remove_pending is used for both the cancel RPC and the part of the free RPC where the code was used before, should be a pure refactor.