flux-framework / flux-sched

Fluxion Graph-based Scheduler
GNU Lesser General Public License v3.0

qmanager: reconsider blocked jobs on reprioritize #1217

Closed: trws closed this 5 months ago

trws commented 5 months ago

problem: if a job in the reserved state (still pending in qmanager) is reprioritized, it could unblock a job in the blocked queue. Without reconsidering the blocked jobs, we could end up with a priority inversion.

solution: reconsider blocked jobs when a pending job is reprioritized
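As a minimal sketch of this fix, assume a hypothetical queue that keeps pending jobs in a map keyed by job ID and blocked jobs in a separate list. `QueueSketch`, `PendingJob`, and `reconsider_blocked` are illustrative names, not the actual classes in `queue_policy_base.hpp`, and the scheduling pass itself is omitted. The reprioritize handler updates the job only if it is still pending, then moves every blocked job back to pending so the next pass re-evaluates it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical names throughout; this is not the flux-sched qmanager API.
using JobId = std::uint64_t;

struct PendingJob {
    JobId id;
    unsigned priority;  // larger value = higher priority
};

class QueueSketch {
public:
    void add_pending(const PendingJob &j) { pending_[j.id] = j; }
    void add_blocked(const PendingJob &j) { blocked_.push_back(j); }

    // Called when the priority of a job changes. Returns false if the job
    // is not pending here (e.g. already running), in which case nothing changes.
    bool reprioritize(JobId id, unsigned priority) {
        auto it = pending_.find(id);
        if (it == pending_.end())
            return false;
        it->second.priority = priority;
        // A pending (possibly reserved) job changed rank, so a reservation
        // that was blocking a lower-priority job may no longer apply:
        // give every blocked job another look on the next scheduling pass.
        reconsider_blocked();
        return true;
    }

    std::size_t num_pending() const { return pending_.size(); }
    std::size_t num_blocked() const { return blocked_.size(); }

private:
    // Move all blocked jobs back into the pending set.
    void reconsider_blocked() {
        for (const PendingJob &j : blocked_)
            pending_[j.id] = j;
        blocked_.clear();
    }

    std::map<JobId, PendingJob> pending_;
    std::vector<PendingJob> blocked_;
};
```

Reconsidering the whole blocked queue is the simplest choice; it costs one extra pass over blocked jobs per reprioritize event rather than trying to work out which blocked jobs the change could affect.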

I have yet to write a reproducer that reliably needs this, but I'm fairly sure the following sequence will trigger it:

  1. start a long-running job that uses a node
  2. enqueue a high-priority job that requires one of the resources used by the first, so that it gets reserved
  3. enqueue a lower-priority job that could run immediately, but won't because it's blocked by the reservation
  4. reprioritize the reserved job so that it has a lower priority than the third
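The four steps above can be replayed in a deliberately simplified toy model. Everything here is an assumption for illustration: `ToyQueue`, `ToyJob`, and `job3_runs` are hypothetical names, time and walltimes are ignored, and the highest-priority job that cannot start simply holds back the free nodes it would need, while later jobs that cannot fit around that hold go to a blocked list. This is not how flux-sched's queue policies actually compute reservations (the thread reports that the real reproducer only triggers under conservative backfill), but it shows the ordering problem: unless blocked jobs are reconsidered on reprioritize, job 3 stays blocked behind a job it now outranks.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model with hypothetical names; not flux-sched's qmanager.
struct ToyJob {
    int id;
    int priority;  // larger value = higher priority
    int nodes;     // number of nodes the job needs
};

class ToyQueue {
public:
    explicit ToyQueue(int free_nodes) : free_nodes_(free_nodes) {}

    void submit(const ToyJob &j) { pending_.push_back(j); }

    // One scheduling pass over pending jobs in priority order.
    void schedule() {
        std::stable_sort(pending_.begin(), pending_.end(),
                         [](const ToyJob &a, const ToyJob &b) {
                             return a.priority > b.priority;
                         });
        bool have_reservation = false;
        int held = 0;  // free nodes held back for the reserved job
        std::vector<ToyJob> still_pending;
        for (const ToyJob &j : pending_) {
            if (j.nodes <= free_nodes_ - held) {
                free_nodes_ -= j.nodes;  // start the job now
                running_.push_back(j.id);
            } else if (!have_reservation) {
                have_reservation = true;  // reserved, but still pending
                held = std::min(j.nodes, free_nodes_);
                still_pending.push_back(j);
            } else {
                blocked_.push_back(j);  // stuck behind the reservation
            }
        }
        pending_ = still_pending;
    }

    // Blocked jobs re-enter pending only when something calls this.
    void reconsider_blocked() {
        pending_.insert(pending_.end(), blocked_.begin(), blocked_.end());
        blocked_.clear();
    }

    void reprioritize(int id, int priority, bool reconsider) {
        for (ToyJob &j : pending_)
            if (j.id == id)
                j.priority = priority;
        if (reconsider)
            reconsider_blocked();  // analogous to the behavior this PR adds
        schedule();
    }

    bool running(int id) const {
        return std::find(running_.begin(), running_.end(), id) != running_.end();
    }

private:
    int free_nodes_;
    std::vector<ToyJob> pending_, blocked_;
    std::vector<int> running_;
};

// Steps 1-4 on a two-node toy cluster; returns whether job 3 ends up running.
bool job3_runs(bool reconsider) {
    ToyQueue q(2);
    q.submit({1, 1, 1});    // 1. long-running job takes one node
    q.schedule();
    q.submit({2, 100, 2});  // 2. high priority, needs job 1's node: reserved
    q.schedule();
    q.submit({3, 50, 1});   // 3. lower priority, fits now, but blocked
    q.schedule();
    q.reprioritize(2, 10, reconsider);  // 4. drop job 2 below job 3
    return q.running(3);
}
```

`job3_runs(false)` returns `false` (the inversion), while `job3_runs(true)` returns `true`: once job 2 drops to priority 10, the reconsidered job 3 is evaluated first and starts on the free node, and job 2 takes the reservation instead.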
trws commented 5 months ago

OK, I finally figured out the test. It turns out the test I was thinking of does work, but only with conservative backfill. I may need to look into why it doesn't reproduce with easy backfill, but the test did catch the issue, and the patch fixes it. I also added new comments alongside the reconsider calls.

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.0%. Comparing base (326da3f) to head (4dc7c17). Report is 171 commits behind head on master.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           master    #1217    +/-   ##
========================================
- Coverage    74.0%    74.0%   -0.1%
========================================
  Files         103      103
  Lines       14610    14612     +2
========================================
  Hits        10820    10820
- Misses       3790     3792     +2
```

| [Files with missing lines](https://app.codecov.io/gh/flux-framework/flux-sched/pull/1217?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework) | Coverage Δ | |
|---|---|---|
| [qmanager/policies/base/queue\_policy\_base.hpp](https://app.codecov.io/gh/flux-framework/flux-sched/pull/1217?src=pr&el=tree&filepath=qmanager%2Fpolicies%2Fbase%2Fqueue_policy_base.hpp&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=flux-framework#diff-cW1hbmFnZXIvcG9saWNpZXMvYmFzZS9xdWV1ZV9wb2xpY3lfYmFzZS5ocHA=) | `74.2% <100.0%> (-0.4%)` | ⬇️ |