flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0
168 stars 50 forks source link

flux jobs: show urgency=0 (HOLD) jobs as "held" in the INFO colum #6426

Closed garlick closed 1 week ago

garlick commented 1 week ago

Problem: a user mistakenly set urgency=0 (HOLD) and then we all wondered why jobs weren't starting. Derp.

@ryanday36 suggested we show "held" in the INFO column for these jobs to make this more obvious.

grondo commented 1 week ago

Should this be the result only for urgency=0, or also priority=0? Most of the time one implies the other, but there could be cases where a a priority plugin might set priority=0 without urgency=0, so should that also indicate "held"?

garlick commented 1 week ago

That makes sense. The actual check that holds back a job from the scheduler checks the priority not the urgency.

ryanday36 commented 1 week ago

We should probably give some INFO for priority=0, but maybe something slightly different like PRIORITYHOLD. Since flux job urgency already uses the hold alias it's natural to assume that a job in a HOLD state can be released with flux job urgency ... default, and I assume that wouldn't necessarily be the case for a job with priority=0.

grondo commented 1 week ago

Ah, great point as usual @ryanday36!

garlick commented 1 week ago

So PRIORITYHOLD is priority=0 urgency!=0?

grondo commented 1 week ago

Yeah (though I was thinking priority-hold since we don't use all upcase for anything else in this field):

diff --git a/src/bindings/python/flux/job/info.py b/src/bindings/python/flux/job
/info.py
index 376f965e3..6c5421869 100644
--- a/src/bindings/python/flux/job/info.py
+++ b/src/bindings/python/flux/job/info.py
@@ -521,6 +522,12 @@ class JobInfo:
                     eta = flux.util.fsd(eta)
                 return f"eta:{eta}"
             except TypeError:
+                # No eta available. Print "held" if job is held, or
+                # priority-hold if priority == 0, otherwise nothing.
+                if self.urgency == 0:
+                    return "held"
+                elif self.priority == 0:
+                    return "priority-hold"
                 return ""
         else:
             return self.nodelist
garlick commented 1 week ago

Oh, just realized priority-hold is probably what happens when your bank is empty or you're not in the system.

grondo commented 1 week ago

I think jobs are now rejected when a user is not in the system, but yeah you're probably right! Does that mean there's potentially a better term?

I'll have a PR for this ready shortly. The change was simple, but testing turned out to be a little tricky.

garlick commented 1 week ago

On terminology, no great ideas here. Maybe as a separate, future effort, we could consider if the priority plugin should have a way of adding a memo like the scheduler does, and then the message could be made less generic?