Open grondo opened 9 months ago
I think that a submission flag would work as long as the drawbacks that you noted could be overcome. Generally we allow 'standby' jobs to be exempt from other queue limits and allow all users to access them. So we would also want the preemptible flag to be visible to the priority plugin, so that it can avoid counting those jobs against queue limits. I think that would provide the same benefits as the queue implementation, at least for how we use standby / preemption.
That said, there are a number of use cases that can be solved by overlapping queues (exempt / expedite, whole cluster DATs), so that could be considered a benefit of that approach. Exempt / expedite could probably all be done through accounting / the priority plugin. We should probably talk more about DATs where we want to be able to let a user run on all nodes on a cluster that we've split into multiple queues.
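As a rough illustration of the queue-limit exemption described above, here is a minimal Python sketch of a limit check that skips preemptible jobs. The `Job` class, its `preemptible` field, and both helper functions are hypothetical names made up for this example; they are not part of flux-core or of any existing priority plugin.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    queue: str
    # Hypothetical marker for a standby-style job; not a real flux-core field.
    preemptible: bool = False


def jobs_counted_against_limit(jobs, user, queue):
    """Count a user's jobs in a queue, ignoring preemptible jobs."""
    return sum(
        1
        for job in jobs
        if job.user == user and job.queue == queue and not job.preemptible
    )


def would_exceed_limit(jobs, new_job, max_jobs):
    """Return True if accepting new_job would exceed the per-user queue limit.

    Preemptible jobs are always accepted here, mirroring the idea that
    standby jobs are exempt from other queue limits.
    """
    if new_job.preemptible:
        return False
    counted = jobs_counted_against_limit(jobs, new_job.user, new_job.queue)
    return counted + 1 > max_jobs
```

With one regular and one preemptible job already in a queue limited to one job per user, a second regular job would be rejected while another preemptible job would still be accepted.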
This idea was discussed again in a meeting recently. The preemptible flag still seems to be the solution of choice, but this will require an update to the resource acquisition protocol. I've opened flux-framework/rfc#423.
Over in the flux team on Teams, one of the users on Tuolumne had an interesting idea around standby / preemption, which would be to allow users to specify a minimum duration for their jobs:
However, I got to thinking that a minimum time in addition to a maximum time could create a more powerful mechanism than standby. If you wanted a Slurm-like standby you would set your job's minimum time to 0, but if you wanted to actually get something done while still letting other jobs in after you'd made some progress, setting a minimum time of an hour or so might be a reasonable compromise.
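The minimum-time idea can be sketched in a few lines of Python. This is only an illustration of the semantics as described in the quote; the function name, the `min_duration` parameter, and the choice of `None` to mean "never preemptible" are assumptions for this example and do not reflect how a scheduler actually implements it.

```python
def is_preemptible(elapsed_seconds, min_duration=None):
    """Decide whether a running job may be preempted.

    min_duration is the (hypothetical) minimum run time in seconds:
      None -> the job did not opt in and is never preemptible
      0    -> preemptible immediately, like a Slurm-style standby job
      N    -> preemptible only after the job has run for N seconds
    """
    if min_duration is None:
        return False
    return elapsed_seconds >= min_duration
```

For example, a job submitted with a one-hour minimum is protected for its first 3600 seconds and becomes a preemption candidate after that, while a job with a minimum of 0 is a candidate from the moment it starts.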
Note that we added a preemptible-after attribute to RFC 14 after discussion in flux-framework/rfc#423.
During development, this can be set on a job with e.g. flux run --setattr=preemptible-after=0.
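For readers unfamiliar with job attributes, the following Python sketch shows roughly where such an attribute would end up in the jobspec. It assumes that attribute keys without a `system.` or `user.` prefix default to the `system` dictionary, and that a leading `.` places the key directly under `attributes`, which is my understanding of the CLI's `--setattr` behavior; the `set_attr` helper itself is hypothetical and is not a flux-core API.

```python
def set_attr(jobspec, key, value):
    """Set a dotted attribute key in a jobspec-like dict (illustrative only).

    Assumption: keys that do not start with "system.", "user." or "."
    are placed under "system".
    """
    if key.startswith("."):
        key = key[1:]
    elif not key.startswith(("system.", "user.")):
        key = "system." + key

    node = jobspec.setdefault("attributes", {})
    parts = key.split(".")
    # Walk (and create) intermediate dictionaries for each dotted component.
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value
    return jobspec


jobspec = set_attr({}, "preemptible-after", 0)
```

Under these assumptions, the resulting dict has the value 0 stored at `attributes.system.preemptible-after`.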
Sounds good. I like having it as an attribute, and if we want it to be a flag, we could always offer a CLI flag that sets the attribute, or is there another meaning of flag I'm not processing?
The meaning of flag here is a job submission flag as defined in the submit or set-flags events. submit flags are set via the cli submission --flags option and include debug, waitable, and novalidate. I think we were originally thinking of adding preemptible as one of these flags.
There are also a couple of other flags, unfortunately not documented anywhere, that may be set by the job manager or jobtap plugins. These include the alloc-bypass and immutable flags. (Just adding those for completeness' sake; we should get all of these added to an RFC.)
That helps @grondo, thanks! From a logical perspective I can see it fitting in with those. That said, if we want to expose those to fluxion I would think attributes might be a good way to do it. Maybe worth thinking about as a general sub-object or something.
From @ryanday36's list in #5165:
In some offline discussion, it was proposed that we could add a preemptible (or similar) job submission flag for this purpose. Drawbacks to this approach:
flux jobs output
Most of those can be easily overcome if a submission flag is the correct approach.
Alternate solutions include: