flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

job-manager: epilog/prolog timeout #5398

Closed grondo closed 1 year ago

grondo commented 1 year ago

@ryanday36 had asked if there was an epilog timeout that could be enforced by Flux.

Currently, there is no way to enforce a timeout for either the prolog or epilog. There are a couple of ways support could be added.

  1. An option could be added to the flux perilog-run command that puts a time limit on execution and drains any ranks that time out. This option could then be passed in via the job-manager.prolog.command or job-manager.epilog.command configuration.
  2. A timelimit option could be added to the job-manager.prolog and job-manager.epilog configuration. If present, the perilog.so plugin would enforce the time limit. This would work for any prolog/epilog command, not just flux perilog-run, but if the command timed out, the plugin may not know which broker ranks were still in progress, so all ranks may have to be drained.
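
To make the tradeoff between the two options concrete, here is a rough Python sketch. It is not Flux code: run_with_timeout and ranks_to_drain are hypothetical helpers invented for illustration, and the only real API used is the Python standard library's subprocess.run with its timeout argument. The assumption is that option 1 can report which ranks were unfinished, while option 2 only knows that the command as a whole timed out.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd with a time limit. Returns (timed_out, returncode).

    Hypothetical helper: stands in for whatever would run the
    prolog/epilog command. subprocess.run kills the child on timeout.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return False, proc.returncode
    except subprocess.TimeoutExpired:
        return True, None


def ranks_to_drain(job_ranks, timed_out, unfinished_ranks=None):
    """Pick the ranks to drain after a prolog/epilog run.

    unfinished_ranks is only known when the runner tracks per-rank
    progress (option 1). When it is None (option 2), a timeout forces
    every rank of the job to be drained.
    """
    if not timed_out:
        return set()
    if unfinished_ranks is None:
        return set(job_ranks)
    return set(unfinished_ranks) & set(job_ranks)


if __name__ == "__main__":
    # Simulate an epilog that hangs for longer than the limit.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    timed_out, _ = run_with_timeout(hung, timeout_s=0.5)
    # Option 2: no per-rank knowledge, so all ranks are drained.
    print(sorted(ranks_to_drain([0, 1, 2, 3], timed_out)))
    # Option 1: the runner knows only rank 3 was still in progress.
    print(sorted(ranks_to_drain([0, 1, 2, 3], timed_out, [3])))
```

With per-rank information, only the slow ranks are taken out of service. Without it, the safe choice after a timeout is to drain the whole allocation.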

I also just thought of a 3rd option which is nice in its simplicity:

grondo commented 1 year ago

Closed by #5416