The priority plugin does not currently enforce a max nodes limit across an association's set of running jobs.
This PR is built on top of #442 and adds basic support for enforcing a max nodes limit on an association's set of running jobs. The implementation follows the same template as the max running jobs limit: when a job is in `job.state.depend`, the number of nodes the job requests is extracted from jobspec and added to the association's current node count. If a job does not specify any nodes, the plugin assumes a node count of 1 for the job. If the job would put the association over its limit, a dependency is added to the job and it is held until a currently running job finishes and cleans up.
To do this, I propose combining the two limits (max running jobs and max nodes) into one named dependency, so the same dependency is added whether an association hits its max running jobs limit or its max nodes limit.
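The check in `job.state.depend` can be sketched roughly as follows. This is an illustrative Python model, not the plugin's code: the `Association` fields, the `LIMIT_DEPENDENCY` name, the helper names, and the simplified jobspec layout (a top-level `resources` list whose entries carry `type` and `count`) are all assumptions made for the example. The sketch also only updates the counters when a job is admitted, which may differ from where the plugin does its bookkeeping.

```python
from dataclasses import dataclass, field

# Hypothetical name for the single dependency shared by both limits.
LIMIT_DEPENDENCY = "max-resources-limit"


@dataclass
class Association:
    """Hypothetical per-association bookkeeping."""
    max_run_jobs: int
    max_nodes: int
    cur_run_jobs: int = 0
    cur_nodes: int = 0
    held_jobs: list = field(default_factory=list)  # (job_id, nnodes) pairs


def nodes_requested(jobspec: dict) -> int:
    """Sum the top-level node counts in a (simplified) jobspec.

    A job that does not ask for any nodes is assumed to use 1 node.
    """
    count = sum(
        res.get("count", 0)
        for res in jobspec.get("resources", [])
        if res.get("type") == "node"
    )
    return count if count > 0 else 1


def on_depend(assoc: Association, job_id: int, jobspec: dict):
    """Return the dependency name if the job must be held, else None."""
    nnodes = nodes_requested(jobspec)
    over_jobs = assoc.cur_run_jobs + 1 > assoc.max_run_jobs
    over_nodes = assoc.cur_nodes + nnodes > assoc.max_nodes
    if over_jobs or over_nodes:
        # Either limit results in the same named dependency.
        assoc.held_jobs.append((job_id, nnodes))
        return LIMIT_DEPENDENCY
    # The job fits under both limits, so count it against the association.
    assoc.cur_run_jobs += 1
    assoc.cur_nodes += nnodes
    return None
```

With `max_run_jobs=2` and `max_nodes=4`, a 3-node job is admitted, a following 2-node job is held on the node limit, and a job that requests no nodes is counted as 1 node.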
In the callback for `job.state.inactive`, the logic for releasing a job held due to a flux-accounting limit is slightly reworked. If an association is under its max running jobs limit, the first held job is grabbed and its node count is inspected, similar to how it is checked in `job.state.depend`. If releasing this held job would keep the association at or under its max nodes limit, the dependency is removed and the job can move on to being run. If it would not, the dependency is not removed and no held jobs are released until another one of the association's currently running jobs finishes and cleans up.
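A rough model of this release logic, again as an illustrative Python sketch rather than the plugin's code. The `Association` fields and the `on_inactive` helper are hypothetical names, and the sketch assumes held jobs are kept in submission order together with their node counts.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Association:
    """Hypothetical per-association bookkeeping."""
    max_run_jobs: int
    max_nodes: int
    cur_run_jobs: int = 0
    cur_nodes: int = 0
    held_jobs: deque = field(default_factory=deque)  # (job_id, nnodes), oldest first


def on_inactive(assoc: Association, finished_nnodes: int):
    """Account for a finished job, then try to release the first held job.

    Returns the released job's id, or None if nothing was released.
    """
    # The finished job no longer counts against either limit.
    assoc.cur_run_jobs -= 1
    assoc.cur_nodes -= finished_nnodes

    if not assoc.held_jobs:
        return None
    if assoc.cur_run_jobs >= assoc.max_run_jobs:
        return None  # still at the max running jobs limit

    job_id, nnodes = assoc.held_jobs[0]
    if assoc.cur_nodes + nnodes > assoc.max_nodes:
        # The first held job does not fit: keep its dependency and do not
        # look further down the list of held jobs.
        return None

    # The first held job fits: remove its dependency and count it as running.
    assoc.held_jobs.popleft()
    assoc.cur_run_jobs += 1
    assoc.cur_nodes += nnodes
    return job_id
```

One consequence of only inspecting the first held job is that a smaller job behind it stays held even if it would fit on its own. It is only considered once the job ahead of it has been released.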
A couple of basic tests are added to `1034-mf-priority-max-nodes.t`. They submit jobs that use up all of the association's node limit and check that a further job is held due to the max nodes limit. Once the currently running job finishes, a test checks that the held job transitions to run.
TODO
[ ] Add tests that submit jobs that don't specify any nodes, to make sure the default node count of 1 works as expected