> Whenever I submit a job with `sbatch ...`, I typically obtain the job ID as output. I'd like to obtain that job ID using signac-flow.

This would enable complex submission workflows through something like the following snippet:

```python
from flow import FlowProject


class Project(FlowProject):
    pass


project = Project()
scheduler_job_ids = project.submit(...)
# Wait until the last of the previous jobs has completed
more_job_ids = project.submit(..., after=scheduler_job_ids[-1])
```
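For context on why the ids matter: SLURM expresses this kind of dependency via `sbatch --dependency=afterok:<id>`, which is presumably what an `after=` keyword would translate to under the hood. The helper below is a purely illustrative sketch, not part of signac-flow's API:

```python
# Hypothetical sketch: how an ``after=<scheduler job id>`` argument could be
# translated into a SLURM submission command. The helper name and signature
# are illustrative only.


def build_sbatch_cmd(script_path, after=None):
    """Build an sbatch command, optionally depending on a prior job."""
    cmd = ["sbatch"]
    if after is not None:
        # SLURM's way of saying "run only after job <id> finished successfully".
        cmd.append(f"--dependency=afterok:{after}")
    cmd.append(script_path)
    return cmd


print(build_sbatch_cmd("run.sh", after="4242"))
# ['sbatch', '--dependency=afterok:4242', 'run.sh']
```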
Feature description
Requested by user @salazardetroya (quoted above): https://signac.slack.com/archives/CVC04S9TN/p1623794700095400
Proposed solution
We used to (partially) support this kind of behavior for PBS/Torque clusters, but we never implemented it for SLURM. If we choose to support this feature, we would need to implement it for all schedulers so that we have a consistent API. See the past implementation (removed in 0.12) here: https://github.com/glotzerlab/signac-flow/blob/29afbe3748019abd6a220a0b177e0ee1e853e8e6/flow/scheduling/torque.py#L149-L155
One possible issue with this approach is that not all clusters necessarily behave the same way: some clusters print additional messages/info via stdout/stderr that would break the parsing.
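To make the concern concrete: on SLURM, `sbatch` normally prints a line like `Submitted batch job 12345`, but some clusters wrap it in extra banner output. Searching for the known pattern, rather than assuming the id is the entire output, is one way to make the parsing more robust. This is a hedged sketch, not existing signac-flow code:

```python
import re

# Assumes the standard SLURM sbatch confirmation line; other schedulers
# would need their own patterns.
_SBATCH_ID_RE = re.compile(r"Submitted batch job (\d+)")


def parse_sbatch_output(output):
    """Return the job id from sbatch output, or None if it cannot be found."""
    match = _SBATCH_ID_RE.search(output)
    return match.group(1) if match else None


# Still works with extra cluster banner text around the expected line:
print(parse_sbatch_output("MOTD: maintenance Friday\nSubmitted batch job 987\n"))
# 987
print(parse_sbatch_output("sbatch: error: invalid partition"))
# None
```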
The return value of the scheduler class (the part I linked above) would need to be forwarded through a series of calling functions to the return value of `FlowProject.submit`. I think it might be appropriate to return a list of job ids as strings, since `FlowProject.submit` can call `sbatch` (or a different scheduler command) multiple times.

To add this feature, here are the steps I would suggest:
1. Make `_call_submit` return the captured output. This applies to all schedulers. https://github.com/glotzerlab/signac-flow/blob/9d4f1b459a1ef484852e040691da78c3ba7dee32/flow/scheduling/base.py#L162
2. Modify the `ComputeEnvironment` class to pass through the captured scheduler job id if submission occurs (instead of `JobStatus.submitted`, which could be inferred by the calling functions) and `None` if submission didn't run or failed. https://github.com/glotzerlab/signac-flow/blob/9d4f1b459a1ef484852e040691da78c3ba7dee32/flow/environment.py#L215-L217
3. Modify `FlowProject._submit_operations` to pass through scheduler job ids, just like in the previous step. https://github.com/glotzerlab/signac-flow/blob/9d4f1b459a1ef484852e040691da78c3ba7dee32/flow/project.py#L3691-L3693
4. Modify `FlowProject.submit` to return job ids (and continue to update the job/operation status on success, as interpreted by the result of the above method calls). https://github.com/glotzerlab/signac-flow/blob/9d4f1b459a1ef484852e040691da78c3ba7dee32/flow/project.py#L3782-L3791
5. Decide whether the FlowProject CLI (`python project.py submit`) should print the ids returned by the `FlowProject.submit` method.
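The pass-through described in the steps above could be sketched roughly as follows. All names and signatures here are hypothetical stand-ins mirroring the step numbering, not signac-flow's actual internals, and the scheduler call is faked for illustration:

```python
import re


def _call_submit(script):
    # Step 1: return the captured scheduler output. A real implementation
    # would run the scheduler command via subprocess and capture stdout;
    # here the output is faked for illustration.
    return f"Submitted batch job {hash(script) % 1000}"


def submit_operation(script):
    # Steps 2-3: parse the captured output and pass the job id through.
    output = _call_submit(script)
    match = re.search(r"Submitted batch job (\d+)", output)
    # Job id as a string on success; None if submission didn't run or failed.
    return match.group(1) if match else None


def submit(scripts):
    # Step 4: the top-level submit returns a list of job id strings, one per
    # successful scheduler call.
    return [
        job_id
        for script in scripts
        if (job_id := submit_operation(script)) is not None
    ]
```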
Additional context
Another alternative would be to just return the raw captured stdout and leave it to the user to parse that information. In that case, `FlowProject.submit` would return a list of strings, each containing the raw output of one call to `sbatch` (instead of a list of strings of parsed job ids).
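Under that alternative, the parsing burden shifts to the user. Assuming a hypothetical raw-stdout return value and standard SLURM output lines, user code might look like this:

```python
import re

# Hypothetical: the list of raw stdout strings that FlowProject.submit would
# return under this alternative (one entry per sbatch call).
raw_outputs = [
    "Submitted batch job 11\n",
    "cluster notice\nSubmitted batch job 12\n",
]

# User-side parsing of SLURM's confirmation line into job id strings.
job_ids = [
    m.group(1)
    for out in raw_outputs
    if (m := re.search(r"Submitted batch job (\d+)", out))
]
print(job_ids)
# ['11', '12']
```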