SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
The existing implementation needs a flux "Jobspec", describing what is to be run and its resource needs, to submit the job to flux. It obtains this using the flux python interface JobspecV1.from_command(), which requires that the number of nodes, tasks, etc., be specified as arguments.
This in turn requires the flux launcher to parse the flux arguments out of the command line the user provided to recover the number of nodes, tasks, etc.
Instead of re-implementing 'flux mini run' argument parsing, run the user's command line via subprocess.Popen() with the additional flux option "--dry-run". Flux prints the Jobspec we need on stdout without submitting the job, eliminating the need for an argument parser.
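A minimal sketch of the dry-run approach; the helper name and the flux_cmd override are hypothetical (the override exists only so the pattern can be exercised without a flux installation), but 'flux mini run --dry-run' printing the Jobspec as JSON on stdout is the behavior relied on above:

```python
import json
import subprocess

def dry_run_jobspec(run_args, flux_cmd=("flux", "mini", "run")):
    """Ask flux for the Jobspec it would submit, without submitting.

    run_args: the user's 'flux mini run' arguments, passed through untouched
              so flux itself handles all option parsing.
    flux_cmd: launcher prefix; overridable so the pattern can be tested
              without flux installed (hypothetical convenience).
    """
    proc = subprocess.Popen(
        list(flux_cmd) + ["--dry-run"] + list(run_args),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("flux dry-run failed: " + err.decode())
    # flux emits the Jobspec as JSON; parse it for submission.
    return json.loads(out)
```

The user's arguments are forwarded verbatim, so any option 'flux mini run' accepts is honored with no parsing code on our side.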
Also add a TODO indicating down_nodes may not be excluded, which seems not to be supported.