FrankD412 opened this issue 6 years ago

@gonsie mentioned this issue previously, but I wanted to make note of it since this came up again when using the Flux-spectrum adapter. This issue addresses the ability to specify a different MPI launcher than a scheduler's default (say for SLURM, `srun`, etc.).

The FluxSpectrum adapter handles this by requiring the `mpi` key in its batch parameters. I propose that this key, along with something like `mpi_args`, become required batch keys. An adapter can then index into a standard set of MPI objects to get the appropriate launcher format (and validate args). The `mpi` key could even be optional, with each specific scheduling adapter specifying its preferred default.
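To make the proposal concrete, a batch block using these keys might look like the sketch below. The `mpi` and `mpi_args` keys are the ones proposed above; the surrounding keys and all of the values are just illustrative:

```yaml
batch:
    type:      slurm            # scheduler adapter
    host:      quartz           # illustrative host
    bank:      science          # illustrative bank
    queue:     pbatch           # illustrative queue
    mpi:       mpirun           # proposed: launcher to use instead of the scheduler default
    mpi_args:  "--bind-to core" # proposed: args validated against the launcher's MPI object
```

An adapter reading this block would look up `mpirun` in the standard set of MPI objects, validate `mpi_args` against it, and fall back to its preferred default (e.g., `srun` for SLURM) if `mpi` were made optional and omitted.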
I like the idea of a preferred default.
More sanity notes:

- A `MANIFEST.in` file would allow recipes to be specified in some other format than a class (#71). Documentation: http://python-packaging.readthedocs.io/en/latest/non-code-files.html
- A rough recipe sketch for `srun` (a hypothetical example of consuming such a recipe appears after these notes):

  ```yaml
  srun:
      header:
          nodes:      "#SBATCH -N {nodes}"
          queue:      "#SBATCH -p {queue}"
          bank:       "#SBATCH -A {bank}"
          walltime:   "#SBATCH -t {walltime}"
          job-name:   "#SBATCH -J {job-name}"
          comment:    '#SBATCH --comment "{comment}"'
      parameters:
          ntasks:         "-n"
          nodes:          "-N"
          reservation:    "--reservation"
          cores per task: "-c"
  ```
- The `SchedulerScriptAdapter` that the `StepRecord` uses shouldn't have to change, because it refers to the currently exposed interface simply to write scripts, submit, check statuses, and cancel. The recipes would be used internally to substitute for `_get_parallelize_command` and other related methods.
- A recipe would describe a given launcher (e.g., `srun`). It would then simply be added to the YAML file without needing to modify code.
- The `$(LAUNCHER)` moniker could be replaced with something like `$(srun)`, `$(mpirun)`, `$(mpiexec)`, etc.

Some thoughts on the current recipe format mentioned above:

- There should be a way to express that one parameter requires another -- say, if `ntasks` requires `nodes` to be specified (in this case it doesn't, but hypothetically).
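Following up on the recipe sketch above, here's a minimal, hypothetical example of consuming it -- `render_header` and the `recipes.yaml` path are made-up names, and the real interface would live behind the adapter methods mentioned above:

```python
# Hypothetical sketch: rendering a recipe's header section into #SBATCH lines.
import yaml

def render_header(recipe_path, launcher, settings):
    """Format each header entry whose key appears in the step settings."""
    with open(recipe_path) as recipe_file:
        recipe = yaml.safe_load(recipe_file)

    header = recipe[launcher]["header"]
    return "\n".join(
        template.format(**settings)
        for key, template in header.items()
        if key in settings
    )

# e.g. render_header("recipes.yaml", "srun",
#                    {"nodes": 2, "queue": "pbatch", "walltime": "01:00:00"})
# -> "#SBATCH -N 2\n#SBATCH -p pbatch\n#SBATCH -t 01:00:00"
```

The point being that supporting a new scheduler's header format becomes a data change rather than a code change.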
Alright, I've been steadily working on this in the `enhance/mpi_launchers` branch. And here's what I have going so far (if you'd like to take a look at what's been changing, feel free to look at the branch and comment there).
- The `$(LAUNCHER)[*]` notation is kept. It would be too much of a hassle to change because it piggy-backs on the same format as parameters and variables. There's now an addition of the "parallelizer" name: for example, a parallel call targeting `mpirun` with a single node and core would be `$(LAUNCHER)[1n,1p,mpirun]`.
- A `GeneralParallelizer` reads in the recipe YAML file and is meant to make it easy to add parallel commands whose command lines are straightforward to construct (`mpirun`, `srun`, etc., as opposed to `jsrun`, which has "resource sets"). Specialized `Parallelizer` classes can be implemented and returned via the `ParallelizerFactory`, which handles construction of commands needing functionality that a generalized recipe can't express.
- `SchedulerScriptAdapter` previously handled processing of the `$(LAUNCHER)` token and associated sub-allocations. This responsibility has been shifted to the `CommandParallelizer` object via its `parallelize` method -- the method handles parsing out all matches of the launching token and substitutes the constructed commands using the `ParallelizerFactory`.
- This separates allocation from parallelization (e.g., `sbatch` for SLURM allocation and `srun` for SLURM parallelization). A rough sketch of how these pieces could fit together appears at the end of this comment.

Other thoughts:

- The `SchedulerScriptAdapter` class can probably be retooled in much the same fashion as the `Parallelizer` class: a `GeneralScriptAdapter` could be implemented to use recipes, much like the generalized parallelization recipes mentioned above.

These are all the major thoughts I've had since starting to implement the enhancements related to this issue. I'll keep updating as I find other things.
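To tie these pieces together, here's a minimal, hypothetical sketch. The class names (`GeneralParallelizer`, `ParallelizerFactory`), the `parallelize` method, and the token notation come from the notes above, but every signature and implementation detail is invented for illustration -- the branch may do this very differently (in particular, `parallelize` lives on `CommandParallelizer` there; it sits on the general class here for brevity):

```python
# Hypothetical sketch only -- not the actual enhance/mpi_launchers code.
import re
import yaml

# Matches the extended token, e.g. $(LAUNCHER) or $(LAUNCHER)[1n,1p,mpirun].
LAUNCHER_REGEX = re.compile(r"\$\(LAUNCHER\)(?:\[(?P<spec>[^\]]+)\])?")


def parse_spec(spec):
    """Split a spec like '2n,8p,srun' into (nodes, procs, parallelizer name)."""
    nodes = procs = name = None
    for part in spec.split(","):
        part = part.strip()
        if part.endswith("n") and part[:-1].isdigit():
            nodes = int(part[:-1])
        elif part.endswith("p") and part[:-1].isdigit():
            procs = int(part[:-1])
        else:
            name = part  # e.g. 'mpirun' or 'srun'
    return nodes, procs, name


class GeneralParallelizer:
    """Builds parallel commands from a recipe file like the one sketched earlier."""

    def __init__(self, recipe_path):
        with open(recipe_path) as recipe_file:
            self._recipes = yaml.safe_load(recipe_file)

    def get_command(self, name, nodes, procs):
        """Assemble a launch prefix, e.g. 'srun -N 2 -n 8', from recipe flags."""
        flags = self._recipes[name]["parameters"]
        command = [name]
        if nodes is not None and "nodes" in flags:
            command += [flags["nodes"], str(nodes)]
        if procs is not None and "ntasks" in flags:
            command += [flags["ntasks"], str(procs)]
        return " ".join(command)

    def parallelize(self, step_cmd, default="srun"):
        """Replace every $(LAUNCHER) token in a step's command block."""
        def substitute(match):
            spec = match.group("spec")
            if not spec:
                return self.get_command(default, None, None)
            nodes, procs, name = parse_spec(spec)
            return self.get_command(name or default, nodes, procs)

        return LAUNCHER_REGEX.sub(substitute, step_cmd)


class ParallelizerFactory:
    """Returns a specialized Parallelizer when one exists, else the general one."""

    _specialized = {}  # a fuller version might map, e.g., 'jsrun' to a custom class

    @classmethod
    def get_parallelizer(cls, name, recipe_path):
        if name in cls._specialized:
            return cls._specialized[name]()
        return GeneralParallelizer(recipe_path)
```

With the `srun` recipe from the earlier comment saved as `recipes.yaml`, `GeneralParallelizer("recipes.yaml").parallelize("$(LAUNCHER)[2n,8p,srun] ./my_app")` would yield `srun -N 2 -n 8 ./my_app`.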
This issue is going to be somewhat complex to implement because it's going to need a hefty refactor. I'm moving it out of the v1.1.4 release.