FrankD412 opened this issue 6 years ago

@gonsie mentioned this issue previously, but I wanted to make note of it since this came up again when using the Flux-spectrum adapter. This issue addresses the ability to specify a different MPI launcher than a scheduler's default (say for SLURM, `srun`, etc.).

The FluxSpectrum adapter handles this by requiring the `mpi` key in its batch parameters. I propose that this key, along with something like `mpi_args`, become required batch keys. An adapter can then index into a standard set of MPI objects to get the appropriate launcher format (and validate args). The `mpi` key could even be optional, with each specific scheduling adapter specifying its preferred default.
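To make the proposal concrete, a batch block using these keys might look like the sketch below. The `mpi` and `mpi_args` keys are the ones proposed above; the surrounding keys and all of the values are just illustrative:

```yaml
batch:
    type:      slurm            # scheduler adapter
    host:      quartz           # illustrative host
    bank:      science          # illustrative bank
    queue:     pbatch           # illustrative queue
    mpi:       mpirun           # proposed: launcher to use instead of the scheduler default
    mpi_args:  "--bind-to core" # proposed: args validated against the launcher's MPI object
```

An adapter reading this block would look up `mpirun` in the standard set of MPI objects, validate `mpi_args` against it, and fall back to its preferred default (e.g., `srun` for SLURM) if `mpi` were made optional and omitted.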
I like the idea of a preferred default.
More sanity notes:

- A `MANIFEST.in` file would allow recipes to be specified in some other format than a class (#71). Documentation: http://python-packaging.readthedocs.io/en/latest/non-code-files.html
- A rough recipe sketch for `srun` (a hypothetical example of consuming such a recipe appears after these notes):

  ```yaml
  srun:
      header:
          nodes:      "#SBATCH -N {nodes}"
          queue:      "#SBATCH -p {queue}"
          bank:       "#SBATCH -A {bank}"
          walltime:   "#SBATCH -t {walltime}"
          job-name:   "#SBATCH -J {job-name}"
          comment:    '#SBATCH --comment "{comment}"'
      parameters:
          ntasks:         "-n"
          nodes:          "-N"
          reservation:    "--reservation"
          cores per task: "-c"
  ```
- The `SchedulerScriptAdapter` that the `StepRecord` uses shouldn't have to change, because it refers to the currently exposed interface simply to write scripts, submit, check statuses, and cancel. The recipes would be used internally to substitute for `_get_parallelize_command` and other related methods.
- A recipe would describe a given launcher (e.g., `srun`). It would then simply be added to the YAML file without needing to modify code.
- The `$(LAUNCHER)` moniker could be replaced with something like `$(srun)`, `$(mpirun)`, `$(mpiexec)`, etc.

Some thoughts on the current recipe format mentioned above:

- There should be a way to express that one parameter requires another -- say, if `ntasks` requires `nodes` to be specified (in this case it doesn't, but hypothetically).
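Following up on the recipe sketch above, here's a minimal, hypothetical example of consuming it -- `render_header` and the `recipes.yaml` path are made-up names, and the real interface would live behind the adapter methods mentioned above:

```python
# Hypothetical sketch: rendering a recipe's header section into #SBATCH lines.
import yaml

def render_header(recipe_path, launcher, settings):
    """Format each header entry whose key appears in the step settings."""
    with open(recipe_path) as recipe_file:
        recipe = yaml.safe_load(recipe_file)

    header = recipe[launcher]["header"]
    return "\n".join(
        template.format(**settings)
        for key, template in header.items()
        if key in settings
    )

# e.g. render_header("recipes.yaml", "srun",
#                    {"nodes": 2, "queue": "pbatch", "walltime": "01:00:00"})
# -> "#SBATCH -N 2\n#SBATCH -p pbatch\n#SBATCH -t 01:00:00"
```

The point being that supporting a new scheduler's header format becomes a data change rather than a code change.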
Alright, I've been steadily working on this in the `enhance/mpi_launchers` branch. And here's what I have going so far (if you'd like to take a look at what's been changing, feel free to look at the branch and comment there).
- The `$(LAUNCHER)[*]` notation is kept. It would be too much of a hassle to change because it piggy-backs on the same format as parameters and variables. There's now an addition of the "parallelizer" name: for example, a parallel call targeting `mpirun` with a single node and core would be `$(LAUNCHER)[1n,1p,mpirun]`.
- A `GeneralParallelizer` reads in the recipe YAML file and is meant to make it easy to add parallel commands whose command lines are straightforward to construct (`mpirun`, `srun`, etc., as opposed to `jsrun`, which has "resource sets"). Specialized `Parallelizer` classes can be implemented and returned via the `ParallelizerFactory`, which handles construction of commands needing functionality that a generalized recipe can't express.
- `SchedulerScriptAdapter` previously handled processing of the `$(LAUNCHER)` token and associated sub-allocations. This responsibility has been shifted to the `CommandParallelizer` object via its `parallelize` method -- the method handles parsing out all matches of the launching token and substitutes the constructed commands using the `ParallelizerFactory`.
- This separates allocation from parallelization (e.g., `sbatch` for SLURM allocation and `srun` for SLURM parallelization). A rough sketch of how these pieces could fit together appears at the end of this comment.

Other thoughts:

- The `SchedulerScriptAdapter` class can probably be retooled in much the same fashion as the `Parallelizer` class: a `GeneralScriptAdapter` could be implemented to use recipes, much like the generalized parallelization recipes mentioned above.

These are all the major thoughts I've had since starting to implement the enhancements related to this issue. I'll keep updating as I find other things.
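To tie these pieces together, here's a minimal, hypothetical sketch. The class names (`GeneralParallelizer`, `ParallelizerFactory`), the `parallelize` method, and the token notation come from the notes above, but every signature and implementation detail is invented for illustration -- the branch may do this very differently (in particular, `parallelize` lives on `CommandParallelizer` there; it sits on the general class here for brevity):

```python
# Hypothetical sketch only -- not the actual enhance/mpi_launchers code.
import re
import yaml

# Matches the extended token, e.g. $(LAUNCHER) or $(LAUNCHER)[1n,1p,mpirun].
LAUNCHER_REGEX = re.compile(r"\$\(LAUNCHER\)(?:\[(?P<spec>[^\]]+)\])?")


def parse_spec(spec):
    """Split a spec like '2n,8p,srun' into (nodes, procs, parallelizer name)."""
    nodes = procs = name = None
    for part in spec.split(","):
        part = part.strip()
        if part.endswith("n") and part[:-1].isdigit():
            nodes = int(part[:-1])
        elif part.endswith("p") and part[:-1].isdigit():
            procs = int(part[:-1])
        else:
            name = part  # e.g. 'mpirun' or 'srun'
    return nodes, procs, name


class GeneralParallelizer:
    """Builds parallel commands from a recipe file like the one sketched earlier."""

    def __init__(self, recipe_path):
        with open(recipe_path) as recipe_file:
            self._recipes = yaml.safe_load(recipe_file)

    def get_command(self, name, nodes, procs):
        """Assemble a launch prefix, e.g. 'srun -N 2 -n 8', from recipe flags."""
        flags = self._recipes[name]["parameters"]
        command = [name]
        if nodes is not None and "nodes" in flags:
            command += [flags["nodes"], str(nodes)]
        if procs is not None and "ntasks" in flags:
            command += [flags["ntasks"], str(procs)]
        return " ".join(command)

    def parallelize(self, step_cmd, default="srun"):
        """Replace every $(LAUNCHER) token in a step's command block."""
        def substitute(match):
            spec = match.group("spec")
            if not spec:
                return self.get_command(default, None, None)
            nodes, procs, name = parse_spec(spec)
            return self.get_command(name or default, nodes, procs)

        return LAUNCHER_REGEX.sub(substitute, step_cmd)


class ParallelizerFactory:
    """Returns a specialized Parallelizer when one exists, else the general one."""

    _specialized = {}  # a fuller version might map, e.g., 'jsrun' to a custom class

    @classmethod
    def get_parallelizer(cls, name, recipe_path):
        if name in cls._specialized:
            return cls._specialized[name]()
        return GeneralParallelizer(recipe_path)
```

With the `srun` recipe from the earlier comment saved as `recipes.yaml`, `GeneralParallelizer("recipes.yaml").parallelize("$(LAUNCHER)[2n,8p,srun] ./my_app")` would yield `srun -N 2 -n 8 ./my_app`.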
This issue is going to be somewhat complex to implement because it's going to need a hefty refactor. I'm moving it out of the v1.1.4 release.