victorsndvg opened 6 years ago
> @victorsndvg it is not clear to me how the MPI type could exist standalone. I mean: how would it be used with an srun/sbatch job, without a container?
Taking as a base point that the workload manager and the software are mandatory: what I mean is not the MPI library, but the MPI process manager. One can use `mpirun` or `mpiexec` (from Intel MPI, Open MPI, MPICH, etc.) or `srun` (from Slurm) as the process manager, inside an sbatch script or not.
The orchestrator can take advantage of containers to provide portable workflows, but it can also use locally installed software from the HPC modules system.
A sequential job does not need to spawn processes (it does not need `mpirun`). The following example illustrates it:

```shell
...
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --ntasks-per-node=1

singularity exec opm.simg flow input output
```
A parallel MPI job needs to be launched by a process manager (e.g. `mpirun`). The tool to run can be containerized or not. The following example illustrates it:

```shell
...
#SBATCH -N 2
#SBATCH -n 48
#SBATCH --ntasks-per-node=24

module load gcc openmpi paraview
mpirun paraview --serve inputdata
```
In addition, another important point for the future is support for other container technologies. The usage of `mpirun` with a Docker container is different:

- With Singularity: `mpirun singularity ...` (the process manager runs on the host and wraps the container)
- With Docker: `docker run mpirun ...` (the process manager runs inside the container)
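To make the contrast concrete, here is an illustrative sketch of the two launch patterns. The image names (`app.simg`, `app:latest`), binary name (`solver`) and task count are hypothetical; these are job-script fragments that assume a cluster with the corresponding runtimes installed.

```shell
# Singularity: the host's mpirun spawns one container instance per MPI rank,
# so the MPI library inside the image must be compatible with the host's MPI.
mpirun -n 24 singularity exec app.simg ./solver input

# Docker: mpirun is invoked inside the container, so the image itself must
# ship an MPI process manager.
docker run --rm app:latest mpirun -n 24 ./solver input
```

This difference is what makes a single generic "MPI job" type hard to reuse across container technologies.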
> how would MPI flags be attached to an sbatch script?
I am not sure I understand the question. I assume you mean the blueprint example in the first post.

This example:
```yaml
mpi:
  type: hpc.nodes.MPIJob
  properties:
    job_options:
      modules:
        - openmpi
      ...
      flags:
        - '--mca orte_tmpdir_base /tmp '
        - '--mca pmix_server_usock_connections 1'
  relationships:
    - type: contained_in  # job_managed_by_wm
      target: computational_resources
```
should be translated like this (e.g. inside an sbatch script):

```shell
module load openmpi
mpirun --mca orte_tmpdir_base /tmp --mca pmix_server_usock_connections 1 ...
```
This particular example avoids the dependence of the MPI containers on the `/scratch` directory at FT2.

Instead of a generic `flags` job option, this could be mapped to dedicated options like `tmpdir`, etc. But every vendor has its own flags, and it is hard work to identify and map all of them.
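As a sketch of what that mapping could look like (the `tmpdir` option name is hypothetical, not an existing type property), the blueprint above might become:

```yaml
mpi:
  type: hpc.nodes.MPIJob
  properties:
    job_options:
      modules:
        - openmpi
      # Hypothetical vendor-neutral option: the orchestrator would translate it
      # into '--mca orte_tmpdir_base /tmp' for Open MPI, or into whatever the
      # equivalent flag is for another MPI vendor.
      tmpdir: /tmp
```

The trade-off is exactly the one noted above: each vendor-neutral option needs a per-vendor translation table maintained by the orchestrator.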
JobTypes can currently describe how the applications are called depending on the underlying (software) layers used to run them. The layers we have identified (based on our experience) are:

1. Slurm
2. Open MPI
3. Singularity
4. `echo hola` (the particular software call)
Currently, workload manager types like `Slurm` or `Torque` describe how a batch script is submitted. In addition, a `SingularityJob` type describes how to call a Singularity container inside the workload manager. In particular, a `SingularityJob` type always uses `mpirun` as the process manager, but I think that is not always mandatory (e.g. a sequential Singularity job). Finally, a particular software call is also expressed through a `command` job option.

In my opinion, the top (workload manager) and bottom ("software call") layers of the hierarchy are mandatory. I think the other layers could be optional and interrelated, in order to express (in a more flexible way) the requirements and execution mode of every application. As a suggestion, a `contained_in` relationship could express this kind of hierarchical structure.

As a (simplified) example, the following options could be acceptable:
```
srun command
srun [mpirun] command
srun [singularity] command
srun [mpirun] [singularity] command
```
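Reading `srun` here as the Slurm layer (i.e. an sbatch allocation), a sketch of the four combinations inside a batch script might look as follows. The image and binary names are hypothetical, and these fragments assume the cluster provides the corresponding modules:

```shell
#!/bin/bash
#SBATCH -N 2
#SBATCH -n 48

# srun command: Slurm itself spawns the tasks.
srun ./solver input

# srun [mpirun] command: the MPI process manager spawns the tasks.
mpirun ./solver input

# srun [singularity] command: containerized, no process spawning needed.
srun singularity exec app.simg ./solver input

# srun [mpirun] [singularity] command: host mpirun wraps the container.
mpirun singularity exec app.simg ./solver input
```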
This flexibility could be expressed by a blueprint like the following example:
Even if we think about other container solutions (Docker), or single-node parallel jobs, we can interchange the MPI and container `contained_in` relationships, avoiding in some cases the MPI-vendor-and-version matching inside and outside the container:

```
srun [singularity | docker] [mpirun] command
```
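A sketch of that inverted ordering (hypothetical image and binary names): because the process manager ships inside the image, the host does not need a matching MPI installation, though all ranks then run inside a single container instance per node.

```shell
# mpirun is resolved inside the container, so host and container MPI
# vendors/versions do not have to match.
srun -N 1 singularity exec app.simg mpirun -n 24 ./solver input
```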
Something to think about.