Open jennydaman opened 2 years ago
number_of_workers can be a way to support embarrassingly parallel jobs on multi-node compute environments.
How can a process identify which replicate it is? Each process needs to know this so the workload can be divided, e.g. in plugin code:
if WORKER_NUMBER == 1:
    process('1.png')
elif WORKER_NUMBER == 2:
    process('2.png')
...
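Once each replicate knows its number, the if/elif chain could be replaced by a generic split. A minimal sketch, assuming a 1-indexed worker number and a known total; `shard` and the file names are hypothetical and not part of pman or any plugin:

```python
def shard(items, worker_number, number_of_workers):
    """Return the items assigned to a 1-indexed worker, round-robin."""
    if not 1 <= worker_number <= number_of_workers:
        raise ValueError("worker_number must be in 1..number_of_workers")
    # Worker 1 takes items 0, n, 2n, ...; worker 2 takes items 1, n+1, ...
    return items[worker_number - 1::number_of_workers]


# Hypothetical inputs: eight images divided among four workers.
files = [f"{i}.png" for i in range(1, 9)]
my_files = shard(files, worker_number=2, number_of_workers=4)
```

Every worker must see the same ordered list of inputs (e.g. a sorted directory listing) for the shards to be disjoint and complete.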
The equivalent concept in SLURM is a job array.
https://slurm.schedmd.com/job_array.html
e.g.
sbatch --array=1-4 job.sh
Four instances of job.sh will be executed, possibly on different compute nodes, and each instance will have the environment variable SLURM_ARRAY_TASK_ID set to 1, 2, 3, or 4. (SLURM_ARRAY_JOB_ID is also set, but it holds the job ID of the array as a whole and is the same for every instance.)
pman should do something similar.
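For comparison, this is roughly how a script running under a SLURM job array could discover its index. It is a sketch only: `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` are set by SLURM, while the function name and the single-worker fallback are assumptions. pman does not currently provide an equivalent variable.

```python
import os


def slurm_worker_info(environ=os.environ):
    """Return (task_id, task_count) from SLURM job array variables.

    Falls back to (1, 1), i.e. a single worker, when the variables are
    absent. The fallback is an assumption made for this sketch.
    """
    task_id = int(environ.get("SLURM_ARRAY_TASK_ID", "1"))
    task_count = int(environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    return task_id, task_count


worker_number, number_of_workers = slurm_worker_info()
```

If pman exposed a similar pair of variables to each replicate, plugin code could use the same pattern regardless of the compute backend.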