Description
When running the slurm_par example, the runs step fails for every sample because of a Slurm allocation issue. Each generated runs.slurm.err file contains the following error:
srun: error: Only allocated 1 nodes asked for 2
To Reproduce
Steps to reproduce the behavior:
Pull the slurm_par example with merlin example slurm_par
Change into the slurm/ directory
Queue the tasks with merlin run slurm_par.yaml
Run the workers with merlin run-workers slurm_par.yaml
When the run finishes, look in the output directory at runs/00/runs.slurm.err to see the error
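Since the error appears in the runs.slurm.err file of every sample, the check in the last step can be scripted instead of opening each file by hand. The following is a minimal sketch, not part of merlin: the function name count_alloc_errors is hypothetical, and it assumes the error files sit somewhere beneath the study's output directory, as in runs/00/runs.slurm.err.

```shell
# Hypothetical helper (not a merlin command): count the runs.slurm.err files
# under a study output directory that contain the srun allocation error.
count_alloc_errors() {
    study_dir="$1"
    # grep -l prints each matching file name once; wc -l counts those names.
    find "$study_dir" -type f -name 'runs.slurm.err' \
        -exec grep -l 'Only allocated 1 nodes asked for 2' {} + 2>/dev/null \
        | wc -l | tr -d ' '
}
```

Call it with the study's output directory as the only argument. A count equal to the number of samples would match the behavior described in this report.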
Expected behavior
Slurm should allocate two nodes for the runs step, so that the srun request for 2 nodes succeeds.
Please answer these questions to help us pinpoint the problem
Does the problem occur in merlin run --local mode, distributed mode, or neither? Distributed mode.
If a distributed problem, which backend and queue servers are you using? How are they configured? The broker is RabbitMQ and the results backend is Redis. Both are configured through LaunchIT.
On what machines/architectures are you running merlin? Is this bug on a specific machine or can you reproduce it elsewhere? Found on rztopaz and reproduced on ruby.
Additional context
Bug found by Casey Lamarche