Operating System: running in a Singularity container
Description
I am trying to run ultranest on an HPC cluster using SLURM.
I submit an sbatch script requesting the allocation, and the actual invocation is done by this line in the script:
srun singularity .... python run_ultranest.py ...
The job crashes with this log:
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[hc201:95150] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: hc201: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=664978.0
I am able to run the script on my own computer, both inside and outside the container, and with mpirun.
What I Did
I am not sure what else I can try.
Thanks in advance.
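One check I could run in place of run_ultranest.py, under the same srun and singularity invocation, is to print which launcher-related environment variables each task actually sees. The following is only a minimal sketch: the function name `detect_launch_environment` and the grouping in `LAUNCHER_VARS` are my own assumptions for illustration, and the exact variables set depend on how SLURM and the MPI library were built on the cluster. If only `SLURM_*` variables show up and no PMI or PMIx ones, that might mean srun is not exposing a process-management interface that the MPI library inside the container can use, but that is a guess on my part.

```python
import os

# Environment variables commonly set by each launcher or process-management
# interface. The grouping is an assumption for illustration; the exact set
# depends on the site configuration.
LAUNCHER_VARS = {
    "slurm": ("SLURM_PROCID", "SLURM_NTASKS", "SLURM_JOB_ID"),
    "pmi": ("PMI_RANK", "PMI_SIZE"),
    "pmix": ("PMIX_RANK", "PMIX_NAMESPACE"),
    "openmpi_mpirun": ("OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE"),
}


def detect_launch_environment(env=None):
    """Return {group: {variable: value}} for the variables present in env."""
    env = os.environ if env is None else env
    found = {}
    for group, names in LAUNCHER_VARS.items():
        present = {name: env[name] for name in names if name in env}
        if present:
            found[group] = present
    return found


if __name__ == "__main__":
    report = detect_launch_environment()
    if not report:
        print("no SLURM/PMI/PMIx/Open MPI variables visible")
    for group, values in report.items():
        print(group, values)
```

Running this once under srun on the cluster and once under mpirun on my computer would give two reports to compare.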