JohannesBuchner / UltraNest

Fit and compare complex models reliably and rapidly. Advanced nested sampling.
https://johannesbuchner.github.io/UltraNest/

Running using SLURM #65

Closed shiningsurya closed 2 years ago

shiningsurya commented 2 years ago

Description

I am trying to run UltraNest on an HPC cluster using SLURM. I submit an sbatch script that requests the allocation, and the actual invocation is a single line in that script: srun singularity .... python run_ultranest.py ...

The job crashes with this log:

*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[hc201:95150] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
srun: error: hc201: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=664978.0

On my own computer the script runs without problems, both inside and outside the container, when launched with mpirun.
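
Since the failure appears only under srun, one way to narrow it down might be to compare the environment the launcher hands to the Python process in both setups. The snippet below is a minimal diagnostic sketch, not part of UltraNest: the function name launcher_env and the prefix list are assumptions chosen for illustration, and the variables that are actually set depend on how SLURM and the MPI library were built on the cluster.

```python
import os

# Prefixes of environment variables that process launchers commonly export.
# Assumption: the exact names depend on the SLURM and MPI builds in use.
LAUNCHER_PREFIXES = ("SLURM_", "PMI_", "PMIX_", "OMPI_")


def launcher_env(environ=None):
    """Return the launcher-related variables from environ, sorted by name."""
    if environ is None:
        environ = os.environ
    return {
        name: environ[name]
        for name in sorted(environ)
        if name.startswith(LAUNCHER_PREFIXES)
    }


if __name__ == "__main__":
    found = launcher_env()
    if not found:
        print("no launcher variables found")
    for name, value in found.items():
        print(f"{name}={value}")
```

Running this in place of run_ultranest.py, once under srun on the cluster and once under mpirun on the working machine, and comparing the two outputs may show whether the process inside the container receives any process-management variables at all when started by srun.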

What I Did

I am not sure what else to try.

Thanks in advance.