Open thomas-robinson opened 2 months ago
Hi Thomas, the fact that the created profile's name doesn't specify an MPI vendor ("[+] Created profile gfdl2024.01") indicates that e4s-cl failed to find any of libmpi.so.12, libmpi_cray.so.12 or libmpi.so.40. e4s-cl tries to locate any of these three and names the newly created profile accordingly.
Could you check if the correct libmpi.so is in your LD_LIBRARY_PATH?
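One quick way to check is a short script along these lines. It is only a sketch: it looks for the three sonames mentioned above in the directories listed in LD_LIBRARY_PATH, and it does not reproduce e4s-cl's own search (the function name find_mpi_sonames is made up for this example, and the linker cache and RPATH entries are not consulted).

```python
import os
from pathlib import Path

# Sonames named in this thread as the ones e4s-cl looks for.
MPI_SONAMES = ("libmpi.so.12", "libmpi_cray.so.12", "libmpi.so.40")


def find_mpi_sonames(search_path=None):
    """Return {soname: [directories containing it]} for a colon-separated path."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = {}
    for entry in search_path.split(":"):
        if not entry:
            continue  # skip empty entries such as a trailing ':'
        directory = Path(entry)
        for soname in MPI_SONAMES:
            # exists() follows symlinks, so a dangling link is not reported
            if (directory / soname).exists():
                found.setdefault(soname, []).append(str(directory))
    return found


if __name__ == "__main__":
    hits = find_mpi_sonames()
    if not hits:
        print("No known MPI soname found in LD_LIBRARY_PATH")
    for soname, dirs in hits.items():
        print(f"{soname}: {', '.join(dirs)}")
```

If this prints nothing useful inside the environment where e4s-cl init is run, the library is probably being found some other way (or not at all), which would match the unnamed profile.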
The idea behind the init command is to understand what the MPI environment is and save the detected configuration to avoid computing it every time.
This is done using a Python script that accesses an MPI library from the environment, then loads and uses well-known symbols to run basic operations. This ensures the library works properly and loads all the libraries it needs to function (they are sometimes lazy-loaded).
You can see this in action here:
[Debug e4s_cl.util:211] Running with parent status: ['/apps/slurm/default/bin/srun', '/scratch2/GFDL/e4s/bin/conda/bin/python', '/scratch2/GFDL/e4s/bin/bin/e4s-cl', 'profile', 'detect', '/scratch2/GFDL/e4s/bin/conda/bin/e4s-cl-mpi-tester']
Failed to determine necessary libraries: program exited with code 156
You can see how this is done here. Intel MPI is treated as MPICH as they share ABI and sonames.
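To illustrate the load-and-inspect idea described above, here is a rough sketch. It is not the e4s-cl tester: the function names are invented for this example, it relies on Linux's /proc/self/maps, it assumes libmpi.so.12 can be resolved by the dynamic linker, and it only loads the library without calling any MPI function, so libraries that are lazy-loaded later would not show up.

```python
import ctypes


def shared_objects_in_maps(maps_text):
    """Extract the set of shared-object paths from /proc/<pid>/maps content."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # address perms offset dev inode [path]
        if len(fields) == 6 and ".so" in fields[5]:
            paths.add(fields[5].strip())
    return paths


def libraries_pulled_in_by(library):
    """Load `library` with ctypes and return the shared objects that appeared."""
    with open("/proc/self/maps") as maps:
        before = shared_objects_in_maps(maps.read())
    handle = ctypes.CDLL(library, mode=ctypes.RTLD_GLOBAL)
    with open("/proc/self/maps") as maps:
        after = shared_objects_in_maps(maps.read())
    return handle, sorted(after - before)


if __name__ == "__main__":
    # Assumption: libmpi.so.12 is the soname to probe in this environment.
    try:
        _, new_libs = libraries_pulled_in_by("libmpi.so.12")
    except OSError as err:
        print(f"Could not load MPI library: {err}")
    else:
        print("\n".join(new_libs))
```

Comparing the mapped files before and after the load gives the set of libraries the MPI library dragged in, which is the kind of information a profile needs to record.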
As Frederick suggested, something is preventing the proper analysis of your MPI environment. Please share the contents of the created profile and, if possible, compile a sample MPI program with this environment and share the output of ldd on it. What often happens is that either an RPATH or an arbitrary soname goes against standard MPI practices, and e4s-cl cannot adjust for that.
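If it helps to sift through the ldd output before posting it, something like the following could pull out the MPI-related entries. This is a sketch under assumptions: the helper names are hypothetical, it expects the usual "soname => path (address)" layout that glibc's ldd prints, and the sample input in the comment is illustrative rather than taken from any real system.

```python
import re

# Matches lines such as "libmpi.so.12 => /path/libmpi.so.12 (0x...)"
# and "libfoo.so.1 => not found". Lines without "=>" are ignored.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def parse_ldd(output):
    """Map each soname in `ldd` output to its resolved path (None if not found)."""
    resolved = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            soname, target = match.groups()
            resolved[soname] = None if target == "not found" else target
    return resolved


def mpi_entries(resolved):
    """Keep only entries whose soname mentions 'mpi'."""
    return {name: path for name, path in resolved.items() if "mpi" in name}


# Illustrative input only (hypothetical paths), for example:
#   libmpi.so.12 => /opt/example/lib/libmpi.so.12 (0x00007f0000000000)
#   libfoo.so.1 => not found
```

An MPI soname that is missing from the result, resolves to an unexpected directory, or maps to None would point toward the RPATH or soname problems mentioned above.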
If you can, try running that tester script (/scratch2/GFDL/e4s/bin/conda/bin/e4s-cl-mpi-tester) in your desired MPI environment and see if it gives you any information about what is failing.
While trying to run an e4s-cl init I received an error that said it was an e4s-cl bug, and to report the contents of a debug file on GitHub. Below are the pasted contents of the file:
Here are the modules I have loaded:
My container is using gcc 13 and mpich installed with spack.