cmhamel / Exodus.jl

A julia interface for accessing the ExodusII data format
MIT License

Exodus inside MPI fails #174

Open PeriHub opened 7 months ago

PeriHub commented 7 months ago

Hi there, if I run this MPI example and use the Exodus package, MPI crashes:

using MPI
using Exodus

MPI.Init()

comm = MPI.COMM_WORLD
println("Hello world, I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))")
MPI.Barrier(comm)

mpiexec -n 3 julia --project script.jl

Error:

--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------

I'm pretty sure the package used to work with MPI, but I also tried older releases of the package with no success. Maybe I'm missing something. How can I keep using Exodus?

cmhamel commented 7 months ago

Can you let me know the version numbers of MPI, Exodus, Julia, NetCDF_jll and HDF5_jll that are being used in this example?
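
If it helps, one way to collect those numbers is via Pkg (a minimal sketch; manifest mode is used so that JLL dependencies like NetCDF_jll and HDF5_jll show up too):

using Pkg

# Julia version
println("Julia ", VERSION)

# Exact versions of everything in the active environment, including JLLs
Pkg.status(; mode=Pkg.PKGMODE_MANIFEST)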

PeriHub commented 7 months ago

And I forgot to mention: if I put using Exodus below MPI.Init(), it doesn't crash. Thank you for the quick response!
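
For reference, the reordering that avoids the crash looks like this (the same example as above, just with using Exodus moved after MPI.Init()):

using MPI

MPI.Init()

# Loading Exodus only after MPI has been initialized avoids the opal_init failure
using Exodus

comm = MPI.COMM_WORLD
println("Hello world, I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))")
MPI.Barrier(comm)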

cmhamel commented 7 months ago

There could potentially be conflicting MPI versions then. Have you tried running the example with mpiexecjl rather than mpiexec? See the MPI.jl docs if you're not sure what I mean.
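
For reference, the wrapper can be installed from MPI.jl itself and then used in place of mpiexec, roughly like this (a sketch based on the MPI.jl docs; ~/.julia/bin is the default install location and may differ on your machine):

julia --project -e 'using MPI; MPI.install_mpiexecjl()'

~/.julia/bin/mpiexecjl --project=. -n 3 julia script.jl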

JTHesse commented 7 months ago

Yes, you could be right: mpiexecjl does indeed work, but unfortunately we can't use it on our HPC system. Is there maybe another solution?

cmhamel commented 7 months ago

I'm no MPI expert, but I think the solution is to rebuild Exodus_jll.jl locally against your system MPI. This will likely involve modifying the build_tarballs.jl file for Exodus in Yggdrasil: https://github.com/JuliaPackaging/Yggdrasil

I don't think building jll packages with MPI support via BinaryBuilder is well documented, but there are examples that can be followed, such as Trilinos or HDF5.
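
To give a rough idea of what modifying the recipe involves, here is a heavily simplified build_tarballs.jl skeleton (not the actual Yggdrasil recipe for Exodus; the version, git hash, cmake flags, and the choice of MPICH_jll as the MPI dependency are placeholders, and real MPI-aware recipes add extra platform handling on top of this):

using BinaryBuilder

name = "Exodus"
version = v"1.0.0"  # placeholder

# Placeholder source; the real recipe pins a specific SEACAS commit
sources = [
    GitSource("https://github.com/sandialabs/seacas.git",
              "0000000000000000000000000000000000000000"),
]

# Build script executed inside the BinaryBuilder sandbox (sketch only)
script = raw"""
cd ${WORKSPACE}/srcdir/seacas
cmake -B build -DCMAKE_INSTALL_PREFIX=${prefix} -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TARGET_TOOLCHAIN}
cmake --build build --parallel ${nproc}
cmake --install build
"""

platforms = supported_platforms()
products = [LibraryProduct("libexodus", :libexodus)]

# The MPI implementation to link against is declared here; building against a
# system MPI instead is the part that is not well documented
dependencies = [
    Dependency("HDF5_jll"),
    Dependency("NetCDF_jll"),
    Dependency("MPICH_jll"),
]

build_tarballs(ARGS, name, version, sources, script, platforms, products, dependencies)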

Alternatively, you can bypass Exodus_jll altogether, build Exodus locally from the SEACAS GitHub page, and then link things appropriately in a fork of Exodus.jl.
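
On the local-build route, one mechanism that can help is Pkg's artifact override file, which lets Exodus_jll pick up a directory you built yourself instead of the downloaded binary (a sketch only; the UUID below is a placeholder for Exodus_jll's real UUID, the artifact name and install prefix are assumptions, and the local build still has to stay ABI compatible with the MPI, HDF5, and NetCDF libraries in use):

# ~/.julia/artifacts/Overrides.toml
# The section header is the UUID of Exodus_jll (placeholder here); the key is
# the artifact name and the value is the local install prefix.
[00000000-0000-0000-0000-000000000000]
Exodus = "/opt/seacas/install"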

I'm not sure if there's a better solution.

JTHesse commented 7 months ago

Thank you, I will look into that.