Open giordano opened 1 month ago
Ah, the problem is that Julia doesn't start at all. I can see errors like:
ERROR: Unable to load dependent library /data/cceamgi/julia-depot/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12
Message:/data/cceamgi/julia-depot/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12: undefined symbol: unw_ensure_tls
ERROR: Unable to load dependent library /data/cceamgi/julia-depot/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12
ERROR: Unable to load dependent library /data/cceamgi/julia-depot/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12
Message:/data/cceamgi/julia-depot/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12: undefined symbol: unw_ensure_tls
ERROR: Unable to load dependent library /data/cceamgi/julia-depot/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12
Message:/data/cceamgi/julia-depot/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12: undefined symbol: unw_ensure_tls
Message:/data/cceamgi/julia-depot/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12: undefined symbol: unw_ensure_tls
On a different system I'm seeing the same outside of tests with Julia nightly:
$ ~/.julia/bin/mpiexecjl -np 1 --project julia +nightly -e ''
ERROR: Unable to load dependent library /home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12
Message:/home/mose/.julia/juliaup/julia-nightly/bin/../lib/julia/libjulia-internal.so.1.12: undefined symbol: unw_ensure_tls
┌ Error: The MPI process failed
│ proc = Process(setenv(`/home/mose/.julia/artifacts/62773cea33514bc12f48f228effadcb2ead6184a/bin/mpiexec -np 1 julia +nightly -e ''`,[...]), ProcessExited(1))
└ @ Main none:7
I suspect this is a real issue with Julia v1.12
Ah, I understand the issue now, and I understand why JULIA_BINDIR solved the issue in #858. TL;DR: the issue arises with mpiexecjl when using juliaup with a channel different from the default one.
In https://github.com/JuliaParallel/MPI.jl/blob/780aaa0fdb768713a329659338a9c9cde23c41a8/bin/mpiexecjl#L54-L58 we run julia assuming it's in PATH (unless JULIA_BINDIR is set), but if I try to run mpiexecjl ... julia +nightly we're entering the script https://github.com/JuliaParallel/MPI.jl/blob/780aaa0fdb768713a329659338a9c9cde23c41a8/bin/mpiexecjl#L61-L70 with the default juliaup channel, setting up LD_LIBRARY_PATH for that version of Julia, which breaks down when we then try to start the other julia process: if that's a different version of Julia we're mixing up libraries for different versions of Julia. This also explains why we don't have problems here in CI: we don't use juliaup (let alone mixing up different channels).
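For anyone hitting this in the meantime, here is a minimal sketch of the JULIA_BINDIR workaround. It assumes juliaup's `julia +channel` launcher syntax and Julia's `Sys.BINDIR`; the helper name `bindir_for_channel` is hypothetical and not part of mpiexecjl.

```sh
#!/bin/sh
# Hypothetical helper (not part of MPI.jl): print the bin directory of the
# Julia installation that juliaup uses for the given channel.
# Assumes `julia` in PATH is the juliaup launcher, which accepts `+channel`.
bindir_for_channel() {
    julia "+$1" --startup-file=no -e 'print(Sys.BINDIR)'
}

# Intended usage, left commented out so sourcing this file has no side effects:
#   JULIA_BINDIR="$(bindir_for_channel nightly)"
#   export JULIA_BINDIR
#   mpiexecjl -np 1 --project "$JULIA_BINDIR/julia" -e ''
```

With JULIA_BINDIR pointing at the nightly installation, the Julia used by the script to set up the environment and the Julia launched under mpiexec should be the same version, so the libraries no longer get mixed.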
I'm really not sure we have a good solution for this besides setting JULIA_BINDIR 🤔 Should we parse julia +channel specially in the script to deal with this? That'd complicate argument parsing quite a bit.
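To give a sense of what that parsing might involve, here is a rough POSIX sh sketch. The function `find_julia_channel` is hypothetical and not something mpiexecjl provides. It only recognizes a `+channel` argument that immediately follows a `julia` word, which is an assumption about how the command is written.

```sh
#!/bin/sh
# Hypothetical helper: scan the arguments forwarded to mpiexec and print the
# juliaup channel, if a "+channel" argument directly follows a "julia" word.
# Returns 1 when no channel is found. POSIX sh has no `local`, so `prev` and
# `arg` are plain (global) variables.
find_julia_channel() {
    prev=""
    for arg in "$@"; do
        case "$prev" in
            julia|*/julia)
                case "$arg" in
                    # "+" followed by at least one character: strip the "+".
                    +?*) printf '%s\n' "${arg#+}"; return 0 ;;
                esac
                ;;
        esac
        prev="$arg"
    done
    return 1
}
```

For example, `find_julia_channel -np 1 --project julia +nightly -e ''` prints `nightly`. The script could then use that channel to locate the matching installation before it sets up LD_LIBRARY_PATH, though a flag value that happens to be the word julia would confuse this simple scan.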
I have a system where the test introduced in #834 is failing:
I need to investigate what's wrong with this. For the record, this isn't specific to OpenMPI_jll: I see the same with MPICH_jll. I wonder if the problem is the shell, here /bin/sh is