JuliaParallel / MPI.jl

MPI wrappers for Julia
https://juliaparallel.org/MPI.jl/
The Unlicense

MPIPreferences fails with vendor="cray" #849

Open omlins opened 1 month ago

omlins commented 1 month ago

Error message:

julia> using MPIPreferences; MPIPreferences.use_system_binary(mpiexec="srun", vendor="cray")
Error:
Compiling x86_64 targets is not supported on aarch64 hosts.

ERROR: failed process: Process(`cc --cray-print-opts=all`, ProcessExited(255)) [255]

Stacktrace:
 [1] pipeline_error
   @ ./process.jl:565 [inlined]
 [2] read(cmd::Cmd)
   @ Base ./process.jl:449
 [3] read
   @ ./process.jl:458 [inlined]
 [4] readchomp
   @ ./io.jl:974 [inlined]
 [5] analyze_cray_cc()
   @ MPIPreferences.CrayParser /capstor/scratch/cscs/omlins/julia_local/julia_depot/packages/MPIPreferences/PLH7x/src/parse_cray_cc.jl:67
 [6] use_system_binary(; library_names::Vector{…}, extra_paths::Vector{…}, mpiexec::String, abi::Nothing, vendor::String, export_prefs::Bool, force::Bool)
   @ MPIPreferences /capstor/scratch/cscs/omlins/julia_local/julia_depot/packages/MPIPreferences/PLH7x/src/MPIPreferences.jl:180
 [7] top-level scope
   @ REPL[1]:1
Some type information was truncated. Use `show(err)` to see complete types.

Loaded modules:

[todi][omlins@nid007359 codes]$ module list

Currently Loaded Modules:
  1) craype-x86-rome                        8) cudatoolkit/23.9_12.2
  2) libfabric/1.15.2.0                     9) gcc-native/12.3
  3) craype-network-ofi                    10) craype/2.7.30
  4) xpmem/2.8.2-1.0_3.7__g84a27a5.shasta  11) cray-mpich/8.1.28
  5) perftools-base/23.12.0                12) cray-libsci/23.12.5
  6) cpe/23.12                             13) PrgEnv-gnu/8.5.0
  7) cray/23.12

JBlaschke commented 1 month ago

Progress report from Slack: adding -target-accel=nvidia90 -target-cpu=aarch64 makes the Cray compiler wrapper behave. Now I'm just thinking about how best to set this. It's a bit of a chicken-and-egg problem: you need to know the accelerator type before the compiler wrapper will tell you what the accelerator's GTL library is called.

urgh....

CRRRRRRAAAAAAAAAAYYYYYY!!!!

JBlaschke commented 1 month ago

OK, maybe not all Cray's fault. The problem is that the compiler wrappers look for the CRAY_ACCEL_TARGET and CRAY_CPU_TARGET env vars, which are normally set, just not on Alps at the moment, so I forgot about them. Most sites provide a module that sets them (craype-accel-nvidia and craype-accel-nvidia80 on Perlmutter) and often load it by default.

Leaving this issue open to remind me to write some env checks for those vars, and if they are not set, present the user with a sensible error message.
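
A minimal sketch of what such an env check could look like. This is not the MPIPreferences implementation: the names REQUIRED_CRAY_VARS, missing_cray_vars and check_cray_env are hypothetical, the error wording is illustrative, and treating both variables as mandatory is an assumption (a CPU-only system may legitimately leave CRAY_ACCEL_TARGET unset).

```julia
# Hypothetical helpers, not part of MPIPreferences.
# Assumption: both variables must be set before `cc --cray-print-opts=all` is called.
const REQUIRED_CRAY_VARS = ("CRAY_CPU_TARGET", "CRAY_ACCEL_TARGET")

# Return the names in `required` that are unset or empty in `env`.
# `env` defaults to the process environment; any string-keyed dictionary works for testing.
function missing_cray_vars(env=ENV; required=REQUIRED_CRAY_VARS)
    return [name for name in required if isempty(get(env, name, ""))]
end

# Throw a descriptive error if any required variable is missing; otherwise return nothing.
function check_cray_env(env=ENV)
    unset = missing_cray_vars(env)
    isempty(unset) && return nothing
    error("The Cray compiler wrapper needs the environment variable(s) " *
          join(unset, ", ") * ", which are not set. " *
          "Load the site's craype target modules (for example craype-accel-nvidia80 " *
          "on Perlmutter) or set the variables manually, then try again.")
end
```

Calling check_cray_env() at the start of the vendor="cray" code path, before the compiler wrapper is invoked, would replace the opaque "failed process" error above with a message that names the missing variables.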