PolyChord / PolyChordLite

Public version of PolyChord: See polychord.co.uk for PolyChordPro
https://polychord.io/

Impossible to run PolyChord on recent Cray machines #121

Open amandinelebrun opened 6 months ago

amandinelebrun commented 6 months ago

Good afternoon,

When I try to run PolyChord on Cray machines using the Cray compilers (through Cobaya and the Python wrapper), I get the following error:

MPICH ERROR [Rank 0] [job id 698426.0] [Mon Mar  4 12:09:02 2024] [c1385] - Abort(1092879) (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(170)...: 
MPID_Init(486)..........: 
MPIDU_Init_shm_init(184): unable to attach to shared memory

The supercomputer IT support has looked into the issue and narrowed down the origin to the PolyChord source code. The modules loaded are:

Currently Loaded Modules:
  1) craype-x86-genoa     5) libfabric/1.15.2.0    9) craype/2.7.30
  2) craype-network-ofi   6) cray-dsmml/0.2.2     10) perftools-base/23.12.0
  3) PrgEnv-gnu/8.5.0     7) cray-mpich/8.1.28    11) GCC-CPU-3.0.0
  4) gcc-native/12.1      8) cray-libsci/23.12.5  12) cray-python/3.11.5

Do you have any advice? I have already tried compiling with the Intel and GNU compilers, to no avail. The error messages differ in each case, but the end result is the same.

yallup commented 6 months ago

I get a similar (although not identical) message if I compile with MPI support on the Cambridge Intel compiler clusters and then run without mpirun wrapping the command. Perhaps try either wrapping the Python command in mpirun (mpirun -np 1 python xyz) or compiling without MPI (MPI=0 in the makefiles), and see whether either makes any progress?
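
For what it's worth, here is a minimal sketch of the first suggestion. It only checks whether the process appears to have been started by an MPI launcher and, if not, prints the wrapped command to try. The helper names (launched_under_mpi, mpirun_command) and the script name xyz.py are hypothetical, and the list of environment variables is a heuristic assumption: which variables are actually set depends on the launcher and the site, and some systems use a launcher other than mpirun.

```python
import os
import sys

# Variables that some MPI launchers export to each rank.
# ASSUMPTION: heuristic list only; it is not exhaustive and may not match
# what the launcher on a given machine sets.
LAUNCHER_VARS = (
    "OMPI_COMM_WORLD_SIZE",  # Open MPI's mpirun
    "PMI_RANK",              # MPICH-style process managers
    "PMI_SIZE",
)


def launched_under_mpi(env=None):
    """Return True if any known launcher variable is present in env."""
    env = os.environ if env is None else env
    return any(name in env for name in LAUNCHER_VARS)


def mpirun_command(script, nprocs=1, launcher="mpirun"):
    """Build the argument list for: mpirun -np <nprocs> python <script>."""
    return [launcher, "-np", str(nprocs), sys.executable, script]


if __name__ == "__main__":
    if launched_under_mpi():
        print("An MPI launcher appears to be present.")
    else:
        # "xyz.py" is a placeholder for the actual driver script.
        print("No MPI launcher detected; try:",
              " ".join(mpirun_command("xyz.py")))
```

If the error goes away when the script is started through the launcher, that points at how the job is launched rather than at the compiler; if it persists, building with MPI=0 is the other thing to test.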