Open gkaf89 opened 3 months ago
Maybe @riccardomurri can pitch in here, but since this seems to be specific to GC3Pie, I have little hope to see this fixed, especially since GC3Pie doesn't seem to be actively maintained anymore (we'll switch to Slurm as default job backend in the upcoming EasyBuild 5.0 because of that)
From the problem report, it seems that some environment variable PMIX_VERSION is propagated to the build environment, where it conflicts with the OpenMPI being built. GC3Pie does not define that variable on its own, nor does it propagate the source environment, so I would look into shell startup scripts (e.g. does /etc/bashrc load some module that loads openmpi?) or SLURM settings (does sbatch propagate some environment variables?). In other words, I don't think this is specific to GC3Pie -- I would bet you'll get the same result on your cluster with the native SLURM backend.
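One way to check this suggestion is to list the PMIx-related variables that are visible inside the job. The snippet below is only a minimal sketch: the helper name find_pmix_vars and the choice to match every name starting with PMIX_ are assumptions made for illustration, not part of EasyBuild or GC3Pie.

```python
import os


def find_pmix_vars(env=None):
    """Return all variables whose name starts with PMIX_ from a mapping.

    `env` defaults to the environment of the current process. The PMIX_
    prefix is an assumption; the report only names PMIX_VERSION.
    """
    if env is None:
        env = os.environ
    return {name: value for name, value in env.items() if name.startswith("PMIX_")}


if __name__ == "__main__":
    found = find_pmix_vars()
    if found:
        for name, value in sorted(found.items()):
            print(f"{name}={value}")
    else:
        print("no PMIX_* variables set")
```

Running the script once in a fresh login shell and once inside a submitted job should show whether the variable comes from the shell startup scripts or is only added on the job side.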
But I haven't been able to work on GC3Pie in the last 4 years so this is likely all I can contribute here :-/
I tried to replicate the issue with the Slurm back end, but OpenMPI compiled without problems. This is a bit unexpected; we are debugging further.
I believe the problem is similar to an open issue in the easyconfigs repository, where a spurious definition of the PMIX_VERSION environment variable causes the compilation of OpenMPI to fail.
I am trying to compile OpenMPI-4.1.6-GCC-13.2.0.eb with a GC3Pie job with EasyBuild 4.9.1. Everything works without issue apart from the compilation of MPI itself. The compilation fails with the message:
The compilation works without issues when I create an allocation with salloc and build MPI in a local process in the allocation.
Is this a known issue?
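Since the build succeeds in a salloc allocation and fails in the GC3Pie job, comparing the two environments may narrow things down. The following is a rough sketch that assumes the output of env has been saved from each context; the function names parse_env_dump and env_diff are hypothetical, and values that span several lines are not handled.

```python
def parse_env_dump(text):
    """Parse `env` output (one NAME=value per line) into a dict.

    Lines without '=' are skipped, so multi-line values are truncated
    to their first line.
    """
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            result[name] = value
    return result


def env_diff(working, failing):
    """Compare two environments.

    Returns (only_failing, changed): variables present only in `failing`,
    and variables present in both with different values, mapped to
    (working_value, failing_value).
    """
    only_failing = {k: v for k, v in failing.items() if k not in working}
    changed = {
        k: (working[k], failing[k])
        for k in working
        if k in failing and working[k] != failing[k]
    }
    return only_failing, changed
```

If PMIX_VERSION shows up in only_failing when the job environment is passed as failing, that would support the idea that the variable is injected on the job side rather than by the build itself.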
Configuration details
The configuration used for the build job is:
The contents of the configuration/GC3Pie/iris_gpu_gc3pie.cfg are:
The target system is the GPU partition of the Iris computer at the University of Luxembourg.