Closed jedwards4b closed 1 year ago
@jedwards4b - the include reference here definitely looks off:
/opt/cray/pe/mpich/8.1.21/ofi/crayclang/10.0/bin/mpif90
It's pointing to the CCE wrapper and not the NVIDIA one. Can you check your config? (Spack compiler or MPI config perhaps?)
In the parallelio/package.py I have:
if spec.satisfies("+mpi"):
    env["CC"] = spec["mpi"].mpicc
    env["FC"] = spec["mpi"].mpifc
Is it possible the spec is pointing to the wrong mpi? How do I check that?
Yes - it appears that cmake is finding the wrong mpi wrappers:
-- Check for working C compiler: /opt/cray/pe/mpich/8.1.21/ofi/crayclang/10.0/bin/mpicc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The Fortran compiler identification is NVHPC 23.1.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /opt/cray/pe/mpich/8.1.21/ofi/crayclang/10.0/bin/mpif90 - skipped
Spack is putting the crayclang path in both -DCMAKE_INSTALL_RPATH and -DCMAKE_PREFIX_PATH
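The mismatch in the CMake output above (NVHPC identified as the Fortran compiler, but wrappers under crayclang) can be checked mechanically. This is a minimal Python sketch. It assumes the directory right after `ofi` in a Cray MPICH wrapper path names the compiler toolchain, which is only inferred from the paths quoted in this thread. `wrapper_toolchain` and `matches_compiler` are hypothetical helpers, not part of Spack or CMake.

```python
from pathlib import PurePosixPath


def wrapper_toolchain(wrapper_path):
    """Return the path component that follows 'ofi', or None if absent.

    Assumption: Cray MPICH wrappers live under .../ofi/<toolchain>/<version>/bin,
    as the paths quoted in this thread suggest.
    """
    parts = PurePosixPath(wrapper_path).parts
    if "ofi" in parts:
        idx = parts.index("ofi")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None


def matches_compiler(wrapper_path, expected_toolchain):
    """True if the wrapper path sits under the expected toolchain directory."""
    return wrapper_toolchain(wrapper_path) == expected_toolchain


if __name__ == "__main__":
    # Wrapper path reported by CMake in the log above.
    found = "/opt/cray/pe/mpich/8.1.21/ofi/crayclang/10.0/bin/mpif90"
    print(wrapper_toolchain(found))
    print(matches_compiler(found, "nvidia"))
```

For the wrapper CMake picked up, this reports the crayclang directory, which does not match the `nvidia` directory an NVHPC build would be expected to use on this system.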
Thanks Jim - I think I see the issue... there's some environment stickiness going on with the Cray modules. See the following:
[17:10] ~$ module load nvhpc
Lmod is automatically replacing "cce/15.0.0" with "nvhpc/22.11".
Due to MODULEPATH changes, the following have been reloaded:
1) cray-libsci/22.11.1.2 2) cray-mpich/8.1.21 3) hdf5/1.12.2 4) ncarcompilers/0.7.2 5) netcdf/4.9.0
[17:11] ~$ echo $PATH | tr ':' '\n' | grep mpich
/opt/cray/pe/mpich/8.1.21/ofi/nvidia/20.7/bin
/opt/cray/pe/mpich/8.1.21/bin
/opt/cray/pe/mpich/8.1.21/ofi/cray/10.0/bin
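The same PATH inspection can be done from Python, which may be handier inside a build script. This is a rough sketch that mirrors the `tr`/`grep` pipeline above. `entries_matching` is a hypothetical helper name, and the sample string simply reuses the three directories from the output above.

```python
import os


def entries_matching(path_value, needle, sep=":"):
    """Return the entries of a colon-separated path string that contain
    `needle`, preserving search order (earlier entries win on lookup)."""
    return [entry for entry in path_value.split(sep) if needle in entry]


if __name__ == "__main__":
    # Inspect the live environment; prints nothing if no entry matches.
    for entry in entries_matching(os.environ.get("PATH", ""), "mpich"):
        print(entry)

    # The three directories reported in the thread, in the same order.
    sample = ":".join([
        "/opt/cray/pe/mpich/8.1.21/ofi/nvidia/20.7/bin",
        "/opt/cray/pe/mpich/8.1.21/bin",
        "/opt/cray/pe/mpich/8.1.21/ofi/cray/10.0/bin",
    ])
    print(len(entries_matching(sample, "mpich")))
```

More than one `ofi/<toolchain>` entry in the result is the leftover state described here: the nvidia directory comes first, but the cray one is still on the search path.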
I'll figure out a good way to get this working robustly and let you know. In the meantime, you should be able to avoid the problem by unloading unnecessary modules before running Spack builds. If you already do that, let me know and we can explore how it's influencing your build environment.
I tried doing module purge prior to the spack install - but the LD_LIBRARY_PATH is still set:
module purge
The following modules were not unloaded:
(Use "module --force purge" to unload all):
1) ncarenv/22.12
[paralleliobld] jedwards@gust02:/glade/u/apps/cseg/spack> module list
Currently Loaded Modules:
1) ncarenv/22.12 (S)
Where:
S: Module is Sticky, requires --force to unload or purge
[paralleliobld] jedwards@gust02:/glade/u/apps/cseg/spack> env | grep mpich
LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.1.21/ofi/cray/10.0/lib
I tried
unset LD_LIBRARY_PATH
unset CRAY_LD_LIBRARY_PATH
unset SPACK_LD_LIBRARY_PATH
But I still got the same error. The mpich directory is also in PATH, but even after removing it from PATH I get the same error.
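For reproducing this cleanup step consistently, here is a small Python sketch that strips every entry containing a given substring from path-like variables and returns a new environment mapping. `scrub_path_vars` is a hypothetical helper, and the default variable names are just the ones mentioned in this thread. As reported above, cleaning the calling environment did not change the outcome here, so the stale path may be coming from somewhere other than the shell environment.

```python
def scrub_path_vars(env, needle, names=("PATH", "LD_LIBRARY_PATH"), sep=":"):
    """Return a copy of `env` with entries containing `needle` removed from
    the listed path-like variables. Variables left empty are dropped.
    The input mapping is not modified."""
    cleaned = dict(env)
    for name in names:
        if name not in cleaned:
            continue
        kept = [e for e in cleaned[name].split(sep) if e and needle not in e]
        if kept:
            cleaned[name] = sep.join(kept)
        else:
            del cleaned[name]
    return cleaned


if __name__ == "__main__":
    import os

    cleaned = scrub_path_vars(os.environ, "/opt/cray/pe/mpich")
    # Show what remains of the variables that were scrubbed.
    for name in ("PATH", "LD_LIBRARY_PATH"):
        print(name, "=", cleaned.get(name, "<unset>"))
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` to launch a build with the scrubbed environment, without touching the current shell.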
@jedwards4b - LMK if you can still trigger this error when you get the chance.
I'm not seeing this in ncarenv/23.03