NCAR / spack-gust

Spack production user software stack on the Gust test system

issue building with nvhpc/23.1 #40

Closed jedwards4b closed 1 year ago

jedwards4b commented 1 year ago
     274    cd /glade/gust/scratch/jedwards/spack-stage/spack-stage-parallelio-2.5.10-cxe2v6aemkfhhg4brknnog7uagku7dbi/spack-build-cxe2v6a/src/flib &&
            /opt/cray/pe/mpich/8.1.21/ofi/crayclang/10.0/bin/mpif90 -DCPRNVHPC -DLINUX -DLOGGING -DNO_C_SIZEOF -Dpiof_EXPORTS
            -I/glade/gust/scratch/jedwards/spack-stage/spack-stage-parallelio-2.5.10-cxe2v6aemkfhhg4brknnog7uagku7dbi/spack-build-cxe2v6a
            -I/glade/gust/scratch/jedwards/spack-stage/spack-stage-parallelio-2.5.10-cxe2v6aemkfhhg4brknnog7uagku7dbi/spack-src/src/flib
            -I/glade/gust/scratch/jedwards/spack-stage/spack-stage-parallelio-2.5.10-cxe2v6aemkfhhg4brknnog7uagku7dbi/spack-build-cxe2v6a/src/flib
            -I/glade/u/apps/gust/22.12/spack/opt/spack/netcdf/4.9.0/packages/netcdf-fortran/4.6.0/cray-mpich/8.1.21/nvhpc/23.1/include
            -I/glade/gust/scratch/jedwards/spack-stage/spack-stage-parallelio-2.5.10-cxe2v6aemkfhhg4brknnog7uagku7dbi/spack-src/src/clib
            -I/glade/gust/scratch/jedwards/spack-stage/spack-stage-parallelio-2.5.10-cxe2v6aemkfhhg4brknnog7uagku7dbi/spack-src/src/clib/../ncint
            -I/glade/u/apps/gust/22.12/spack/opt/spack/netcdf/4.9.0/packages/netcdf-c/4.9.0/cray-mpich/8.1.21/nvhpc/23.1/include
            -I/glade/u/apps/gust/22.12/spack/opt/spack/parallel-netcdf/1.12.3/cray-mpich/8.1.21/nvhpc/23.1/include
            -O2 -gopt -fPIC -c
            /glade/gust/scratch/jedwards/spack-stage/spack-stage-parallelio-2.5.10-cxe2v6aemkfhhg4brknnog7uagku7dbi/spack-src/src/flib/pio_kinds.F90
            -o CMakeFiles/piof.dir/pio_kinds.F90.o
     275    NVFORTRAN-F-0004-Unable to open MODULE file mpi.mod
            (/glade/gust/scratch/jedwards/spack-stage/spack-stage-parallelio-2.5.10-cxe2v6aemkfhhg4brknnog7uagku7dbi/spack-src/src/flib/pio_kinds.F90: 12)
     276    NVFORTRAN/x86-64 Linux 23.1-0: compilation aborted
  >> 277    make[2]: *** [src/flib/CMakeFiles/piof.dir/build.make:120: src/flib/CMakeFiles/piof.dir/pio_kinds.F90.o] Error 2

but

  mpif90 --cray-print-opts=cflags
-I/opt/cray/pe/mpich/8.1.21/ofi/nvidia/20.7/include -I/opt/cray/pe/libsci/22.11.1.2/NVIDIA/20.7/x86_64/include -I/opt/cray/pe/pmi/6.1.7/include -I/opt/cray/pe/pals/1.2.4/include

and

 ls -l /opt/cray/pe/mpich/8.1.21/ofi/nvidia/20.7/include
total 423
-rw-r--r-- 1 root root    150 Oct 24 19:55 cray_version.h
-rw-r--r-- 1 root root 136214 Oct 24 19:55 mpi_base.mod
-rw-r--r-- 1 root root  87122 Oct 24 19:55 mpi_constants.mod
-rw-r--r-- 1 root root  20357 Oct 24 19:55 mpif.h
-rw-r--r-- 1 root root 142564 Oct 24 19:55 mpi.h
-rw-r--r-- 1 root root    352 Oct 24 19:55 mpi.mod
-rw-r--r-- 1 root root   1191 Oct 24 19:55 mpiof.h
-rw-r--r-- 1 root root  26388 Oct 24 19:55 mpio.h
-rw-r--r-- 1 root root  15600 Oct 24 19:55 mpi_sizeofs.mod
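
The two checks above (print the wrapper's compile flags, then look for mpi.mod in the include directories) can be combined into one small helper. This is only a sketch: the function name `dirs_with_mpi_mod` is made up for illustration, and it assumes include paths contain no spaces or glob characters.

```shell
# Print every -I directory from a flags string that actually contains mpi.mod.
# The flags string is split on whitespace on purpose, so $1 is left unquoted.
dirs_with_mpi_mod() {
    for flag in $1; do
        case "$flag" in
            -I*)
                dir=${flag#-I}    # drop the leading "-I"
                [ -f "$dir/mpi.mod" ] && printf '%s\n' "$dir"
                ;;
        esac
    done
    return 0
}
```

On Gust it could be called as `dirs_with_mpi_mod "$(mpif90 --cray-print-opts=cflags)"`. If the wrapper that the build invoked prints no directory here, the compiler has no way to find mpi.mod.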

vanderwb commented 1 year ago

@jedwards4b - the compiler wrapper path here definitely looks off:

/opt/cray/pe/mpich/8.1.21/ofi/crayclang/10.0/bin/mpif90

It's pointing to the CCE wrapper and not the NVIDIA one. Can you check your config? (Spack compiler or MPI config perhaps?)

jedwards4b commented 1 year ago

In the parallelio/package.py I have:

    if spec.satisfies("+mpi"):
        env["CC"] = spec["mpi"].mpicc
        env["FC"] = spec["mpi"].mpifc

Is it possible the spec is pointing to the wrong mpi? How do I check that?
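
One rough way to check is to look at the compiler-family directory embedded in the wrapper path that the build actually used. The helper below is a sketch, not part of Spack: the name `wrapper_family` is hypothetical, and it assumes the Cray MPICH layout seen in this thread, where the family directory follows `ofi/`.

```shell
# Print the compiler-family directory embedded in a Cray MPICH wrapper path,
# e.g. ".../mpich/8.1.21/ofi/crayclang/10.0/bin/mpif90" -> "crayclang".
# Returns non-zero when the path has no "/ofi/" component.
wrapper_family() {
    case "$1" in
        */ofi/*) ;;
        *) return 1 ;;
    esac
    rest=${1#*/ofi/}              # strip everything through the first "/ofi/"
    printf '%s\n' "${rest%%/*}"   # keep only the next path component
}
```

For an nvhpc build the expected answer would be `nvidia` rather than `crayclang`. To get the path to feed in, `spack find --paths cray-mpich` shows the MPI prefix Spack knows about, and `spack build-env <spec> -- which mpif90` should show what the build environment resolves; the exact spec to pass is left to the reader.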

jedwards4b commented 1 year ago

Yes - it appears that CMake is finding the wrong MPI wrappers:

-- Check for working C compiler: /opt/cray/pe/mpich/8.1.21/ofi/crayclang/10.0/bin/mpicc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The Fortran compiler identification is NVHPC 23.1.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /opt/cray/pe/mpich/8.1.21/ofi/crayclang/10.0/bin/mpif90 - skipped

jedwards4b commented 1 year ago

Spack is putting the crayclang path in both -DCMAKE_INSTALL_RPATH and -DCMAKE_PREFIX_PATH
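
To see exactly which entries end up in those CMake variables, the configure command line from the build log can be split apart. The sketch below uses a hypothetical helper name, `cmake_list_entries`, and assumes the arguments are separated by single spaces, that paths contain no spaces, and that list values are semicolon-separated as CMake expects.

```shell
# Print, one per line, the entries of a -D<VAR>=a;b;c (or -D<VAR>:TYPE=a;b;c)
# definition found in a CMake argument string.
cmake_list_entries() {
    var=$1
    printf '%s\n' "$2" |
        tr ' ' '\n' |                          # one argument per line
        grep -E -- "^-D${var}(:[A-Z]+)?=" |    # exact variable name only
        sed -e 's/^[^=]*=//' |                 # drop "-DVAR[:TYPE]="
        tr ';' '\n'                            # one list entry per line
}
```

Piping the result through `grep crayclang` then shows whether the CCE wrapper directory is being handed to CMake, for example `cmake_list_entries CMAKE_PREFIX_PATH "$args" | grep crayclang`, where `$args` holds the configure arguments copied from the log.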

vanderwb commented 1 year ago

Thanks Jim - I think I see the issue... there's some environment stickiness going on with the Cray modules. See the following:

[17:10] ~$ module load nvhpc

Lmod is automatically replacing "cce/15.0.0" with "nvhpc/22.11".

Due to MODULEPATH changes, the following have been reloaded:
  1) cray-libsci/22.11.1.2     2) cray-mpich/8.1.21     3) hdf5/1.12.2     4) ncarcompilers/0.7.2     5) netcdf/4.9.0

[17:11] ~$ echo $PATH | tr ':' '\n' | grep mpich
/opt/cray/pe/mpich/8.1.21/ofi/nvidia/20.7/bin
/opt/cray/pe/mpich/8.1.21/bin
/opt/cray/pe/mpich/8.1.21/ofi/cray/10.0/bin

I'll figure out a good way to get this working robustly and let you know. In the meantime, you should be able to avoid the problem by removing unnecessary modules before running Spack builds (if you already do, let me know and we can explore how it's influencing your build-env).
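
The leftover `ofi/cray` entry in the PATH listing above is the kind of thing that can be flagged automatically before a build. A minimal sketch, assuming the `mpich/<version>/ofi/<family>/` layout shown in this thread; the function name `stale_mpich_dirs` is made up for illustration.

```shell
# List Cray MPICH "ofi/<family>" entries in a PATH-style string whose family
# differs from the expected one. Prints nothing when the string is clean.
stale_mpich_dirs() {
    expected=$1
    path_string=$2
    printf '%s\n' "$path_string" |
        tr ':' '\n' |
        grep '/mpich/.*/ofi/' |
        grep -v "/ofi/$expected/"
}
```

After `module load nvhpc`, running `stale_mpich_dirs nvidia "$PATH"` would print any entry that still points at another compiler's wrappers.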

jedwards4b commented 1 year ago

I tried doing module purge prior to the spack install - but the LD_LIBRARY_PATH is still set:

module purge
The following modules were not unloaded:
  (Use "module --force purge" to unload all):

  1) ncarenv/22.12
[paralleliobld] jedwards@gust02:/glade/u/apps/cseg/spack> module list

Currently Loaded Modules:
  1) ncarenv/22.12 (S)

  Where:
   S:  Module is Sticky, requires --force to unload or purge

[paralleliobld] jedwards@gust02:/glade/u/apps/cseg/spack> env | grep mpich
LD_LIBRARY_PATH=/opt/cray/pe/mpich/8.1.21/ofi/cray/10.0/lib

I tried

unset LD_LIBRARY_PATH 
unset CRAY_LD_LIBRARY_PATH
unset SPACK_LD_LIBRARY_PATH

But I still got the same error. The Cray MPICH directory is also in PATH, but even after removing it from PATH I get the same error.
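
For anyone repeating this experiment, removing entries from a colon-separated variable can be done without editing it by hand. This is a sketch with a hypothetical helper name, `strip_path_entries`; the pattern is treated as a basic regular expression by `grep`. As noted above, cleaning PATH alone did not fix the build in this case.

```shell
# Print a PATH-style string with every entry matching the pattern removed.
strip_path_entries() {
    pattern=$1
    printf '%s\n' "$2" |
        tr ':' '\n' |
        grep -v -- "$pattern" |
        paste -sd ':' -         # join the remaining entries with ":"
}
```

Usage would look like `PATH=$(strip_path_entries '/opt/cray/pe/mpich/' "$PATH")`, and the same call works for LD_LIBRARY_PATH.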

vanderwb commented 1 year ago

@jedwards4b - LMK if you can still trigger this error when you get the chance.

jedwards4b commented 1 year ago

I'm not seeing this in ncarenv/23.03