E4S-Project / e4s

E4S for Spack
https://e4s.readthedocs.io
MIT License

[support]: SLEPC link error in standard documented example on Cori #42

Closed kngott closed 2 years ago

kngott commented 2 years ago

Name of Software

SLEPC

Contact Details

kngott@lbl.gov

HPC System

Cori

Request Description

When attempting to build a slepc example on Cori under e4s/21.05, it looks like PETSC(?) attempts to link to arpack in the wrong directory.

The arpack directory is <.....>\lib64, but the link line includes <.....>\lib instead through the ${SLEPC_EPS_LIB} variable (which looks like it goes heavily through PETSC, to me).

Is this reproducible by you? Is it a simple fix?

Thanks!

Relevant log output

cc -o ex7.o -c -fPIC    -I/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/slepc-3.15.0-z4uvitsfha4jmcwgjttvo63rca6zq5mq/include -I/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/slepc-3.15.0-z4uvitsfha4jmcwgjttvo63rca6zq5mq/include      -I/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/petsc-3.15.0-7szducdooc2vunba2wulyyeytatnwqpa/include -I/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/hypre-2.20.0-j7o7wnpnszxynxw2qrxgm3kms4x5tlgi/include -I/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/superlu-dist-6.4.0-2rrh7bg4m7nbakrokjfhmuv3m7cepzpm/include -I/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/hdf5-1.10.7-v5mkktznov4jajcwmvr2ksmslm3wlmkz/include -I/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/parmetis-4.0.3-a5rqvobjpjxzzfewtotdp2evhpjpfmdj/include -I/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/metis-5.1.0-xt5uqhzgkutkb7fha5dlwocdf5yjxr2n/include -I/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/zlib-1.2.11-pjokgrf356medkbvu7wza3vjibv3loux/include    `pwd`/ex7.c
cc -fPIC  -o ex7 ex7.o  -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/slepc-3.15.0-z4uvitsfha4jmcwgjttvo63rca6zq5mq/lib -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/slepc-3.15.0-z4uvitsfha4jmcwgjttvo63rca6zq5mq/lib -lslepc -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/arpack-ng-3.8.0-c77ia4duyxoyy5hbsuqdwqico5xybxav/lib -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/arpack-ng-3.8.0-c77ia4duyxoyy5hbsuqdwqico5xybxav/lib -lparpack -larpack         -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/petsc-3.15.0-7szducdooc2vunba2wulyyeytatnwqpa/lib -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/petsc-3.15.0-7szducdooc2vunba2wulyyeytatnwqpa/lib -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/hypre-2.20.0-j7o7wnpnszxynxw2qrxgm3kms4x5tlgi/lib -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/hypre-2.20.0-j7o7wnpnszxynxw2qrxgm3kms4x5tlgi/lib -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/superlu-dist-6.4.0-2rrh7bg4m7nbakrokjfhmuv3m7cepzpm/lib -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/superlu-dist-6.4.0-2rrh7bg4m7nbakrokjfhmuv3m7cepzpm/lib -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/openblas-0.3.10-kg2tqkgdloomhpk73owhbxjntgpnc7go/lib -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/openblas-0.3.10-kg2tqkgdloomhpk73owhbxjntgpnc7go/lib -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/hdf5-1.10.7-v5mkktznov4jajcwmvr2ksmslm3wlmkz/lib 
-L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/hdf5-1.10.7-v5mkktznov4jajcwmvr2ksmslm3wlmkz/lib -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/parmetis-4.0.3-a5rqvobjpjxzzfewtotdp2evhpjpfmdj/lib -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/parmetis-4.0.3-a5rqvobjpjxzzfewtotdp2evhpjpfmdj/lib -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/metis-5.1.0-xt5uqhzgkutkb7fha5dlwocdf5yjxr2n/lib -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/metis-5.1.0-xt5uqhzgkutkb7fha5dlwocdf5yjxr2n/lib -Wl,-rpath,/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/zlib-1.2.11-pjokgrf356medkbvu7wza3vjibv3loux/lib -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/zlib-1.2.11-pjokgrf356medkbvu7wza3vjibv3loux/lib -Wl,-rpath,/opt/cray/dmapp/default/lib64 -L/opt/cray/dmapp/default/lib64 -Wl,-rpath,/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/lib -L/opt/cray/pe/mpt/7.7.10/gni/mpich-intel/16.0/lib -Wl,-rpath,/opt/cray/rca/2.2.20-7.0.1.1_4.65__g8e3fb5b.ari/lib64 -L/opt/cray/rca/2.2.20-7.0.1.1_4.65__g8e3fb5b.ari/lib64 -Wl,-rpath,/opt/cray/pe/atp/2.1.3/libApp -L/opt/cray/pe/atp/2.1.3/libApp -Wl,-rpath,/usr/common/software/intel/parallel_studio_xe_2020_update4_cluster_edition/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -L/usr/common/software/intel/parallel_studio_xe_2020_update4_cluster_edition/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/common/software/intel/parallel_studio_xe_2020_update4_cluster_edition/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin -L/usr/common/software/intel/parallel_studio_xe_2020_update4_cluster_edition/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin 
-Wl,-rpath,/global/common/cori_cle7/software/intel/parallel_studio_xe_2020_update4_cluster_edition/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -L/global/common/cori_cle7/software/intel/parallel_studio_xe_2020_update4_cluster_edition/compilers_and_libraries_2020.4.304/linux/compiler/lib/intel64_lin -Wl,-rpath,/usr/lib64/gcc/x86_64-suse-linux/7 -L/usr/lib64/gcc/x86_64-suse-linux/7 -Wl,-rpath,/usr/x86_64-suse-linux/lib -L/usr/x86_64-suse-linux/lib -lpetsc -lHYPRE -lsuperlu_dist -lopenblas -lhdf5hl_fortran -lhdf5_hl -lhdf5_fortran -lhdf5 -lparmetis -lmetis -lz -lintlc -lstdc++ -ldl -lrca -lAtpSigHandler -lAtpSigHCommData -lhugetlbfs -lmpich_intel -lmpichf90_intel -limf -lm -lpthread -lifport -lifcoremt_pic -lsvml -lipgo -lirc -lgcc_s -lirc_s -lstdc++ -ldl
/usr/bin/ld: cannot find -lparpack
/usr/bin/ld: cannot find -larpack
make: [Makefile:24: ex7] Error 1 (ignored)
/usr/bin/rm -f ex7.o

Reproduce Bug

This is reproduced with this example: https://slepc.upv.es/handson/handson3.html

1. Download/write the ex7.c and makefile, from the docs.
2. module load e4s petsc slepc
3. make -f makefile ex7 (or however you prefer).
...

Can be fixed by adding (note the lib64):
LINKFLAGS  = -L/global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/arpack-ng-3.8.0-c77ia4duyxoyy5hbsuqdwqico5xybxav/lib64

and editing the build rule to:
-${CLINKER} -o ex7 ex7.o ${LINKFLAGS} ${SLEPC_EPS_LIB}

wspear commented 2 years ago

I may be missing a step here, but module load e4s petsc slepc does not populate SLEPC_EPS_LIB for me on cori, so this build fails just because slepceps.h cannot be found. The slepc and petsc module activity is pasted below. Is there something else I need to do to match the environment where you encountered this issue?

(base) wspear@cori12:~/SPACK-SPACE/slepc> module show petsc
-------------------------------------------------------------------
/global/common/software/spackecp/e4s-21.05/modules/cray-cnl7-haswell//petsc/3.15.0-intel-19.1.3.304:

module-whatis    PETSc is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations.
conflict         petsc
prepend-path     LD_LIBRARY_PATH /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/petsc-3.15.0-7szducdooc2vunba2wulyyeytatnwqpa/lib
prepend-path     PKG_CONFIG_PATH /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/petsc-3.15.0-7szducdooc2vunba2wulyyeytatnwqpa/lib/pkgconfig
prepend-path     CMAKE_PREFIX_PATH /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/petsc-3.15.0-7szducdooc2vunba2wulyyeytatnwqpa/
setenv           PETSC_DIR /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/petsc-3.15.0-7szducdooc2vunba2wulyyeytatnwqpa
unsetenv         PETSC_ARCH
setenv           PETSC_ROOT /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/petsc-3.15.0-7szducdooc2vunba2wulyyeytatnwqpa
-------------------------------------------------------------------

(base) wspear@cori12:~/SPACK-SPACE/slepc> module show slepc
-------------------------------------------------------------------
/global/common/software/spackecp/e4s-21.05/modules/cray-cnl7-haswell//slepc/3.15.0-intel-19.1.3.304:

module-whatis    Scalable Library for Eigenvalue Problem Computations.
conflict         slepc
prepend-path     LD_LIBRARY_PATH /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/slepc-3.15.0-z4uvitsfha4jmcwgjttvo63rca6zq5mq/lib
prepend-path     PKG_CONFIG_PATH /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/slepc-3.15.0-z4uvitsfha4jmcwgjttvo63rca6zq5mq/lib/pkgconfig
prepend-path     CMAKE_PREFIX_PATH /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/slepc-3.15.0-z4uvitsfha4jmcwgjttvo63rca6zq5mq/
setenv           SLEPC_DIR /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/slepc-3.15.0-z4uvitsfha4jmcwgjttvo63rca6zq5mq
setenv           SLEPC_ROOT /global/common/software/spackecp/e4s-21.05/software/cray-cnl7-haswell/intel-19.1.3.304/slepc-3.15.0-z4uvitsfha4jmcwgjttvo63rca6zq5mq
-------------------------------------------------------------------

kngott commented 2 years ago

Ah: the full Makefile isn't listed in that example, it only says "Add these lines to the Makefile". They must assume you're doing the exercises in order. The first one has the required extra line: https://slepc.upv.es/handson/handson0.html

The SLEPC_EPS_LIB variable is a Makefile variable populated through that include.

Does that work for you?

wspear commented 2 years ago

Yep, now I see it. The issue is in /lib/slepc/conf/slepcvariables where ARPACK_LIB is set to the wrong directory. I also see this in the slepc 3.16 included with e4s 21.11. I suspect it will also occur in the slepc 3.16.1 slated for inclusion with e4s 22.02. I'm not 100% sure of that last one since for some reason arpack libs are installed in lib instead of lib64 on the system where I'm testing 3.16.1, so slepc happens to have the correct directory listed in that case.

I'm tagging the slepc package maintainers @joseeroman @balay in hopes we can get a patch or a new version in the package that fixes this.
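The check described above (does every `-L` directory named by `ARPACK_LIB` in `slepcvariables` actually exist?) can be sketched in a few lines of Python. This is only an illustration: the helper names `link_dirs` and `missing_link_dirs` are made up here, and the sketch assumes the file uses plain `NAME = value` lines like the `ARPACK_LIB = ...` line quoted later in this thread.

```python
import os
import shlex


def link_dirs(variables_text, name="ARPACK_LIB"):
    """Return the -L directories from the `NAME = value` line, if present."""
    for line in variables_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            # Keep only tokens of the form -L<dir>; -Wl,-rpath and -l are skipped.
            return [tok[2:] for tok in shlex.split(value) if tok.startswith("-L")]
    return []


def missing_link_dirs(variables_text, name="ARPACK_LIB", isdir=os.path.isdir):
    """Return the -L directories that fail the isdir check (default: real filesystem)."""
    return [d for d in link_dirs(variables_text, name) if not isdir(d)]
```

Calling `missing_link_dirs(open(path).read())` on an installed `slepcvariables` would list directories such as a nonexistent `.../arpack-ng-.../lib`; an empty list means every `-L` directory is present.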

kngott commented 2 years ago

Awesome! Thank you so much!

joseeroman commented 2 years ago

Can you paste the SLEPc logs? I think they are in spack-build-out.txt buried in the /opt/spack directory, assuming it has been built with spack

wspear commented 2 years ago

@joseeroman The stage directory is normally deleted after a successful install (and I'm not sure I would have access to the NERSC systems stage directories anyway) but I was able to do my own build on perlmutter (which exhibits the same issue). Output logs are available here: http://yu.nic.uoregon.edu/~wspear/slepc-build-perlmutter/

joseeroman commented 2 years ago

Thanks. You need to fix the permissions, otherwise I cannot see the files

wspear commented 2 years ago

They should be world-readable now.

joseeroman commented 2 years ago

What are the contents of /global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/slepc-3.16.0-qerj56bfhsgicdm5ttcf2avtxl5mnvqa/lib/slepc/conf/slepcvariables ?

wspear commented 2 years ago

I've added that file here: http://yu.nic.uoregon.edu/~wspear/slepc-build-perlmutter/slepcvariables

joseeroman commented 2 years ago

I don't see anything wrong in the files. Maybe what is wrong is the makefile you are using to build the example. Can you show the contents of the makefile and the actual error that you get?

wspear commented 2 years ago

If you look at slepcvariables you will see that the directory pointed to by ARPACK_LIB is: /global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib

The issue is that this location does not exist. It should be pointing to lib64, which is what you see for the arpack_ng library directory in SPACK_LINK_DIRS and SPACK_RPATH_DIRS in spack-build-env.txt. Presumably, whatever logic is building ARPACK_LIB is just adding 'lib' to the arpack install prefix and not checking if the directory is actually named lib or lib64.
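For illustration, the kind of check that would avoid this (look at the directory on disk rather than appending `lib` to the prefix) might look like the sketch below. It is not SLEPc's or Spack's code; `find_libdir` is a hypothetical helper, and the temporary prefix only mimics the arpack-ng layout described above.

```python
import tempfile
from pathlib import Path


def find_libdir(prefix, libname, subdirs=("lib", "lib64")):
    """Return the first prefix/<subdir> that really contains lib<libname>.*"""
    for sub in subdirs:
        candidate = Path(prefix) / sub
        # Require both that the directory exists and that the library is in it.
        if candidate.is_dir() and any(candidate.glob("lib%s.*" % libname)):
            return candidate
    return None


# Demo on a throwaway prefix where only lib64 exists, as in this report.
with tempfile.TemporaryDirectory() as prefix:
    (Path(prefix) / "lib64").mkdir()
    (Path(prefix) / "lib64" / "libarpack.so").touch()
    print(find_libdir(prefix, "arpack").name)  # prints "lib64"
```

Checking for the library file itself, and not just the directory, also covers installs where an empty `lib` sits next to a populated `lib64`.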

joseeroman commented 2 years ago

I don't understand. SLEPc's configure first tries a directory ending with lib, then if it fails it tries with lib64. According to configure.log the first try was successful, so it did not continue to try lib64.

wspear commented 2 years ago

There must be something amiss with that logic...

wspear@perlmutter:login34:~> ls /global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib
ls: cannot access '/global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib': No such file or directory
wspear@perlmutter:login34:~> ls /global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib64
cmake  libarpack.so  libarpack.so.2  libarpack.so.2.1.0  libparpack.so  libparpack.so.2  libparpack.so.2.1.0  pkgconfig

balay commented 2 years ago

I suspect the issue is:

So even though stuff is in pkgdir/lib64 - configure first checks pkgdir/lib and that succeeds [due to spack compiler magic]

So the fix could be:

I think I had such fixes in the petsc spack recipe [but that's a bit convoluted]. It relies on querying the dependent package to get this info [but not all pkgs appear to provide this info in spack - so had to use --with-pkg-dir for such pkgs]
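A toy model of the suspicion above may help show why a "try lib, then lib64" probe can pick the wrong directory. It assumes that the compiler wrapper adds the dependency's real library directory to every link line (the `SPACK_LINK_DIRS` value mentioned earlier suggests this); the function names, the `/arpack` prefix, and the `contents` dictionary are all hypothetical, and none of this is SLEPc's configure or Spack's wrapper code.

```python
def link_test_passes(explicit_dirs, injected_dirs, libname, contents):
    """Toy linker: -l<libname> resolves if any searched directory holds it."""
    wanted = "lib%s.so" % libname
    searched = list(explicit_dirs) + list(injected_dirs)
    return any(wanted in contents.get(d, ()) for d in searched)


def first_passing_dir(prefix, injected_dirs, libname, contents):
    """Probe prefix/lib, then prefix/lib64, and keep the first one that links."""
    for sub in ("lib", "lib64"):
        candidate = "%s/%s" % (prefix, sub)
        if link_test_passes([candidate], injected_dirs, libname, contents):
            return candidate
    return None


# Only lib64 holds the library in this made-up layout.
contents = {"/arpack/lib64": {"libarpack.so"}}
print(first_passing_dir("/arpack", [], "arpack", contents))                 # /arpack/lib64
print(first_passing_dir("/arpack", ["/arpack/lib64"], "arpack", contents))  # /arpack/lib
```

With no injected directories the probe lands on `lib64`; once the wrapper injects the real directory, the very first probe links successfully and the nonexistent `lib` path is the one that gets recorded.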

balay commented 2 years ago

Untested - likely fix

diff --git a/var/spack/repos/builtin/packages/slepc/package.py b/var/spack/repos/builtin/packages/slepc/package.py
index 0ac383bd38..750a360ca4 100644
--- a/var/spack/repos/builtin/packages/slepc/package.py
+++ b/var/spack/repos/builtin/packages/slepc/package.py
@@ -114,7 +114,8 @@ def install(self, spec, prefix):
         options = []
         if '+arpack' in spec:
             options.extend([
-                '--with-arpack-dir=%s' % spec['arpack-ng'].prefix,
+                '--with-arpack-include=%s' % spec['arpack-ng'].prefix.include,
+                '--with-arpack-lib=%s' % spec['arpack-ng'].libs.joined(),
             ])
             if spec.satisfies('@:3.12'):
                 arpackopt = '--with-arpack-flags'

joseeroman commented 2 years ago

Thanks Satish. Does spec['arpack-ng'].libs.joined() return a comma-separated list?

wspear commented 2 years ago

I just tested this out. Still seeing lib instead of lib64 in ARPACK_LIB. The config line from build output looks like:

==> [2022-01-27-11:08:00.102309] '/global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/python-3.9.9-v7zlsbuv6k52fmhxcoqxizfob7c7h4bx/bin/python3.9' 'configure' '--prefix=/global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/slepc-3.16.0-qerj56bfhsgicdm5ttcf2avtxl5mnvqa' '--with-arpack-dir=/global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6' '--with-arpack-include=/global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/include' '--with-arpack-lib=/global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib64/libparpack.so /global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib64/libarpack.so' '--with-arpack-lib=-lparpack,-larpack'

Edit: note that the package already adds --with-arpack-lib just after the patched lines as long as the version is above 3.12.

joseeroman commented 2 years ago

One should either use --with-arpack-dir or the other two. I should modify SLEPc's configure so that it complains if both are given.
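As a rough sketch of the kind of complaint described above, the following Python function flags a command line that mixes `--with-arpack-dir` with the include/lib pair, or repeats an option. It only illustrates the rule stated in this comment; `arpack_option_problems` is a hypothetical name and this is not how SLEPc's configure is implemented.

```python
from collections import Counter


def arpack_option_problems(argv):
    """Return human-readable problems with the --with-arpack-* options in argv."""
    names = [a.split("=", 1)[0] for a in argv if a.startswith("--with-arpack-")]
    counts = Counter(names)
    problems = []
    uses_dir = counts["--with-arpack-dir"] > 0
    uses_pair = counts["--with-arpack-include"] > 0 or counts["--with-arpack-lib"] > 0
    if uses_dir and uses_pair:
        problems.append(
            "use either --with-arpack-dir or "
            "--with-arpack-include/--with-arpack-lib, not both"
        )
    # Repeated options are ambiguous: it is unclear which value wins.
    for name, n in sorted(counts.items()):
        if n > 1:
            problems.append("%s given %d times" % (name, n))
    return problems
```

Run against the configure line pasted above (one `--with-arpack-dir`, one `--with-arpack-include`, two `--with-arpack-lib`), this would report both the mixed usage and the repeated option.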

balay commented 2 years ago

I guess one issue is - the configure interface to specify arpack is different in different slepc versions..

Does spec['arpack-ng'].libs.joined() return a comma-separated list?

From above - it's not. But this notation appears to work with the latest slepc.

'--with-arpack-lib=/global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib64/libparpack.so /global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib64/libarpack.so'
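To make the three notations in play concrete (space-separated full paths, as in the value just quoted; a comma-separated list; and `-L`/`-l` flags), here is a small Python sketch. It uses plain strings rather than Spack objects, the `/prefix` path is a placeholder, and `as_flags` is a hypothetical helper, not a Spack or SLEPc function.

```python
import os


def as_flags(lib_paths):
    """Turn full shared-library paths into -L<dir> ... -l<name> ... flags."""
    dirs, libs = [], []
    for path in lib_paths:
        directory, filename = os.path.split(path)
        stem = filename.split(".", 1)[0]  # "libparpack.so.2" -> "libparpack"
        if directory not in dirs:
            dirs.append(directory)
        # Fall back to the full path when the file does not follow lib<name>.*
        libs.append(("-l" + stem[3:]) if stem.startswith("lib") else path)
    return ["-L" + d for d in dirs] + libs


paths = ["/prefix/lib64/libparpack.so", "/prefix/lib64/libarpack.so"]
print(" ".join(paths))            # space-separated full paths
print(",".join(paths))            # comma-separated full paths
print(" ".join(as_flags(paths)))  # -L/prefix/lib64 -lparpack -larpack
```

All three forms name the same two libraries; which of them a given SLEPc version's configure accepts is exactly the version-dependent question raised in this comment.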

@wspear can you try ~https://github.com/spack/spack/commit/2b7ae8b3a2a18cbef8b3fa9308c6fce74a007d06~ https://github.com/spack/spack/commit/0ed93f1c917627a9c4a5a0444a94a14d924e5cb3

wspear commented 2 years ago

So, the good news is with that patch we have: ARPACK_LIB = /global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib64/libparpack.so /global/u2/w/wspear/PERLMUTTER/spack/opt/spack/cray-sles15-zen2/gcc-9.3.0/arpack-ng-3.8.0-uzmngov7rnizckd3ingwd7b7hsntmcw6/lib64/libarpack.so

It's all good news. The test works with this install. I just had to swap to PrgEnv-gnu.

The bad news is, the test application won't build on Perlmutter for unrelated reasons, so I need to find another venue to build and test that patch to absolutely confirm that this ARPACK_LIB, with full library file paths instead of -rpath/-L/-l arguments, works (though I assume it does).

On perlmutter building the test app results in numerous type errors such as:

/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/include/emmintrin.h: In function ‘_mm_storer_pd’:
/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/include/emmintrin.h:2080:7: error: incompatible types when assigning to type ‘__m128d {aka __vector(2) double}’ from type ‘int’
   __a = __builtin_shufflevector((__v2df)__a, (__v2df)__a, 1, 0);
   ^

balay commented 2 years ago

I have additional fixes [to fix the logic for older slepc versions] - and have the PR at https://github.com/spack/spack/pull/28654

wspear commented 2 years ago

Resolved by PRs from Jan 27.