cyclops-community / ctf

Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays

compile issues with undefined references to mkl commands (that appear in the relevant folders) #147

Open dgrin1 opened 1 year ago

dgrin1 commented 1 year ago

I'm trying to install CTF on a Linux box, first loading my openmpi module, then running this configure command:

./configure CXX=mpicxx --build-scalapack --build-hptt --with-hptt --with-scalapack --with-lapack 'LD_LIB_PATH=-L/opt/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64 -Wl,--no-as-needed' 'LD_LIBS=-lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_def -liomp5 -lpthread -lm -ldl' --install-dir=~/lib/

I then removed all -ipo flags (they were causing compile issues) and ran make and make install. This all works fine, but when I try to run make test or make testpython, the build fails with errors like these:

erbla':
pxerbla.f:(.text+0x66): undefined reference to `for_write_seq_fmt'
pxerbla.f:(.text+0x7f): undefined reference to `for_write_seq_fmt_xmit'
pxerbla.f:(.text+0x9a): undefined reference to `for_write_seq_fmt_xmit'
pxerbla.f:(.text+0xb3): undefined reference to `for_write_seq_fmt_xmit'
/home/dgrin/ctf/scalapack/build/lib/libscalapack.a(pjlaenv.f.o): In function `pjlaenv':
pjlaenv.f:(.text+0x3e): undefined reference to `for_cpystr'

I was able to get much further on my OS laptop, but the svd tests would not work there. Any tips?

solomonik commented 1 year ago

This looks like an issue with ScaLAPACK symbols. Running configure with --build-scalapack will download and build ScaLAPACK at configure time, but you are also providing MKL ScaLAPACK symbols. One or the other should be workable, but not both together. I also cannot comment on the specific list of MKL symbols needed, as these are architecture dependent (I would recommend the MKL link line advisor; likely you are already using that). Also, configuring with --no-static or --no-dynamic will build only dynamic (necessary for Python) or only static (sufficient for C++ without dynamic linking) libraries, which may simplify things and avoid errors.
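As a rough illustration of the "one or the other, but not both" point, here is a minimal Python sketch that scans a configure command line for more than one ScaLAPACK provider. The helper name scalapack_sources and the "mkl_scalapack" substring check are assumptions made for this example; they are not part of CTF's configure script.

```python
import shlex

# Hypothetical helper, not part of CTF: reports which ScaLAPACK providers
# a configure command line asks for.
def scalapack_sources(configure_cmd):
    sources = set()
    for token in shlex.split(configure_cmd):
        if token == "--build-scalapack":
            sources.add("downloaded build")
        # LIBS=... / LD_LIBS=... assignments may name an MKL ScaLAPACK library.
        if "=" in token:
            name, value = token.split("=", 1)
            if name in ("LIBS", "LD_LIBS") and "mkl_scalapack" in value:
                sources.add("MKL")
    return sources

cmd = ("./configure CXX=mpicxx --build-scalapack --with-scalapack "
       "'LD_LIBS=-lmkl_scalapack_lp64 -lmkl_core'")
found = scalapack_sources(cmd)
if len(found) > 1:
    print("more than one ScaLAPACK provider requested:", sorted(found))
```

If the check fires, dropping either --build-scalapack or the MKL ScaLAPACK entry from LD_LIBS leaves a single provider.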

dgrin1 commented 1 year ago

Thanks, that seems to help, but then this error message rears its head,

/Users/dgrin/ctf/scalapack/BLACS/SRC/igsum2d_.c:153:7: error: implicit declaration of function 'BI_imvcopy' is invalid in C99 [-Werror,-Wimplicit-function-declaration] BI_imvcopy(Mpval(m), Mpval(n), A, tlda, bp->Buff);

I'm guessing this is a simple swap to a different compiler, but I'm not sure how to proceed.

solomonik commented 1 year ago

I would recommend making sure ScaLAPACK built correctly when CTF's configure executed, and trying to run some of the tests included in the ScaLAPACK package. There may be issues with missing libraries or with the BLAS/MKL/OpenBLAS library ScaLAPACK is trying to use. You may ultimately need to build it separately (or you can try the MKL ScaLAPACK). CTF also builds without ScaLAPACK, but will then not include functionality related to interfacing to ScaLAPACK routines for solving systems of equations and eigenvalue computations. All sparse/dense tensor contraction functions should work without it.

dgrin1 commented 1 year ago

I was able to avoid this problem by running configure after installing fresh lapack, openblas, and scalapack, and leaving the scalapack build off the configure line. I was then able to run make and make install. Now when I run my Python calculation that uses CTF, it behaves until I try to do an SVD, at which point it segfaults (with an 8x8 matrix and no NaNs in it).

When I try to run make test, make python_test, or make svd, I get lots of warnings like "Undefined symbols for architecture arm64: "_Cblacs_barrier", referenced from: CTF_SCALAPACK::cblacs_barrier(int, char*) in libctf.a(lapack_symbs.o)" (many like it).

Ultimately, these make runs fail with errors of the form:

clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[1]: *** [/Users/dgrin/ctf/bin/test_suite]

I think somehow the libraries are only partially plugged in at this point.
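To see at a glance which routines the macOS linker could not resolve, one option is to collect the quoted names from the "Undefined symbols" report. The sketch below is only an illustration: missing_c_symbols is a made-up helper, and it assumes the report keeps the `"<name>", referenced from:` shape quoted above. Mach-O prefixes C symbols with an underscore, so the helper strips it to recover the C-level name. Names starting with Cblacs_ are the C interface to BLACS, which suggests the ScaLAPACK/BLACS library is not reaching the link line.

```python
import re

# Each unresolved symbol is reported as: "<name>", referenced from:
SYMBOL_RE = re.compile(r'"([^"]+)", referenced from:')

# Hypothetical helper: returns the distinct C-level names the linker could not resolve.
def missing_c_symbols(ld_output):
    names = set()
    for raw in SYMBOL_RE.findall(ld_output):
        # Mach-O adds one leading underscore to C symbols; drop it.
        names.add(raw[1:] if raw.startswith("_") else raw)
    return sorted(names)

log = ('Undefined symbols for architecture arm64: "_Cblacs_barrier", '
       'referenced from: CTF_SCALAPACK::cblacs_barrier(int, char*) '
       'in libctf.a(lapack_symbs.o)')
print(missing_c_symbols(log))
```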

Thoughts?

solomonik commented 1 year ago

This is expected; using SVD requires ScaLAPACK.

dgrin1 commented 1 year ago

Sorry if I was unclear - what I meant to say was that I built scalapack separately, but still included it via --with-scalapack in the configure run. So in principle, SVD should work, no?

solomonik commented 1 year ago

I see, then yes. I had understood your post as leaving off scalapack altogether. Please send the output from configure and make python_test.

(accidentally clicked close and post, reopened)

dgrin1 commented 1 year ago

I tried two approaches.

Approach A: local build of scalapack

./configure CXX=mpicxx --build-scalapack --with-scalapack --with-lapack --install-dir=~/lib/

Then the configure gave the output

Checking compiler type/version... Using Intel compilers.
Checking whether APPLE is defined... no.
Checking compiler (CXX)... successful.
Checking flags (CXXFLAGS)... successful.
Checking availability of C++11... successful.
Checking for MPI... MPI works.
Checking for OpenMP... OpenMP works.
Checking for static BLAS library... detected that -mkl works, speculatively using -mkl.
Checking for availability of static batched gemm... available, will build with -DUSE_BATCH_GEMM.
Checking for dynamic BLAS library... detected that -mkl works, speculatively using -mkl.
Checking for availability of dynamic batched gemm...
Checking for static LAPACK library... static LAPACK found.
Checking for dynamic LAPACK library... dynamic LAPACK found.
Checking for sparse MKL routines... sparse MKL found.
Building ScaLAPACK using cmake (no options passed along)...
--2022-10-11 01:04:23-- http://www.netlib.org/scalapack/scalapack-2.1.0.tgz
Resolving www.netlib.org (www.netlib.org)... 160.36.131.221
Connecting to www.netlib.org (www.netlib.org)|160.36.131.221|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://netlib.org/scalapack/scalapack-2.1.0.tgz [following]
--2022-10-11 01:04:23-- https://netlib.org/scalapack/scalapack-2.1.0.tgz
Resolving netlib.org (netlib.org)... 160.36.131.221
Connecting to netlib.org (netlib.org)|160.36.131.221|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5307441 (5.1M) [application/x-gzip]
Saving to: ‘scalapack.tgz’

scalapack.tgz 100%[===================>] 5.06M 10.9MB/s in 0.5s

2022-10-11 01:04:24 (10.9 MB/s) - ‘scalapack.tgz’ saved [5307441/5307441]

CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required): Compatibility with CMake < 2.8.12 will be removed from a future version of CMake.

Update the VERSION argument value or use a ... suffix to tell CMake that the project does not need compatibility with older versions.

-- The C compiler identification is Intel 19.0.5.20190815 -- The Fortran compiler identification is Intel 19.0.5.20190815 -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /opt/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting Fortran compiler ABI info -- Detecting Fortran compiler ABI info - done -- Check for working Fortran compiler: /opt/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort - skipped -- Checking whether /opt/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort supports Fortran 90 -- Checking whether /opt/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/ifort supports Fortran 90 - yes -- Found MPI_C: /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/lib/libmpi.so (found version "3.1") -- Found MPI_Fortran: /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/lib/libmpifort.so (found version "3.1") -- Found MPI: TRUE (found version "3.1") -- Found MPI_LIBRARY : TRUE -- --> MPI C Compiler : /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/bin/mpicc -- --> C Compiler : /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/bin/mpicc -- --> MPI Fortran Compiler : /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/bin/mpif90 -- --> Fortran Compiler : /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/bin/mpif90 -- Reducing RELEASE optimization level to O2 -- ========= -- Compiling and Building BLACS INSTALL Testing to set correct variables -- Configure in the INSTALL directory successful -- Build in the BLACS INSTALL directory successful -- ========= -- Testing FORTRANMANGLING -- CDEFS set to Add -- ========= -- CHECKING BLAS AND LAPACK LIBRARIES -- --> Searching for optimized LAPACK and BLAS libraries on your machine. 
-- Looking for Fortran sgemm -- Looking for Fortran sgemm - not found -- Looking for pthread.h -- Looking for pthread.h - found -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed -- Looking for pthread_create in pthreads -- Looking for pthread_create in pthreads - not found -- Looking for pthread_create in pthread -- Looking for pthread_create in pthread - found -- Found Threads: TRUE -- Looking for Fortran sgemm -- Looking for Fortran sgemm - found -- Found BLAS: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so;/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so;/opt/intel/mkl/lib/intel64/libmkl_core.so;/opt/intel/compilers_and_libraries/linux/lib/intel64/libiomp5.so;-lpthread;-lm;-ldl -- Looking for Fortran cheev -- Looking for Fortran cheev - found -- Found LAPACK: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so;/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so;/opt/intel/mkl/lib/intel64/libmkl_core.so;/opt/intel/compilers_and_libraries/linux/lib/intel64/libiomp5.so;-lpthread;-lm;-ldl;-lpthread;-lm;-ldl -- BLAS library: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so;/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so;/opt/intel/mkl/lib/intel64/libmkl_core.so;/opt/intel/compilers_and_libraries/linux/lib/intel64/libiomp5.so;-lpthread;-lm;-ldl -- LAPACK library: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so;/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so;/opt/intel/mkl/lib/intel64/libmkl_core.so;/opt/intel/compilers_and_libraries/linux/lib/intel64/libiomp5.so;-lpthread;-lm;-ldl;-lpthread;-lm;-ldl -- ========= -- Configuring done -- Generating done -- Build files have been written to: /home/dgrin/ctf/scalapack/build

followed by many lines showing scalapack being built with no errors

followed by the output

CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required): Compatibility with CMake < 2.8.12 will be removed from a future version of CMake.

Update the VERSION argument value or use a ... suffix to tell CMake that the project does not need compatibility with older versions.

-- Found MPI_LIBRARY : TRUE -- --> MPI C Compiler : /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/bin/mpicc -- --> C Compiler : /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/bin/mpicc -- --> MPI Fortran Compiler : /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/bin/mpif90 -- --> Fortran Compiler : /opt/apps/mpi/mpich-3.3.2_intel-19.0.5.281/bin/mpif90 -- ========= -- Compiling and Building BLACS INSTALL Testing to set correct variables -- Configure in the INSTALL directory successful -- Build in the BLACS INSTALL directory successful -- ========= -- Testing FORTRANMANGLING -- CDEFS set to Add -- ========= -- CHECKING BLAS AND LAPACK LIBRARIES -- --> Searching for optimized LAPACK and BLAS libraries on your machine. -- BLAS library: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so;/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so;/opt/intel/mkl/lib/intel64/libmkl_core.so;/opt/intel/compilers_and_libraries/linux/lib/intel64/libiomp5.so;-lpthread;-lm;-ldl -- LAPACK library: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so;/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so;/opt/intel/mkl/lib/intel64/libmkl_core.so;/opt/intel/compilers_and_libraries/linux/lib/intel64/libiomp5.so;-lpthread;-lm;-ldl;-lpthread;-lm;-ldl -- ========= -- Configuring done -- Generating done -- Build files have been written to: /home/dgrin/ctf/scalapack/build Scanning dependencies of target scalapack Consolidate compiler generated dependencies of target scalapack

followed by many more lines with cmake building scalapack

followed by

build completed in scalapack subdirectory.
Checking for static ScaLAPACK... SCALAPACK library found (-lscalapack).
Checking for dynamic ScaLAPACK... SCALAPACK library found ($BDYNAMIC -lscalapack).
Checking for HPTT (optimized transposition library)... HPTT does not work, will use built-in transpose kernel.
Checking whether to use CUDA... CUDA will not be used.
A config.mk file has been created (to adjust all settings edit the config.mk file manually or rerun ./configure to create a new one).
A setup.py file has been created (to adjust all settings edit the setup.py file manually or rerun ./configure to create a new one).
Configure finished successfully (see how-did-i-configure to determine how script was executed).

and successful make and make install commands

After that I tried make python, which generated many of these warnings:

ipo: warning #11012: unable to find -l-mkl
mpicxx -pthread -shared -Wl,-z,relro -Wl,-z,now -g -Wl,-z,relro -Wl,-z,now -g -L/home/dgrin/ctf/lib_shared /home/dgrin/ctf/lib_python/ctf/term.o -L/usr/lib64 -lctf -lscalapack -l-mkl -lpython3.6m -o /home/dgrin/ctf/lib_python/ctf/term.cpython-36m-x86_64-linux-gnu.so -L/home/dgrin/ctf/lib_shared -qopenmp -O3 -ipo -L/home/dgrin/ctf/scalapack/build/lib -Wl,-rpath=/home/dgrin/ctf/scalapack/build/lib
ipo: warning #11012: unable to find -l-mkl
ld: cannot find -l-mkl
ld: cannot find -l-mkl
ld: cannot find -l-mkl
ld: cannot find -l-mkl
error: command 'mpicxx' failed with exit status 1
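The -l-mkl in this log looks like the Intel -mkl compiler flag being treated as a library name and given a -l prefix. In a setuptools Extension, entries in libraries are prefixed with -l, while entries in extra_link_args are passed through verbatim. The sketch below shows one way to keep the two apart. split_link_tokens is a hypothetical helper, and how CTF's generated setup.py actually builds its library list is an assumption here, not something shown in the thread.

```python
# Hypothetical helper: separates real library names from compiler flags that
# were stored alongside them, so flags never receive a "-l" prefix.
def split_link_tokens(tokens):
    libraries, extra_link_args = [], []
    for tok in tokens:
        if tok.startswith("-l"):
            libraries.append(tok[2:])      # "-lctf" -> "ctf"
        elif tok.startswith("-"):
            extra_link_args.append(tok)    # "-mkl" stays a compiler flag
        else:
            libraries.append(tok)          # bare name such as "scalapack"
    return libraries, extra_link_args

libs, flags = split_link_tokens(["-lctf", "scalapack", "-mkl"])
print(libs, flags)
```

With the tokens split this way, libs could go to the Extension's libraries argument and flags to extra_link_args, so the linker would see -mkl rather than -l-mkl.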

ending the install

Wondering if the issue was just with the python installation, I also tried make test, which yielded the message

make test_suite -C test
make[1]: Entering directory '/home/dgrin/ctf/test'
mpicxx -x c++ -qopenmp -O3 -ipo -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1 -DUSE_BATCH_GEMM -DUSE_LAPACK -DUSE_MKL -DUSE_SCALAPACK -c ../examples/btwn_central_kernels.cxx -o /home/dgrin/ctf/obj/btwn_central_kernels.o -I../include/
mpicxx -qopenmp -O3 -ipo -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1 -DUSE_BATCH_GEMM -DUSE_LAPACK -DUSE_MKL -DUSE_SCALAPACK test_suite.cxx /home/dgrin/ctf/obj/btwn_central_kernels.o -o /home/dgrin/ctf/bin/test_suite -I../include/ -L/home/dgrin/ctf/lib -lctf -L/home/dgrin/ctf/scalapack/build/lib -Wl,-Bstatic -lscalapack -Wl,-Bdynamic -mkl
ld: /home/dgrin/ctf/scalapack/build/lib/libscalapack.a(pxerbla.f.o): undefined reference to symbol 'for_write_seq_fmt'
/opt/intel/compilers_and_libraries/linux/lib/intel64//libifcoremt.so.5: error adding symbols: DSO missing from command line
make[1]: [Makefile:13: /home/dgrin/ctf/bin/test_suite] Error 1
make[1]: Leaving directory '/home/dgrin/ctf/test'
make: [Makefile:74: test_suite] Error 2
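The linker message above names libifcoremt.so.5 as the library that provides for_write_seq_fmt. That is the Intel Fortran runtime, which the ifort-compiled libscalapack.a needs but which the C++ link line does not pull in. As a rough aid, the sketch below groups undefined symbols by the Fortran runtime they most likely come from. missing_runtimes and the prefix table are assumptions made for this example. Only the for_ entry is backed by the log in this thread.

```python
import re

# Assumed prefix -> runtime table for illustration. The "for_" entry matches
# the log above (libifcoremt.so.5 provides for_write_seq_fmt).
RUNTIME_HINTS = [
    ("for_", "Intel Fortran runtime (libifcore / libifcoremt)"),
    ("_gfortran_", "GNU Fortran runtime (libgfortran)"),
]

# Handles both "undefined reference to `sym'" and "undefined reference to symbol 'sym'".
UNDEF_RE = re.compile(
    r"undefined reference to (?:symbol )?[`']?([A-Za-z_][A-Za-z0-9_]*)"
)

# Hypothetical helper: maps each suspected runtime to the symbols that point at it.
def missing_runtimes(linker_output):
    hints = {}
    for sym in UNDEF_RE.findall(linker_output):
        for prefix, runtime in RUNTIME_HINTS:
            if sym.startswith(prefix):
                hints.setdefault(runtime, set()).add(sym)
    return hints

sample = (
    "pxerbla.f:(.text+0x66): undefined reference to `for_write_seq_fmt'\n"
    "ld: libscalapack.a(pxerbla.f.o): undefined reference to symbol 'for_write_seq_fmt'\n"
    "pjlaenv.f:(.text+0x3e): undefined reference to `for_cpystr'\n"
)
for runtime, symbols in missing_runtimes(sample).items():
    print(runtime, sorted(symbols))
```

If the Intel Fortran runtime shows up here, one thing to try is adding it to the link libraries (for example -lifcore after -lscalapack). Whether that is the right fix for this setup is untested.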

dgrin1 commented 1 year ago

./configure CXX=mpicxx --with-lapack 'LD_LIB_PATH=-L/opt/intel/compilers_and_libraries_2019.5.281/linux/mkl/lib/intel64 -Wl,--no-as-needed' 'LD_LIBS=-lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 -lmkl_def -liomp5 -lpthread -lm -ldl' --install-dir=~/lib/

Checking compiler type/version... Using Intel compilers.
Checking whether APPLE is defined... no.
Checking compiler (CXX)... successful.
Checking flags (CXXFLAGS)... successful.
Checking availability of C++11... successful.
Checking for MPI... MPI works.
Checking for OpenMP... OpenMP works.
Checking for static BLAS library... detected that -mkl works, speculatively using -mkl.
Checking for availability of static batched gemm... available, will build with -DUSE_BATCH_GEMM.
Checking for dynamic BLAS library... dynamic BLAS library found, with underscores.
Checking for availability of dynamic batched gemm...
Checking for static LAPACK library... static LAPACK found.
Checking for dynamic LAPACK library... dynamic LAPACK found.
Checking for sparse MKL routines... sparse MKL found.
Checking for static ScaLAPACK... static ScaLAPACK not found, some functionality and tests will be unavailable, to fix reconfigure and add --with-scalapack and the appropriate library path (LIB_PATH/LIBS for static, LD_LIB_PATH/LD_LIBS for dynamic) or configure with --build-scalapack to attempt to automatically download and build scalapack
Checking for dynamic ScaLAPACK... dynamic SCALAPACK found.
Checking for HPTT (optimized transposition library)... HPTT does not work, will use built-in transpose kernel.
Checking whether to use CUDA... CUDA will not be used.
A config.mk file has been created (to adjust all settings edit the config.mk file manually or rerun ./configure to create a new one).
A setup.py file has been created (to adjust all settings edit the setup.py file manually or rerun ./configure to create a new one).
Configure finished successfully (see how-did-i-configure to determine how script was executed).
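In this configure output the static ScaLAPACK check fails while the dynamic one succeeds, which is easy to miss in a long log. A small sketch like the one below could list the checks that did not pass. summarize_checks and failed_checks are hypothetical helpers, and the regular expression assumes the "Checking ...... result" layout seen in this thread rather than any documented format.

```python
import re

# Captures "<what>... <result>" pairs; a result runs until the next "Checking "
# (trailing non-check text, if any, stays attached to the last result).
CHECK_RE = re.compile(
    r"Checking (?:for )?(.+?)\.\.\.\s?(.*?)(?=\s?Checking |$)", re.S
)

# Hypothetical helper: maps each configure check to its reported result.
def summarize_checks(configure_output):
    text = " ".join(configure_output.split())  # collapse line breaks
    return {what: result for what, result in CHECK_RE.findall(text)}

# Hypothetical helper: keeps only the checks that report a failure.
def failed_checks(summary):
    return {what: result for what, result in summary.items()
            if "not found" in result or "does not work" in result}

sample = (
    "Checking for availability of dynamic batched gemm... "
    "Checking for static ScaLAPACK... static ScaLAPACK not found, "
    "some functionality and tests will be unavailable "
    "Checking for dynamic ScaLAPACK... dynamic SCALAPACK found. "
    "Checking for HPTT (optimized transposition library)... "
    "HPTT does not work, will use built-in transpose kernel."
)
for what in failed_checks(summarize_checks(sample)):
    print("failed:", what)
```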

make and make install then ran without issue, as did make python

but then when I run make python_test2, this error message comes out

complex, int, double, int, int, int, int, int, double, int*)': ipo_out.c:(.text._ZN13CTF_SCALAPACK6pheevxIdEEvccciPSt7complexIT_EiiPiS2_S2_iiS2_S5_S5_PS2_S2_S4_iiS5_S4_iS6_iS5_iS5_S5_S6S5[_ZN13CTF_SCALAPACK6pheevxIdEEvccciPSt7complexIT_EiiPiS2_S2_iiS2_S5_S5_PS2_S2_S4_iiS5_S4_iS6_iS5_iS5_S5_S6S5]+0x149): undefined reference to `pzheevx_'