UCL-RITS / rcps-buildscripts

Scripts to automate package builds on RC Platforms

Install Request: CP2K v8.1 on Young - IN:04611351 #417

Closed cschran closed 3 years ago

cschran commented 3 years ago

The latest release of CP2K includes various bug fixes and new features. Some of them are strictly needed for my work and for that of others in the Michaelides group.

Installation of the new version on Young would be highly appreciated.

gubits commented 3 years ago

CP2K 8.1 is needed for our group as well. I have tried to compile it myself several times. Unfortunately, there are some compiler-related problems that I have not been able to solve so far. Waiting for a version installed by the Young team. Thanks!

heatherkellyucl commented 3 years ago

I'm starting an install of CP2K 8.2 as that's the current version.

Ran into a related screen issue that @owainkenwayucl had last time:

./install_cp2k_toolchain.sh -j 8 --with-sirius=no --with-spfft=no --with-quip=install --with-openblas=system
MPI is detected and it appears to be OpenMPI
nvcc not found, disabling CUDA by default
Compiling with 8 processes.
/home/cceahke/cp2k-8.2/tools/toolchain/install/toolchain.env: line 75: :DO=\E[%dB:LE=\E[%dD:RI=\E[%dC:UP=\E[%dA:bs:bt=\E[Z:\: command not found

If you unset TERMCAP in that screen session then it begins normally. See https://github.com/cp2k/cp2k/issues/709 and https://groups.google.com/g/cp2k/c/4iwvz2cuh80/m/X0fEdPdxAgAJ
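
A minimal sketch of that workaround, assuming a POSIX shell inside the screen session. The toolchain command is the one quoted above and is left as a comment; nothing here comes from the CP2K scripts themselves.

```shell
# GNU screen exports a multi-line TERMCAP value; clear it in this shell only.
unset TERMCAP

# Confirm it is gone before starting the toolchain build.
if [ -z "${TERMCAP+x}" ]; then
    echo "TERMCAP is unset"
fi

# Then re-run the same command as before:
# ./install_cp2k_toolchain.sh -j 8 --with-sirius=no --with-spfft=no --with-quip=install --with-openblas=system
```

The unset only affects the current shell, so it has to be repeated in each new screen window.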

heatherkellyucl commented 3 years ago

Currently failing on libint:

==================== Installing LIBINT ====================
libint-v2.6.0-cp2k-lmax-5.tgz is found
Installing from scratch into /home/cceahke/cp2k-8.2/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5
ERROR: (./scripts/stage3/install_libint.sh, line 90) Non-zero exit code detected.

Failure is in the install step: /home/cceahke/cp2k-8.2/tools/toolchain/build/libint-v2.6.0-cp2k-lmax-5/install.log

/usr/bin/install -c -m 0644 /home/cceahke/cp2k-8.2/tools/toolchain/build/libint-v2.6.0-cp2k-lmax-5/./lib/basis/* /home/cceahke/cp2k-8.2/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5/share/libint/2.6.0/basis
(cd fortran && make) || exit 1
make[1]: Entering directory `/lustre/home/cceahke/cp2k-8.2/tools/toolchain/build/libint-v2.6.0-cp2k-lmax-5/fortran'
g++ -O2 -fno-omit-frame-pointer -g -march=native -mtune=native -g1 -E -DHAVE_CONFIG_H -D__COMPILING_LIBINT2=1 -D__COMPILING_LIBINT2=1 -I../include -I..//include  -O2 -fno-omit-frame-pointer -g -march=native -mtune=native -O2 -fno-omit-frame-pointer -g -march=native -mtune=native -g1  ../include/libint2.h > ../include/libint2.h.i
python c_to_f.py ../include/libint2.h.i libint2_types_f.h Libint_t
grep '^#' ../include/libint2_types.h | grep -v '#include' > fortran_incldefs.h
FC libint_f.o
../include/libint2/util/generated/libint2_params.h:29:0:

   29 | #    if __has_include(<libint2_params.h>)
      | 
Error: missing '(' before "__has_include" operand
../include/libint2/util/generated/libint2_params.h:29:0: Error: operator "__has_include" requires a header-name
make[1]: *** [libint_f.o] Error 1
make[1]: Leaving directory `/lustre/home/cceahke/cp2k-8.2/tools/toolchain/build/libint-v2.6.0-cp2k-lmax-5/fortran'
make: *** [fortran] Error 1

Eeg, this is a regression in gcc 10.1 and 10.2 that is fixed in 10.3. It occurs when using -traditional-cpp and breaks the build of some Fortran packages.
https://github.com/evaleev/libint/issues/173 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95889

(EasyBuild's gcc 10.2 is patched for this).

We could reinstall a gcc 10.2 that was patched, then update the gcc-libs/10.2.0 module to point at the fixed version instead.

Or I can build CP2K with gcc 9.2 instead, since we already have an openblas and openmpi for that. But gcc 10.2 will probably need something doing about it at some point since we were heading to it as our new default...
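
As a rough pre-flight check, the version range described above can be tested in the shell. This is only a sketch: the function name `gcc_has_include_regression` is made up for illustration, and a version match cannot tell whether a particular 10.2 build already carries the fix (as EasyBuild's does).

```shell
# Hypothetical helper: returns 0 (true) if the version string is 10.1.x or 10.2.x,
# the range reported as affected by the __has_include regression.
gcc_has_include_regression() {
    case "$1" in
        10.1|10.1.*|10.2|10.2.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: check whichever gcc is currently on PATH.
if command -v gcc >/dev/null 2>&1; then
    # -dumpfullversion prints e.g. 10.2.0; fall back to -dumpversion on older gcc.
    version=$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)
    if gcc_has_include_regression "$version"; then
        echo "gcc $version: in the affected range unless patched"
    else
        echo "gcc $version: not in the affected range"
    fi
fi
```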

heatherkellyucl commented 3 years ago

Patched gcc 10.2 build going on Myriad.

heatherkellyucl commented 3 years ago

CP2K successfully builds libint now, and goes on to fail on scalapack:

==================== Installing ScaLAPACK ====================
scalapack-2.1.0.tgz: OK
Checksum of scalapack-2.1.0.tgz Ok
Installing from scratch into /home/cceahke/cp2k-8.2/tools/toolchain/install/scalapack-2.1.0
ERROR: (./scripts/stage4/install_scalapack.sh, line 61) Non-zero exit code detected.
CMake Error at BLACS/TESTING/CMakeLists.txt:4 (if):
  if given arguments:

    "GNU" "STREQUAL" "GNU" "AND" "CMAKE_Fortran_COMPILER_VERSION" "VERSION_GREATER_EQUAL" "10"

  Unknown arguments specified

-- Configuring incomplete, errors occurred!

heatherkellyucl commented 3 years ago

Ah, I should probably just let it build its own cmake with the version it likes. (I'd told it to use the system one, but at some point I hadn't loaded the newest cmake module, so it was using the ancient cmake that really is the system one.)
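
For reference, the `VERSION_GREATER_EQUAL` operator in the failing `if()` was added in CMake 3.7, so an older cmake rejects it with "Unknown arguments specified" as in the log above. A small sketch of a check to run before pointing the toolchain at a system cmake; `version_ge` is a hypothetical helper and relies on GNU `sort -V`.

```shell
# Hypothetical helper: true if version $1 >= version $2 (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# CMake 3.7 is the first release that understands VERSION_GREATER_EQUAL.
min_cmake=3.7

if command -v cmake >/dev/null 2>&1; then
    # The first line of output looks like: "cmake version 3.20.1"
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" "$min_cmake"; then
        echo "cmake $found at $(command -v cmake) passes the 3.7 check"
    else
        echo "cmake $found at $(command -v cmake) is too old for this check"
    fi
else
    echo "no cmake on PATH"
fi
```

Passing this check only rules out the specific error shown here; the toolchain may still want a newer cmake for other reasons, which is an argument for letting it build its own.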

heatherkellyucl commented 3 years ago

==================== Installing QUIP ====================
QUIP-b4336484fb65b0e73211a8f920ae4361c7c353fd.tar.gz: OK
Checksum of QUIP-b4336484fb65b0e73211a8f920ae4361c7c353fd.tar.gz Ok
Installing from scratch into /home/cceahke/cp2k-8.2/tools/toolchain/install/quip-b4336484fb65b0e73211a8f920ae4361c7c353fd
/usr/bin/env: python3: No such file or directory
ERROR: (./scripts/stage6/install_quip.sh, line 107) Non-zero exit code detected.

QUIP needs a python3 now, apparently. And all our python3s depend on a conflicting version of openblas (openblas/0.3.7-serial/gnu-4.9.2)...

It is the 'links specific version of libgfortran.so' shared library problem.

heatherkellyucl commented 3 years ago

Well, to be more specific, all our pythons that have any packages installed for them require that. 'Just python' python/3.8.6 doesn't.

heatherkellyucl commented 3 years ago

Ok, that works - toolchain part is complete.

heatherkellyucl commented 3 years ago

I left it trying the main install over the weekend and have come back to

cd /lustre/home/cceahke/cp2k-8.2/exe/local; ln -sf cp2k.psmp cp2k_shell.psmp
cd /lustre/home/cceahke/cp2k-8.2/exe/local; ln -sf cp2k.psmp cp2k.popt
make: *** [all] Error 2

So I'll have to do some digging to work out where the failure was.
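
With a parallel make, the last lines on screen (the `ln -sf` steps above) are often not where the build broke. One way to dig, sketched under the assumption that the build output was captured to a file; the file name `make.log` and the helper `first_errors` are hypothetical.

```shell
# Hypothetical helper: print the first few lines of a build log that look like
# compiler, linker or make errors, prefixed with their line numbers.
# Usage: first_errors LOGFILE [MAX_LINES]
first_errors() {
    grep -n -E 'Error|error:|undefined reference' "$1" | head -n "${2:-20}"
}

# Example, assuming the build output was saved as make.log:
# first_errors make.log
```

The line numbers make it easy to open the log at the first real failure and read the compile or link command just above it.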

heatherkellyucl commented 3 years ago

The problem is with the ssmp install:

gfortran -c -march=native -mtune=native -fno-omit-frame-pointer -g -O3 -funroll-loops  -fopenmp   -I'/lustre/shared/ucl/apps/openblas/0.3.13-openmp/gnu-10.2.0/include' -I'/home/cceahke/cp2k-8.2/tools/toolchain/install/fftw-3.3.8/include' -I'/home/cceahke/cp2k-8.2/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5/include' -I'/home/cceahke/cp2k-8.2/tools/toolchain/install/libxc-5.1.4/include' -I'/home/cceahke/cp2k-8.2/tools/toolchain/install/libxsmm-1.16.1/include' -I'/home/cceahke/cp2k-8.2/tools/toolchain/install/COSMA-2.5.0/include'  -I'/home/cceahke/cp2k-8.2/tools/toolchain/install/quip-b4336484fb65b0e73211a8f920ae4361c7c353fd/include' -I/home/cceahke/cp2k-8.2/tools/toolchain/install/spglib-1.16.0/include -fbacktrace -ffree-form -fimplicit-none -std=f2008  -Werror=aliasing -Werror=ampersand -Werror=c-binding-type -Werror=intrinsic-shadow -Werror=intrinsics-std -Werror=line-truncation -Werror=tabs -Werror=target-lifetime -Werror=underflow -Werror=unused-but-set-variable -Werror=unused-variable -Werror=unused-dummy-argument -Werror=conversion -Werror=zerotrip -Wno-maybe-uninitialized -Wuninitialized -Wuse-without-only  -D__LIBXSMM   -D__FFTW3  -D__LIBINT -D__LIBXC    -D__QUIP -D__SPGLIB -D__LIBVORI   -D__COMPILE_ARCH="\"local\"" -D__COMPILE_DATE="\"Mon  5 Jul 16:03:18 BST 2021\"" -D__COMPILE_HOST="\"login13.myriad.ucl.ac.uk\"" -D__COMPILE_REVISION="\"git:310b7ab\"" -D__DATA_DIR="\"/lustre/home/cceahke/cp2k-8.2/data\"" -D__SHORT_FILE__="\"pw/pw_grids.F\"" -I'/lustre/home/cceahke/cp2k-8.2/src/pw/' -I'/lustre/home/cceahke/cp2k-8.2/obj/local/ssmp/exts/dbcsr' pw_grids.F90
/home/cceahke/cp2k-8.2/tools/toolchain/install/fftw-3.3.8/lib/libfftw3_mpi.so: undefined reference to `ompi_mpi_op_sum'
/home/cceahke/cp2k-8.2/tools/toolchain/install/fftw-3.3.8/lib/libfftw3_mpi.so: undefined reference to `ompi_mpi_char'
/home/cceahke/cp2k-8.2/tools/toolchain/install/fftw-3.3.8/lib/libfftw3_mpi.so: undefined reference to `MPI_Bcast'
...
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status

It is still trying to link against its libfftw3_mpi.so for the non-MPI build.

heatherkellyucl commented 3 years ago

arch/local.ssmp sets

LIBS        = -lsymspg -lquip_core -latoms -lFoX_sax -lFoX_common -lFoX_utils -lFoX_fsys    -lxsmmf -lxsmm -ldl -lpthread -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp    -lopenblas -lvori -lstdc++ -lstdc++

The important part being -lfftw3_mpi -lfftw3 -lfftw3_omp

This was not a problem in 7.1...

heatherkellyucl commented 3 years ago

[Continuing the test build on Thomas since Myriad's having issues]

heatherkellyucl commented 3 years ago

The ssmp version builds successfully when you patch the arch file.
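
The thread does not show the patch itself. One plausible form, given the LIBS line above, is to drop `-lfftw3_mpi` from `arch/local.ssmp` so that the non-MPI build no longer links the MPI FFTW library. A sketch, with `strip_fftw_mpi` as a hypothetical helper name:

```shell
# Hypothetical helper: print an arch file with the -lfftw3_mpi token removed
# from its LIBS line. All other lines pass through unchanged.
strip_fftw_mpi() {
    sed -E '/^LIBS[[:space:]]*=/ s/[[:space:]]-lfftw3_mpi([[:space:]]|$)/\1/' "$1"
}

# Example: write a patched copy next to the original, then compare the two.
# strip_fftw_mpi arch/local.ssmp > arch/local.ssmp.patched
# diff arch/local.ssmp arch/local.ssmp.patched
```

Writing to a separate file and diffing first keeps the original arch file around in case the toolchain regenerates or overwrites it.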

heatherkellyucl commented 3 years ago

Installed on:

heatherkellyucl commented 3 years ago

To use:

module unload -f compilers mpi gcc-libs
module load beta-modules
module load gcc-libs/10.2.0
module load compilers/gnu/10.2.0
# if Myriad
module load numactl/2.0.12
module load binutils/2.36.1/gnu-10.2.0
module load ucx/1.9.0/gnu-10.2.0
# end if Myriad
module load mpi/openmpi/4.0.5/gnu-10.2.0
module load openblas/0.3.13-openmp/gnu-10.2.0
module load cp2k/8.2/ompi/gnu-10.2.0

heatherkellyucl commented 3 years ago

@cschran @gubitgubit CP2K 8.2 is ready for you to test on Young.

gubits commented 3 years ago

It is great! It has run very well so far. Thank you very much!

Bin
