ocaisa opened 2 years ago
You are certainly giving MPItrampoline a good workout. Thank you for your patience.
I tried again this morning with the updated PR, but it is still failing. I realised my snippet above did not really show all the issues the compilation was running into, so I've uploaded a complete gist of the build. The Makefile used with the build is shown at L324. The compiler errors begin at L430.
Sorry for barraging you over the last week, but I would really like to do some performance verification checks for MPItrampoline with real applications (and ultimately support it as part of a toolchain in EasyBuild).
I'm now using Spack (sorry!) to build CP2K against MPItrampoline. I can reproduce your errors. These errors are reported because the MPI standard technically violates the Fortran standard, and newer GNU Fortran compilers report these errors. There are, of course, command line flags or function attributes that one can use to circumvent these errors. I am now looking into ways to automate this, so that people using MPItrampoline don't have to manually specify these flags.
Here is a patch to make CP2K build with MPItrampoline. In some cases CP2K violates the MPI standard (in a way that is harmless for other MPI implementations), other changes are only necessary for MPItrampoline's Fortran interface.
I have also released a new version of MPItrampoline that has some missing features added.
Confirmed that worked for me. I had to make a tiny change to the patch for version 8.2. I also had to enable -fallow-argument-mismatch
to get a successful compilation, otherwise I ran into errors like:
/project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exts/dbcsr/src/mpi/dbcsr_mpiwrap.F:5413:47:
5413 | CALL MPI_FILE_READ_AT_ALL(fh, offset, msg, msg_len, ${mpi_type1}$, MPI_STATUS_IGNORE, ierr)
| 1
......
5435 | CALL MPI_FILE_READ_AT_ALL(fh, offset, msg, 1, ${mpi_type1}$, MPI_STATUS_IGNORE, ierr)
| 2
Error: Rank mismatch between actual argument at (1) and actual argument at (2) (scalar and rank-1)
/project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exts/dbcsr/src/mpi/dbcsr_mpiwrap.F:5358:48:
5358 | CALL MPI_FILE_WRITE_AT_ALL(fh, offset, msg, msg_len, ${mpi_type1}$, MPI_STATUS_IGNORE, ierr)
| 1
......
5380 | CALL MPI_FILE_WRITE_AT_ALL(fh, offset, msg, 1, ${mpi_type1}$, MPI_STATUS_IGNORE, ierr)
| 2
Error: Rank mismatch between actual argument at (1) and actual argument at (2) (scalar and rank-1)
(this might already be fixed with the most recent release 9.1)
Yes, I forgot about -fallow-argument-mismatch
. The Spack recipe for CP2K already uses this flag for MPICH, so I added it there for MPItrampoline as well. It would be convenient if CP2K did this automatically.
I can add it automatically for EasyBuild; it is already triggered in some scenarios (GCC 10+, CP2K < 7.1).
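For illustration, the kind of condition under which EasyBuild would append the flag might look like this. This is a hedged sketch, not the actual easyblock code: the function name and parameters are invented, and only the flag itself (`-fallow-argument-mismatch`, a real gfortran 10+ option) comes from the discussion above.

```python
# Hypothetical sketch (not the real EasyBuild easyblock API) of appending
# -fallow-argument-mismatch: GCC 10+ rejects the argument-rank mismatches
# that MPI Fortran wrappers rely on, so the flag is needed for MPICH-style
# interfaces and, per this thread, for MPItrampoline as well.
def fortran_flags(gcc_major, mpi_family, base_flags=()):
    flags = list(base_flags)
    if gcc_major >= 10 and mpi_family in ("MPICH", "MPItrampoline"):
        flags.append("-fallow-argument-mismatch")
    return flags
```

With GCC 10 and MPItrampoline this yields `["-fallow-argument-mismatch"]`; with GCC 9, or with an MPI family whose Fortran interface does not trigger the mismatch errors, the flag list is left unchanged.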
Unfortunately the test suite is segfaulting on every execution, an example:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/TEST-Linux-x86-64-gmtfbf-psmp-2022-03-02_14-46-04/UNIT/libcp2k_unittest.out
** ## ## **
** **
** ... make the atoms dance **
** **
** Copyright (C) by CP2K developers group (2000-2021) **
** J. Chem. Phys. 152, 194103 (2020) **
** **
*******************************************************************************
*******************************************************************************
* ___ *
* / \ *
* [ABORT] *
* \___/ CPASSERT failed *
* | *
* O/| *
* /| | *
* / \ pw/pw_grids.F:1601 *
*******************************************************************************
===== Routine Calling Stack =====
8 pw_grid_assign
7 pw_grid_setup_internal
6 pw_grid_setup
5 pw_env_rebuild
4 qs_env_rebuild_pw_env
3 qs_env_setup
2 qs_init_subsys
1 CP2K
[xlnode1:2056906:0:2056906] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
EXIT CODE: 1 MEANING: RUNTIME FAIL
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/TEST-Linux-x86-64-gmtfbf-psmp-2022-03-02_14-46-04/QS/regtest-grid/simple_non-ortho_grid_auto.inp.out
Description: Goedecker-Teter-Hutter pseudopotential
Goedecker et al., PRB 54, 1703 (1996)
Hartwigsen et al., PRB 58, 3641 (1998)
Krack, TCA 114, 145 (2005)
Gaussian exponent of the core charge distribution: 4.109048
Electronic configuration (s p d ...): 2 2
The test suite is somewhat notorious, but I'd expect we should be able to match the results obtained with Open MPI.
I will have a look. How do you run the test suite?
TBH I'm not sure; it runs automatically with EasyBuild. Here are the steps:
== 2022-03-02 14:46:04,950 build_log.py:265 INFO testing...
== 2022-03-02 14:46:04,952 easyblock.py:3711 INFO Starting test step
== 2022-03-02 14:46:04,952 easyconfig.py:1686 INFO Generating template values...
== 2022-03-02 14:46:04,952 mpi.py:120 INFO Using template MPI command 'mpiexec -n %(nr_ranks)s %(cmd)s' for MPI family 'MPItrampoline'
== 2022-03-02 14:46:04,952 mpi.py:305 INFO Using MPI command template 'mpiexec -n %(nr_ranks)s %(cmd)s' (params: {'nr_ranks': 1, 'cmd': 'xxx_command_xxx'})
== 2022-03-02 14:46:04,953 easyconfig.py:1705 INFO Template values: arch='x86_64', bitbucket_account='cp2k', builddir='/project/def-sponsor00/easybuild/build/CP2K/8.2/gmtfbf-2021a', github_account='cp2k', installdir='/project/def-sponsor00/easybuild/software/CP2K/8.2-gmtfbf-2021a', module_name='CP2K/8.2-gmtfbf-2021a', mpi_cmd_prefix='mpiexec -n 1', name='CP2K', nameletter='C', nameletterlower='c', namelower='cp2k', parallel='16', toolchain_name='gmtfbf', toolchain_version='2021a', version='8.2', version_major='8', version_major_minor='8.2', version_minor='2', versionprefix='', versionsuffix=''
== 2022-03-02 14:46:04,953 easyblock.py:3719 INFO Running method test_step part of step test
== 2022-03-02 14:46:04,953 environment.py:91 INFO Environment variable CP2K_DATA_DIR set to /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/data (previously undefined)
== 2022-03-02 14:46:04,955 cp2k.py:746 INFO No reference output found for regression test, just continuing without it...
== 2022-03-02 14:46:04,960 cp2k.py:753 INFO Using 4 cores for the MPI tests
== 2022-03-02 14:46:04,960 mpi.py:120 INFO Using template MPI command 'mpiexec -n %(nr_ranks)s %(cmd)s' for MPI family 'MPItrampoline'
== 2022-03-02 14:46:04,960 mpi.py:305 INFO Using MPI command template 'mpiexec -n %(nr_ranks)s %(cmd)s' (params: {'nr_ranks': 4, 'cmd': ''})
== 2022-03-02 14:46:04,963 run.py:233 INFO running cmd: /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/tools/regtesting/do_regtest -nobuild -config cp2k_regtest.cfg
and cp2k_regtest.cfg
contains
[ocaisa@xlnode1 gmtfbf-2021a]$ cat cp2k_regtest.cfg
FORT_C_NAME="gfortran"
dir_base=/project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a
cp2k_version=psmp
dir_triplet=Linux-x86-64-gmtfbf
export ARCH=${dir_triplet}
cp2k_dir=cp2k-8.2
leakcheck="YES"
maxtasks=4
Not all of the tests are failing straight away; I see some are running for quite some time. I'll need to do a comparison build with OpenMPI to fully check. Unfortunately the test suite takes ages, so I won't be able to report back for quite a while.
The regressions tests with OpenMPI took 3 hours (and had 10 failures). The regression tests with MPItrampoline are still running (over 5 hours) and look to have more than 1000 failures.
Here's a backtrace on one of the errors:
==== backtrace (tid:3398606) ====
0 /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen2/software/UCX/1.10.0-GCCcore-10.3.0/lib64/libucs.so.0(ucs_handle_error+0x254) [0x14f2a7b5f474]
1 /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen2/software/UCX/1.10.0-GCCcore-10.3.0/lib64/libucs.so.0(+0x21657) [0x14f2a7b5f657]
2 /cvmfs/pilot.eessi-hpc.org/versions/2021.12/software/linux/x86_64/amd/zen2/software/UCX/1.10.0-GCCcore-10.3.0/lib64/libucs.so.0(+0x2180a) [0x14f2a7b5f80a]
3 /cvmfs/pilot.eessi-hpc.org/versions/2021.12/compat/linux/x86_64/lib/../lib64/libpthread.so.0(+0x120f0) [0x14f2c322d0f0]
4 /cvmfs/pilot.eessi-hpc.org/versions/2021.12/compat/linux/x86_64/lib/../lib64/libc.so.6(+0x157c77) [0x14f2c1530c77]
5 /project/def-sponsor00/easybuild/software/MPItrampoline/3.8.0-GCC-10.3.0/mpiwrapper/lib/libopen-pal.so.40(+0x41798) [0x14f2adecb798]
6 /project/def-sponsor00/easybuild/software/MPItrampoline/3.8.0-GCC-10.3.0/mpiwrapper/lib/libmpi.so.40(ompi_coll_base_allreduce_intra_redscat_allgather+0x1a9) [0x14f2ae09cd49]
7 /project/def-sponsor00/easybuild/software/MPItrampoline/3.8.0-GCC-10.3.0/mpiwrapper/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_allreduce_intra_dec_fixed+0x4a) [0x14f2a799062a]
8 /project/def-sponsor00/easybuild/software/MPItrampoline/3.8.0-GCC-10.3.0/mpiwrapper/lib/libmpi.so.40(PMPI_Allreduce+0xf0) [0x14f2ae054090]
9 /project/def-sponsor00/easybuild/software/MPItrampoline/3.8.0-GCC-10.3.0/mpiwrapper/lib/libmpi_mpifh.so.40(mpi_allreduce_+0x79) [0x14f2ae1715b9]
10 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0x1d7d70a]
11 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0x1b0c09c]
12 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0x9fbbe6]
13 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0xa7091b]
14 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0xa733ad]
15 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0xa693e1]
16 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0x831661]
17 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0x47524e]
18 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0x478df1]
19 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0x474139]
20 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0x40edaf]
21 /cvmfs/pilot.eessi-hpc.org/versions/2021.12/compat/linux/x86_64/lib/../lib64/libc.so.6(__libc_start_main+0xce) [0x14f2c13fc7fe]
22 /project/60005/easybuild/build/CP2K/8.2/gmtfbf-2021a/cp2k-8.2/exe/Linux-x86-64-gmtfbf/cp2k.psmp() [0x4725ca]
I might bump everything to later compilers and OpenMPI next week to see if the problems still exist with our latest toolchains.
I would expect that the regression test failures are either errors in the MPItrampoline Fortran bindings, or errors in my changes to CP2K. So far, I have built the OpenMPI CP2K tests in a Docker container; my next steps would be to convert these to MPItrampoline tests so that I can run these tests locally.
If you have a reproducible setup that I can use, then that could save me some time.
What I have is reproducible but tedious for you, I suspect (it would involve building everything down to the compiler, and requires customisations for MPItrampoline that are not yet merged into an EasyBuild release).
My job with the MPItrampoline tests was killed after 13 hours of testing :(
You can find the docs on the tests at https://www.cp2k.org/dev:regtesting . If you have an existing build you should be able to run the tests using that build (after getting the sources and then starting from Step 2 using the -nobuild
option)
There is a section there also about the directory structure you need.
I tried a -nobuild
test, following your instructions. This leads to the error
make: *** No rule to make target 'realclean'. Stop.
Apparently there is a makefile that needs to be somewhere.
I also tried the Docker container I mentioned earlier (with OpenMPI, no changes, straight from the checkout). This led to many failures (55 out of 60).
Building everything locally wouldn't be a problem, e.g. the Docker containers started by building GCC. But I am looking for instructions (someone "holding my hand") to reproduce the issue. Either a Dockerfile or a shell script for macOS or Linux would work.
Assuming you already have the sources and a build of CP2K, here are the steps I took:
mkdir CP2K_testing
tar -jxvf cp2k-8.2.tar.bz2
cd cp2k-8.2
mv data/ ../CP2K_testing/
mv tests/ ../CP2K_testing/
mv tools ../CP2K_testing/
cd ../CP2K_testing/
mkdir -p exe
module load CP2K/8.2-gmtfbf-2021a
cd exe/
ln -s $EBROOTCP2K/bin prebuilt # symlink named "prebuilt" to the installation's bin directory
I could then run the tests on the cluster with
#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=2
#SBATCH --ntasks-per-core=1
# More SBATCH options:
# If you need 512GB memory nodes (otherwise only 256GB guaranteed):
# #SBATCH --mem=497G
# To run on the debug queue (max 10 nodes, 30 min):
# #SBATCH --partition=debug
set -o errexit
set -o nounset
set -o pipefail
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PROC_BIND=close
export OMP_PLACES=cores
module load CP2K/8.2-gmtfbf-2021a
# Let the user see the currently loaded modules in the slurm log for completeness:
module list
CP2K_BASE_DIR="/home/ocaisa/CP2K_testing"
CP2K_TEST_DIR="/scratch/ocaisa/cp2k_regtesting"
CP2K_REGTEST_SCRIPT_DIR="/home/ocaisa/CP2K_testing/tools/regtesting"
CP2K_ARCH="prebuilt"
CP2K_VERSION="psmp"
NTASKS_SINGLE_TEST=2
NNODES_SINGLE_TEST=1
SRUN_CMD="mpiexec"
# to run tests across nodes (to check for communication effects), use:
# NNODES_SINGLE_TEST=4
# SRUN_CMD="srun --cpu-bind=verbose,cores --ntasks-per-node 2"
# the following should be sufficiently generic:
mkdir -p "${CP2K_TEST_DIR}"
cd "${CP2K_TEST_DIR}"
cp2k_rel_dir=$(realpath --relative-to="${CP2K_TEST_DIR}" "${CP2K_BASE_DIR}")
# srun does not like `-np`, override the complete command instead:
export cp2k_run_prefix="${SRUN_CMD} -N ${NNODES_SINGLE_TEST} -n ${NTASKS_SINGLE_TEST}"
"${CP2K_REGTEST_SCRIPT_DIR:-${CP2K_BASE_DIR}/tools/regtesting}/do_regtest" \
-arch "${CP2K_ARCH}" \
-version "${CP2K_VERSION}" \
-nobuild \
-mpiranks ${NTASKS_SINGLE_TEST} \
-ompthreads ${OMP_NUM_THREADS} \
-maxtasks ${SLURM_NTASKS} \
-cp2kdir "${cp2k_rel_dir}" \
|& tee "${CP2K_TEST_DIR}/${CP2K_ARCH}.${CP2K_VERSION}.log"
Ok, I think this may be simpler than needing to run the full test suite. I downloaded a H.inp
sample input file from the CP2K repo. This takes a second to run with OpenMPI:
OMP_NUM_THREADS=2 mpiexec --oversubscribe -n 8 cp2k.psmp ./H.inp
but with the MPItrampoline version it hangs when gathering statistics at the end:
**** **** ****** ** PROGRAM STARTED AT 2022-03-04 12:47:00.962
***** ** *** *** ** PROGRAM STARTED ON xlnode1.int.eessi-gpu.learnhpc
** **** ****** PROGRAM STARTED BY ocaisa
***** ** ** ** ** PROGRAM PROCESS ID 532650
**** ** ******* ** PROGRAM STARTED IN /home/ocaisa
CP2K| version string: CP2K version 8.2
CP2K| source code revision number: git:310b7ab
CP2K| cp2kflags: omp libint fftw3 libxc parallel mpi3 scalapack xsmm plumed2
CP2K| is freely available from https://www.cp2k.org/
CP2K| Program compiled at Thu Mar 3 11:47:21 AM UTC 2022
CP2K| Program compiled on xlnode1.int.eessi-gpu.learnhpc.eu
CP2K| Program compiled for Linux-x86-64-gmtfbf
CP2K| Data directory path /project/def-sponsor00/easybuild/software/CP2K/8.2
CP2K| Input file name ./H.inp
GLOBAL| Method name ATOM
GLOBAL| Project name H
GLOBAL| Run type ENERGY_FORCE
GLOBAL| FFT library FFTW3
GLOBAL| Diagonalization library ScaLAPACK
GLOBAL| Orthonormality check for eigenvectors DISABLED
GLOBAL| Matrix multiplication library SCALAP
GLOBAL| All-to-all communication in single precision F
GLOBAL| FFTs using library dependent lengths F
GLOBAL| Grid backend AUTO
GLOBAL| Global print level MEDIUM
GLOBAL| MPI I/O enabled T
GLOBAL| Total number of message passing processes 8
GLOBAL| Number of threads for this process 2
GLOBAL| This output is from process 0
GLOBAL| CPU model name AMD EPYC 7742 64-Core Processor
GLOBAL| CPUID 1002
MEMORY| system memory details [Kb]
MEMORY| rank 0 min max average
MEMORY| MemTotal 65951236 0 0 0
MEMORY| MemFree 56722588 0 0 0
MEMORY| Buffers 2104 0 0 0
MEMORY| Cached 7199448 0 0 0
MEMORY| Slab 537888 0 0 0
MEMORY| SReclaimable 300676 0 0 0
MEMORY| MemLikelyFree 64224816 0 0 0
*** Fundamental physical constants (SI units) ***
*** Literature: B. J. Mohr and B. N. Taylor,
*** CODATA recommended values of the fundamental physical
*** constants: 2006, Web Version 5.1
*** http://physics.nist.gov/constants
Speed of light in vacuum [m/s] 2.99792458000000E+08
Magnetic constant or permeability of vacuum [N/A**2] 1.25663706143592E-06
Electric constant or permittivity of vacuum [F/m] 8.85418781762039E-12
Planck constant (h) [J*s] 6.62606896000000E-34
Planck constant (h-bar) [J*s] 1.05457162825177E-34
Elementary charge [C] 1.60217648700000E-19
Electron mass [kg] 9.10938215000000E-31
Electron g factor [ ] -2.00231930436220E+00
Proton mass [kg] 1.67262163700000E-27
Fine-structure constant 7.29735253760000E-03
Rydberg constant [1/m] 1.09737315685270E+07
Avogadro constant [1/mol] 6.02214179000000E+23
Boltzmann constant [J/K] 1.38065040000000E-23
Atomic mass unit [kg] 1.66053878200000E-27
Bohr radius [m] 5.29177208590000E-11
*** Conversion factors ***
[u] -> [a.u.] 1.82288848426455E+03
[Angstrom] -> [Bohr] = [a.u.] 1.88972613288564E+00
[a.u.] = [Bohr] -> [Angstrom] 5.29177208590000E-01
[a.u.] -> [s] 2.41888432650478E-17
[a.u.] -> [fs] 2.41888432650478E-02
[a.u.] -> [J] 4.35974393937059E-18
[a.u.] -> [N] 8.23872205491840E-08
[a.u.] -> [K] 3.15774647902944E+05
[a.u.] -> [kJ/mol] 2.62549961709828E+03
[a.u.] -> [kcal/mol] 6.27509468713739E+02
[a.u.] -> [Pa] 2.94210107994716E+13
[a.u.] -> [bar] 2.94210107994716E+08
[a.u.] -> [atm] 2.90362800883016E+08
[a.u.] -> [eV] 2.72113838565563E+01
[a.u.] -> [Hz] 6.57968392072181E+15
[a.u.] -> [1/cm] (wave numbers) 2.19474631370540E+05
[a.u./Bohr**2] -> [1/cm] 5.14048714338585E+03
DBCSR| CPU Multiplication driver XSMM
DBCSR| Multrec recursion limit 512
DBCSR| Multiplication stack size 1000
DBCSR| Maximum elements for images UNLIMITED
DBCSR| Multiplicative factor virtual images 1
DBCSR| Use multiplication densification T
DBCSR| Multiplication size stacks 3
DBCSR| Use memory pool for CPU allocation F
DBCSR| Number of 3D layers SINGLE
DBCSR| Use MPI memory allocation F
DBCSR| Use RMA algorithm F
DBCSR| Use Communication thread T
DBCSR| Communication thread load 83
DBCSR| MPI: My node id 0
DBCSR| MPI: Number of nodes 8
DBCSR| OMP: Current number of threads 2
DBCSR| OMP: Max number of threads 2
DBCSR| Split modifier for TAS multiplication algorithm 1.0E+00
**** ****** **** ****
** ** ****** ** ** ******
****** ** ** ** ** **
** ** ** **** ** **
University of Zurich
2009 - 2015
Version 0.0
Atomic Energy Calculation Hydrogen [H] Atomic number: 1
METHOD | Restricted Kohn-Sham Calculation
METHOD | Nonrelativistic Calculation
FUNCTIONAL| ROUTINE=NEW
FUNCTIONAL| BECKE88:
FUNCTIONAL| A. Becke, Phys. Rev. A 38, 3098 (1988) {LDA version}
FUNCTIONAL| LYP:
FUNCTIONAL| C. Lee, W. Yang, R.G. Parr, Phys. Rev. B, 37, 785 (1988) {LDA versi
FUNCTIONAL| on}
Electronic structure
Total number of core electrons 0.00
Total number of valence electrons 1.00
Total number of electrons 1.00
Multiplicity not specified
S 1.00
*******************************************************************************
Iteration Convergence Energy [au]
*******************************************************************************
1 0.320749E-01 -0.456955069647
2 0.324918E-02 -0.457634427736
3 0.262900E-03 -0.457648540569
4 0.494227E-06 -0.457648648451
Energy components [Hartree] Total Energy :: -0.457648648451
Band Energy :: -0.222846251753
Kinetic Energy :: 0.482431430167
Potential Energy :: -0.940080078618
Virial (-V/T) :: 1.948629421373
Core Energy :: -0.496696135527
XC Energy :: -0.266586511371
Coulomb Energy :: 0.305633998447
Orbital energies State L Occupation Energy[a.u.] Energy[eV]
1 0 1.000 -0.222846 -6.063955
Total Electron Density at R=0: 0.288108
NORMAL TERMINATION OF
**** ****** **** ****
** ** ****** ** ** ******
****** ** ** ** ** **
** ** ** **** ** **
-------------------------------------------------------------------------------
- -
- DBCSR STATISTICS -
- -
-------------------------------------------------------------------------------
Given the backtrace above, I wonder if it is a specific problem with MPI_Allreduce?
I also found an issue in the CP2K repo about the clash between the Fortran and MPI standards: https://github.com/cp2k/cp2k/issues/1019
I'll have a look.
What architecture are you using (x86_64?), and what MPI implementation (MPICH?)?
Yes, x86_64 (AMD Rome). I'm using Open MPI as the MPItrampoline default backend, and comparing that to the same toolchain with vanilla Open MPI.
To leave a note here: I did manage to get a patch that worked in a few cases, but it was quite invasive, and you would need quite a bit of knowledge (of both the programming language and the use case) to get it right. The core problem is (it seems to me) that CP2K uses MPI constants in variable initialisations, and MPItrampoline can't allow that since it (and therefore the compiler) doesn't know what those constants should be until runtime. This looks like it might be a wider issue, since I've seen the same type of problem appear for other (Fortran) applications.
@eschnett made the suggestion that perhaps (for Fortran at least) MPItrampoline should set its own constants and then do runtime translation of those constants for the actual MPI runtime used.
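That suggestion can be sketched as follows (plain Python with invented integer values standing in for real MPI handles; purely illustrative, not MPItrampoline's actual design): the trampoline exports fixed constants that the compiler can fold into initialisation expressions, and each wrapper call translates them to the loaded implementation's values at runtime.

```python
# Illustrative sketch of runtime constant translation (all values invented).
# The trampoline's constants are fixed at compile time, so Fortran code can
# use them in initialisations; translation to the real MPI library's values
# happens inside each wrapper call, after the real library has been loaded.
MPITRAMPOLINE_COMM_WORLD = 1_000_001  # fixed, known to the compiler
MPITRAMPOLINE_SUM = 1_000_002

_real_values = {}  # populated once the real MPI library is loaded

def bind_real_mpi(real_comm_world, real_sum):
    """Record the loaded implementation's handle values (invented API)."""
    _real_values[MPITRAMPOLINE_COMM_WORLD] = real_comm_world
    _real_values[MPITRAMPOLINE_SUM] = real_sum

def to_real(handle):
    """Translate a trampoline constant; pass anything else through."""
    return _real_values.get(handle, handle)
```

The key property is that the translation table is consulted only at call time, so user code can freely bake the trampoline constants into compile-time initialisations.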
@eschnett I was trying to think of a way around this. Much as I want to, in our case it is very hard to consider using MPItrampoline as part of a toolchain if key Fortran applications won't work. As a compromise, I was wondering if there would be a way to let us fix the MPI constant values when using the MPItrampoline compiler wrappers. There are only two key variants that I can think of, Open MPI and MPICH, so perhaps an option to the compiler wrappers that allows us to use a specific set of values? That would allow me to create two variants of problem applications like CP2K, one for an Open MPI compatibility use case and one for an MPICH compatibility use case. This would cover every scenario I can currently think of (and would be extensible for ones I can't).
@ocaisa This would be a good compromise. Let me think about this.
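The compromise discussed above could look roughly like this (a Python sketch; the flavour names, function, and constant values are all made up for illustration, and the real wrapper option and ABI values would differ): the compiler wrapper selects one fixed constant set at build time, producing an Open MPI-compatible or MPICH-compatible build.

```python
# Sketch of build-time ABI selection (names and values invented, not the
# real Open MPI / MPICH ABI constants). A wrapper option would pick one
# set, which is then baked into the build as compile-time constants.
ABI_CONSTANT_SETS = {
    "openmpi": {"MPI_SUCCESS": 0, "MPI_COMM_WORLD": 11},
    "mpich":   {"MPI_SUCCESS": 0, "MPI_COMM_WORLD": 22},
}

def select_abi(flavor):
    """Return the constant set to bake into the build for the chosen ABI."""
    try:
        return ABI_CONSTANT_SETS[flavor]
    except KeyError:
        raise ValueError(f"unknown MPI ABI flavour: {flavor!r}")
```

A build configured with `select_abi("mpich")` would then only be able to load MPICH-ABI libraries at runtime, which is the trade-off this compromise accepts.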
I thought I would give this a full test with Fortran, and CP2K is a good benchmark for that. The build (v8.2) is failing with: