Closed barracuda156 closed 1 year ago
@alazzaro: for instance dbcsr_mpiwrap.F:5299, could it be we need to use C_LOC(mp_baseptr)? Currently it seems we rely on passing arguments by pointer, i.e., mp_baseptr is passed by pointer (and hence it should work also).
@barracuda156: I wonder if the basepointer issue is "just a warning" or if the related code actually crashes...
@hfp There is a topic with test results: https://github.com/cp2k/dbcsr/issues/645 . If there is something more specific to check, I could do that.
@alazzaro Could you please take a look?
@barracuda156 I'm sorry, it cannot be today, likely next week. MPICH 4.1 is still not supported in our tests, so it is not a surprise you see this error.
I am currently working on the MPI backend and trying to upgrade DBCSR to the new mpi_f08 module. The warnings might be related to missing explicit interfaces on the library side, which are to be expected here. I wonder about the errors in the case of MPICH 4.1. For the mpi module (not mpi_f08), the MPI standard (including MPI 3.0 and MPI 3.1) requires MPI_Alloc_mem to be overloaded with a TYPE(C_PTR) version if TYPE(C_PTR) is available compiler-wise. DBCSR's wrapper is in accordance with the example provided in the standard itself and thus should work, in my opinion. I am still searching Google for what it might be related to.
Maybe unrelated, sometimes it matters if TYPE(C_PTR) is passed as TYPE(C_PTR), VALUE, because the opposite side expects the pointer address and not a pointer to the pointer... I have observed it already in a different context, but in the given case, even the MPI standard does not use the VALUE attribute, indicating that a pointer to the pointer is actually expected.
I have just checked the build-files of MPICH 4.0.3. In case of the mpi_f08 module, TYPE(C_PTR) is used whereas for the mpi module, MPICH uses a placeholder which seems to be any possible type/kind/rank-combination (not standard-compliant).
My impression is that we should switch to the mpi_f08 module. I have code ready based on my currently opened PR. From CP2K, it could be that MPICH is not compatible with certain versions of gcc, but it worked with IntelMPI and OpenMPI (see cp2k/cp2k#2486).
@barracuda156 I've just merged https://github.com/cp2k/dbcsr/pull/678 so now the problem of this issue should go away if you build DBCSR with -DUSE_MPI_F08=ON. Can you confirm that?
Thank you very much! I will test that tonight.
@alazzaro Unfortunately, still fails:
FAILED: src/CMakeFiles/dbcsr.dir/mpi/dbcsr_mpiwrap.F.o src/dbcsr_mpiwrap.mod
/opt/local/bin/mpif90-mpich-gcc12 -I/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_dbcsr/dbcsr/work/build/src/mpi -I/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_dbcsr/dbcsr/work/dbcsr-397bf0f80c293a0c6088a1314931a748cff4b5b6/src/base -I/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_dbcsr/dbcsr/work/dbcsr-397bf0f80c293a0c6088a1314931a748cff4b5b6/src -I/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_dbcsr/dbcsr/work/build/src -ffree-form -std=f2008ts -fimplicit-none -Werror=aliasing -Werror=ampersand -Werror=c-binding-type -Werror=intrinsic-shadow -Werror=intrinsics-std -Werror=line-truncation -Werror=tabs -Werror=target-lifetime -Werror=underflow -Werror=unused-but-set-parameter -Werror=unused-but-set-variable -Werror=unused-variable -Werror=unused-dummy-argument -Werror=conversion -Werror=zerotrip -Werror=uninitialized -Wno-maybe-uninitialized -Werror=unused-parameter -fallow-argument-mismatch -mmacosx-version-min=10.6 -Jsrc -fPIC -fopenmp -Wno-error -fpreprocessed -c src/CMakeFiles/dbcsr.dir/mpi/dbcsr_mpiwrap.F-pp.f -o src/CMakeFiles/dbcsr.dir/mpi/dbcsr_mpiwrap.F.o
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_dbcsr/dbcsr/work/dbcsr-397bf0f80c293a0c6088a1314931a748cff4b5b6/src/mpi/dbcsr_mpiwrap.F:5543:65:
5543 | CALL MPI_ALLOC_MEM(mp_size, mp_info, mp_baseptr, mp_res)
| 1
Error: Type mismatch in argument 'baseptr' at (1); passed TYPE(c_ptr) to INTEGER(4)
[239/346] /opt/local/bin/mpicxx-mpich-gcc12 -I/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_dbcsr/dbcsr/work/build/src -I/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_dbcsr/dbcsr/work/dbcsr-397bf0f80c293a0c6088a1314931a748cff4b5b6/src -pipe -Os -DNDEBUG -I/opt/local/include -D_GLIBCXX_USE_CXX11_ABI=0 -std=gnu++11 -arch ppc -mmacosx-version-min=10.6 -MD -MT tests/CMakeFiles/dbcsr_test.dir/dbcsr_test.cpp.o -MF tests/CMakeFiles/dbcsr_test.dir/dbcsr_test.cpp.o.d -o tests/CMakeFiles/dbcsr_test.dir/dbcsr_test.cpp.o -c /opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_math_dbcsr/dbcsr/work/dbcsr-397bf0f80c293a0c6088a1314931a748cff4b5b6/tests/dbcsr_test.cpp
This is with the latest DBCSR commit, gcc 12.3.0, mpich-gcc12 @4.1.1_0+fortran.
To the config in the portfile https://github.com/macports/macports-ports/blob/master/math/dbcsr/Portfile I have added -DUSE_MPI_F08=ON.
@alazzaro Building DBCSR with MPICH 4.1.2 and GCC 13.1.0 using the cmake flag -DUSE_MPI_F08=ON works fine for me. With -DUSE_MPI_F08=OFF, compilation errors as shown above and as reported in the CP2K issue #2808 occur: Error: Type mismatch in argument 'baseptr'.
However, make test reports that one test (test 18 out of 19) is failing, but that is also the case with MPICH 4.0.3.
@mkrack thanks for confirmation, I did a similar test on MacOS (GCC 13.1, MPICH 4.1.2) and it works.
@barracuda156 just for confirmation, I don't see -DUSE_MPI_F08=ON in https://github.com/macports/macports-ports/blob/master/math/dbcsr/Portfile ...
There is another interesting consideration. I used brew to install MPICH. When I use -DUSE_MPI_F08=ON, cmake reports the following:
-- Found MPI: TRUE (found version "4.0") found components: C CXX Fortran
CMake Warning at CMakeLists.txt:203 (message):
The listed MPI implementation does not provide the required mpi_f08.mod
interface. The Fortran 90 bindings will be used instead.
So basically the brew version of MPICH is missing the F08 interface and we are back to the original problem of the F77 MPI interface (we will fix it). Can you check your MPI has the F08 module? My understanding is that only new GCC can compile MPICH with F08 support (see for example here). In my case I have to recompile MPICH via:
brew reinstall --build-from-source mpich
to make sure I have the F08 module.
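One way to check an installation for the F08 module could look like the sketch below. It assumes, in line with the CMake warning quoted above, that the module is shipped as a file named mpi_f08.mod in one of the include directories used by the compiler wrapper. The function name find_mpi_f08_mod is hypothetical and not part of MPICH or DBCSR.

```shell
#!/bin/sh
# Print the first directory (among the arguments) that contains mpi_f08.mod.
# Returns non-zero if none of the given directories contains the file.
find_mpi_f08_mod() {
    for dir in "$@"; do
        if [ -f "$dir/mpi_f08.mod" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Hypothetical usage: collect the -I directories from the wrapper command line.
# "mpif90 -show" is the MPICH wrapper option that prints the underlying
# compiler command; other MPI distributions may use a different option.
# dirs=$(mpif90 -show | tr ' ' '\n' | sed -n 's/^-I//p')
# find_mpi_f08_mod $dirs || echo "no mpi_f08.mod found"
```

If the function finds nothing, the MPI library was most likely built without F08 support and would need to be rebuilt, as with the brew command above.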
@alazzaro This is an awesome point. Indeed, it is disabled: https://github.com/macports/macports-ports/blob/100bdfdca9908a07bb07a92663434f401c1f71f9/science/mpich/Portfile#L179-L180 I recall taking part in the related discussion – we disabled it for a reason, it failed to build.
Need to review why it failed, but we do have new GCC (all tested systems use 12.3.0 now, including my PowerPC ones).
@alazzaro I have built MPICH 4.1.2 with enabled F08 now, and DBCSR built fine. Your solution works.
OK, then we have a solution.
My summary is the following:
@barracuda156 is this something reasonable for you? In your case it requires MPICH 4.1 with F08 support, the flag -DUSE_MPI_F08=ON, and a recent GCC compiler.
I would like to thank @fstein93 who did the F08 porting of the MPI code and @mkrack for testing it.
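For packagers, the rule from this thread (MPICH 4.1 needs -DUSE_MPI_F08=ON, older versions work with the default) could be sketched as a small shell helper. This is only a sketch: the function name mpi_f08_flag is hypothetical, the version string is assumed to look like major.minor[.patch], and it assumes the same requirement holds for MPICH versions after 4.1. Only the flag itself comes from this thread.

```shell
#!/bin/sh
# Print the extra CMake flag DBCSR needs for a given MPICH version.
# Assumption: the version looks like "4.1.2" or "4.1" (numeric major.minor).
mpi_f08_flag() {
    version=$1
    major=${version%%.*}
    case $version in
        *.*) rest=${version#*.}; minor=${rest%%.*} ;;
        *)   minor=0 ;;
    esac
    # MPICH 4.1 and newer: request the mpi_f08 bindings.
    # Older versions: print nothing, so the DBCSR default is kept.
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 1 ]; }; then
        printf '%s\n' '-DUSE_MPI_F08=ON'
    fi
}

# Hypothetical usage (the build directory name is an assumption):
# cmake -S . -B build $(mpi_f08_flag 4.1.2)
```

The flag only helps if MPICH itself was built with F08 support; otherwise DBCSR's CMake falls back to the Fortran 90 bindings, as in the warning quoted earlier.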
@alazzaro I do not control MPICH port in Macports, while I am a maintainer of DBCSR port, so I can only say I hope that will work. I have requested maintainers of MPICH to enable F08 in the next update to the port (that will have to be tested on other systems – I only verified it builds for me locally). If that is done, I will add the fix to DBCSR, so that it can be built again normally.
The requirement for a new GCC will temporarily leave PowerPC builds broken on < 10.6, but I plan to update those to GCC 12 anyway, hopefully soon. (Technically everything is ready for that, but changes to the toolchain aren’t the easiest to push through.)
Well, this is only required for the new MPICH 4.1. In all other cases, the default will work...
Well, MPICH 4.x is the current reality. (Introducing back a legacy MPICH just to build one port is too much. I hope we can sort out enabling F08 instead.)
I'm going to close this issue, please open a new one for further discussion.
@alazzaro what's the current status of this?
I'm running into this issue with spack install dbcsr@2.6.0 ^mpich@4.1.2 and would like to add the relevant conflicts / defines.
So:
USE_MPI_F08 define, meaning it's incompatible with all mpich 4.1 and higher? Or is it also conditional on the underlying gcc?
-DUSE_MPI_F08=ON when using mpich@4.1: for all compilers? Or only recent gcc?
The problem is in the new MPICH, so nothing to do with compilers. Saying that, only new compilers support the F08 API.
Could you take a look at https://github.com/spack/spack/pull/40494?
Seems reasonable. Do you have any other questions?
I didn't understand whether the CP2K build issue with mpich 4.1 is because CP2K builds vendored DBCSR and hits the issue in this thread, or because it also needs fixes inside CP2K itself?
The latter... this is because MPICH 4.1 is strictly compliant with the standard and therefore enforces the full F77 interface, unless you ask for the F08. You get the first error in DBCSR simply because it is the first to compile...
For the record, MacPorts is still stuck with non-F08 MPICH, waiting for it to be updated. Locally I have MPICH 4.1.2 with F08 and gcc13, works fine on the old 10.6 PowerPC :)
With mpich 4.0.2 I get warnings instead: