yaoyi92 closed this issue 2 years ago
Thanks Yi for a detailed error report.
Can you try putting MPI_C_FLOAT_COMPLEX instead of MPI_CXX_FLOAT_COMPLEX, and similarly for the double type, at these lines:
https://github.com/eth-cscs/COSMA/blob/783803e9a48944a16c9b95db0b027955b2594755/src/cosma/mpi_mapper.hpp#L30
https://github.com/eth-cscs/COSMA/blob/783803e9a48944a16c9b95db0b027955b2594755/src/cosma/mpi_mapper.hpp#L35
Yes, I can confirm the modification solves the problem. Thank you very much for the quick reply.
Great to hear that! Although I am still confused why this is a problem. @rasolca do you know why MPI_C works and MPI_CXX doesn't?
@yaoyi92 keep in mind that you can also use COSMA with the GPU-aware MPI or NCCL backends, as described in the README, which should be much more performant. This should be the biggest change in this version.
Thanks! I will check them out.
@yaoyi92 if it's not a problem for you, can you also try leaving the MPI_CXX prefixes there, but modifying the cmake:
https://github.com/eth-cscs/COSMA/blob/783803e9a48944a16c9b95db0b027955b2594755/CMakeLists.txt#L124
to find_package(MPI COMPONENTS C CXX REQUIRED).
Maybe this was the problem?
It is not a CMake problem. It is a problem with newer versions of cray-mpich: MPI_CXX_FLOAT_COMPLEX, MPI_CXX_DOUBLE_COMPLEX and MPI_CXX_BOOL are not set (the MPI standard requires them even if the C++ bindings are not provided). We opened a ticket about it some time ago, but there is still no solution from HPE's side.
@rasolca do you propose then to put the MPI_C types in the code as a temporary solution?
In general not, but it is needed for Cray-EX systems.
Alright! As a temporary solution we modified this in commit 5e71fac until cray-mpich fixes it.
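For readers following along, the workaround discussed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: Datatype is a stand-in enum so that the example compiles without an MPI installation (real code would return the MPI_Datatype handles from mpi.h instead), and use_c_complex_types is a hypothetical switch, not an actual COSMA option.

```cpp
#include <complex>

// Stand-in for MPI_Datatype (assumption, for illustration only).
// In real code the mapper would return the handles from <mpi.h>:
// MPI_FLOAT, MPI_DOUBLE, MPI_C_FLOAT_COMPLEX, MPI_CXX_FLOAT_COMPLEX, ...
enum class Datatype {
    Float,
    Double,
    CFloatComplex,
    CDoubleComplex,
    CxxFloatComplex,
    CxxDoubleComplex
};

// Hypothetical switch (not a COSMA option): true on systems whose MPI
// library does not provide usable MPI_CXX_* complex datatypes.
constexpr bool use_c_complex_types = true;

// The primary template is left undefined, so requesting a datatype for an
// unsupported element type fails at compile time.
template <typename T>
struct mpi_mapper;

template <>
struct mpi_mapper<float> {
    static constexpr Datatype type() { return Datatype::Float; }
};

template <>
struct mpi_mapper<double> {
    static constexpr Datatype type() { return Datatype::Double; }
};

// Complex types: choose the C or the C++ named datatype via the switch.
template <>
struct mpi_mapper<std::complex<float>> {
    static constexpr Datatype type() {
        return use_c_complex_types ? Datatype::CFloatComplex
                                   : Datatype::CxxFloatComplex;
    }
};

template <>
struct mpi_mapper<std::complex<double>> {
    static constexpr Datatype type() {
        return use_c_complex_types ? Datatype::CDoubleComplex
                                   : Datatype::CxxDoubleComplex;
    }
};

// Compile-time check that the fallback is selected with the switch enabled.
static_assert(mpi_mapper<std::complex<double>>::type() ==
                  Datatype::CDoubleComplex,
              "complex<double> should map to the C complex datatype");
```

A call site would then pass mpi_mapper<T>::type() wherever an MPI datatype argument is expected, so switching between the C and C++ complex datatypes stays confined to the mapper.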
Dear COSMA developers,
I am able to install COSMA on the Perlmutter computer and it works fine with float/double numbers. However, when I try to use complex numbers (zfloat/zdouble), the code crashes during some MPI calls, and it seems that MPI cannot recognize the complex values.
I am able to reproduce the crash with the cosma_miniapp. The results/running scripts/compilation commands are listed here.
Best wishes, Yi
The error
The script to run cosma_miniapp
The script to build COSMA