cb-geo / mpm

CB-Geo High-Performance Material Point Method
https://www.cb-geo.com/research/mpm

MPI test fails when run on a certain number of nodes #662

Closed bodhinandach closed 4 years ago

bodhinandach commented 4 years ago

Describe the bug: I just noticed that ./mpmtest on the develop branch fails when run with mpirun -n 4. I am not sure whether this happens only on my machine. It works fine with every other number of ranks.

To Reproduce Steps to reproduce the behavior:

  1. Compile develop using the following:
    cmake .. \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_CXX_COMPILER=mpicxx \
      -DKAHIP_ROOT=/directory/KaHIP \
      -DMPM_BUILD_LIB=On \
      -DHALO_EXCHANGE=On
  2. Run mpirun -n 4 ./mpmtest [mpi]
  3. You may see the following error: [image]

Additional context: As mentioned above, the tests run fine with any other number of ranks.
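
To see which rank counts fail, the run in step 2 can be repeated over several values of -n. This is a minimal sketch, assuming mpirun is on the PATH and the test binary sits in the current build directory. The function name run_mpi_tests and the log file names are hypothetical and not part of the repository.

```shell
# Hypothetical helper: run the [mpi] tests under several rank counts
# and print one PASS/FAIL line per count.
# Usage: run_mpi_tests <test-binary> <n1> [<n2> ...]
run_mpi_tests() {
  rmt_binary="$1"
  shift
  rmt_status=0
  for rmt_n in "$@"; do
    # "[mpi]" is quoted so the shell does not treat it as a glob pattern.
    if mpirun -n "$rmt_n" "$rmt_binary" "[mpi]" >"mpmtest_n${rmt_n}.log" 2>&1; then
      echo "n=${rmt_n}: PASS"
    else
      echo "n=${rmt_n}: FAIL (see mpmtest_n${rmt_n}.log)"
      rmt_status=1
    fi
  done
  # Non-zero if any rank count failed.
  return "$rmt_status"
}
```

For example, `run_mpi_tests ./mpmtest 1 2 3 4` would write one log per rank count, so the output of the failing -n 4 run can be attached to the issue as text.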

kks32 commented 4 years ago

The MPI test is written specifically for 4 MPI ranks, and it seems to run fine on Fedora and in the CircleCI nightly and regular builds: https://app.circleci.com/pipelines/github/cb-geo/mpm/921/workflows/b4202709-791c-4d84-b1d3-748a7a86f9fb/jobs/5298.

Could you try running the MPI test without building a shared object library, i.e., with -DMPM_BUILD_LIB=Off?
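
Concretely, this would be the configure command from the report with only the MPM_BUILD_LIB flag flipped. This is a sketch, assuming it is run from a build directory one level below the source tree. The wrapper name configure_without_lib is hypothetical, and /directory/KaHIP is the placeholder path from the report.

```shell
# Hypothetical wrapper: configure the build with the shared library disabled.
# $1: KaHIP install directory (the report uses the placeholder /directory/KaHIP)
configure_without_lib() {
  cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CXX_COMPILER=mpicxx \
    -DKAHIP_ROOT="$1" \
    -DMPM_BUILD_LIB=Off \
    -DHALO_EXCHANGE=On
}
```

After `configure_without_lib /directory/KaHIP` and a rebuild, `mpirun -n 4 ./mpmtest "[mpi]"` can be run again. Starting from a fresh build directory is probably the safer choice, so that no artifacts from the earlier shared-library build are left over.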

bodhinandach commented 4 years ago

Thanks @kks32, I set -DMPM_BUILD_LIB=Off and it works! Should it always be turned off when running with mpirun?

kks32 commented 4 years ago

I think it would be good to remove the MPM_BUILD_LIB option, so that we avoid this error.