cb-geo / mpm

CB-Geo High-Performance Material Point Method
https://www.cb-geo.com/research/mpm
Other

Error when running with parallel tasks #651

Status: Closed. thiagordonho closed this issue 4 years ago.

thiagordonho commented 4 years ago

Describe the bug
An error occurs when running a model with parallel tasks. The output reports "MPM main: Couldn't find key." The error was first identified when submitting a job to run on multiple nodes of TACC's Stampede2 supercomputer.

To Reproduce
Steps to reproduce the behavior:

  1. Configure and compile with
    cmake -DBOOST_ROOT=$TACC_BOOST_DIR -DBOOST_INCLUDE_DIRS=$TACC_BOOST_INC -DCMAKE_BUILD_TYPE=Release -DEIGEN3_INCLUDE_DIR=$HOME/eigen -DKAHIP_ROOT=$HOME/KaHIP .. && make -j8
  2. Run with
    ibrun $WORK/path/to/build/directory/mpm -f /path/to/file/ -i mpm.json
  3. Use 2 nodes with 4 MPI tasks per node, and set "nload_balance_steps" to a value smaller than "nsteps"
  4. See error

Expected behavior
The simulation should continue past the first step that is a multiple of nload_balance_steps; instead, it stops at that step.

Screenshots
The following screenshot shows the output after the error occurred:

[Screenshot of the error output]


Additional context
A detailed description of running on TACC is given here. A detailed description of running on a local machine with multiple MPI tasks is given here.

kks32 commented 4 years ago

The error happens in robinhood::map: either we are missing a pointer to a particle or to a node. This needs further investigation.

kks32 commented 4 years ago

I have narrowed this problem down to the transfer_halo_particles function call.