cb-geo / mpm

CB-Geo High-Performance Material Point Method
https://www.cb-geo.com/research/mpm

Error in transfer_norank_particles #696

Closed · bodhinandach closed this issue 3 years ago

bodhinandach commented 3 years ago

**Describe the bug** I am noticing a problem with our dynamic load balancing and resume features, both of which use the function `transfer_norank_particles` in `mesh.tcc`, even on `develop`. The following screenshot was taken after running the benchmark 3D hydrostatic case either with dynamic load balancing or with resume; both call `mpi_domain_decompose(false)`, which in turn calls `transfer_norank_particles`.
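For context, here is a minimal sketch of the failure pattern (this is not the actual cb-geo transfer code; the buffer layout and values are assumptions): if the sender and receiver disagree on the serialized particle layout, the first integer deserialized as the type id can end up being payload data or uninitialized memory.

```cpp
// Hypothetical illustration of a send/receive layout mismatch; not the
// cb-geo implementation. Build: mpicxx mismatch.cc && mpirun -n 2 ./a.out
#include <mpi.h>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    // Sender serializes only the payload and omits the leading type id.
    std::vector<int> payload{42, 43, 44};
    MPI_Send(payload.data(), static_cast<int>(payload.size()), MPI_INT, 1, 0,
             MPI_COMM_WORLD);
  } else if (rank == 1) {
    // Receiver expects [ptype | payload...]: four ints, not three.
    std::vector<int> buffer(4, -1);
    MPI_Status status;
    MPI_Recv(buffer.data(), static_cast<int>(buffer.size()), MPI_INT, 0, 0,
             MPI_COMM_WORLD, &status);
    // buffer[0] now holds payload data (42), not the type id 1, and the
    // trailing slot keeps whatever it was initialized with. A lookup of
    // buffer[0] in a type-name map then fails, as seen in this issue.
    std::cout << "deserialized ptype = " << buffer[0] << std::endl;
  }
  MPI_Finalize();
  return 0;
}
```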

**To Reproduce** Steps to reproduce the behavior:

  1. Compile `develop` with MPI.
  2. Run any problem in benchmark with `mpirun -n xx mpm`.
  3. Set `resume` to `true`, or add `"nload_balance_steps": 10` to `mpm.json` so that dynamic load balancing kicks in sooner (see the config sketch after this list).
  4. See the error.
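For step 3, a minimal `mpm.json` sketch of the two options (the nesting under `"analysis"` is my assumption from typical input files, and a real resume block usually also needs its checkpoint step and uuid):

```json
{
  "analysis": {
    "resume": { "resume": true },
    "nload_balance_steps": 10
  }
}
```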

**Expected behavior** We should not receive an arbitrary `ptype` value; otherwise neither resume nor dynamic load balancing can work.

**Screenshots** [screenshot of the MPI run showing the corrupted `ptype` values and the PMIX error]

As the screenshot shows, although most of the received `ptype` values are correct (equal to 1 for 3D particles), a garbage number is sometimes received instead, in this case 1456803152, so we cannot retrieve the appropriate type from the `ParticleTypeName` map. As indicated, a PMIX ERROR is also raised while receiving the particle.
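Until the transfer itself is fixed, a defensive check before the map lookup would at least fail loudly instead of propagating garbage. A minimal sketch, assuming a `ParticleTypeName`-style map from integer ids to type names (the map contents and function name here are placeholders, not the actual cb-geo registry):

```cpp
// Sketch of a guarded type lookup; ids and names are placeholders.
#include <map>
#include <stdexcept>
#include <string>

const std::map<int, std::string> particle_type_name{{0, "P2D"}, {1, "P3D"}};

// Throw a descriptive error on an unknown ptype (e.g. 1456803152) instead
// of dereferencing a missing map entry.
std::string particle_type(int ptype) {
  const auto it = particle_type_name.find(ptype);
  if (it == particle_type_name.end())
    throw std::runtime_error("Deserialized invalid ptype: " +
                             std::to_string(ptype));
  return it->second;
}
```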

**Runtime environment** (please complete the following information):

**Additional context** This is the same reason our nightly builds have been failing since #689 was merged. From today's nightly build running the 2D sliding block: [screenshot of the nightly failure]