ECP-copa / ExaMPM

Material point method proxy application based on Cabana.
BSD 3-Clause "New" or "Revised" License

Integrate the Cabana load balancer #29

Open aetx opened 3 years ago

aetx commented 3 years ago

Integrating the new Cabana load balancer instead of using it directly.

aetx commented 2 years ago

On Summit, the load-balanced DamBreak example currently segfaults after a few iterations. This does not happen on JUWELS (V100), JUWELS Booster (A100), or my local machine.

On Summit, the crash can be reproduced by running DamBreak with

bsub -P CSC304 -W 30 -nnodes 1 -q debug -Is /bin/bash
jsrun -n4 -c1 -g1 --smpiargs="-gpu" ./DamBreak 0.01 2 0 0.0001 1 10000 cuda

It throws the error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaStreamSynchronize(m_stream) error( cudaErrorIllegalAddress): an illegal memory access was encountered /ccs/home/sschulz/repos/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:314
Traceback functionality not available

The error disappears with a debug build of Kokkos or when the array access is checked manually in Cabana.
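To illustrate what "checked manually" could look like, here is a minimal sketch in plain C++. It is not the actual Cabana or Kokkos code: `in_bounds` and `checked_read` are hypothetical helper names, and `std::vector` stands in for the device array. The idea is only that every index is validated before the read, so an out-of-range access is counted instead of touching invalid memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Cabana or Kokkos): true only when
// index i is valid for a container holding n elements.
inline bool in_bounds(std::ptrdiff_t i, std::size_t n)
{
    return i >= 0 && static_cast<std::size_t>(i) < n;
}

// Hypothetical checked read: on an invalid index it increments the
// violation counter and returns `fallback` instead of reading memory
// outside the array.
template <class T>
T checked_read(const std::vector<T>& data, std::ptrdiff_t i, T fallback,
               std::size_t& violations)
{
    if (!in_bounds(i, data.size()))
    {
        ++violations;
        return fallback;
    }
    return data[static_cast<std::size_t>(i)];
}
```

A guard like this also changes code generation and timing, which may be one reason the failure no longer shows up once the check is in place; that is a guess, not something confirmed here.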

ExaMPM is compiled using ExaMPMCompile.txt (attached as .txt since GitHub does not accept .sh files). The script assumes that ~/repos is the current working directory and that the cuda and cmake modules are loaded.