IAS-Astrophysics / athenak

Performance-portable version of the Athena++ astrophysical AMR-MHD code using Kokkos.
BSD 3-Clause "New" or "Revised" License

AMR memory allocation failure during load balancing #408

Open jmstone opened 6 months ago

jmstone commented 6 months ago

In GitLab by @Hengrui_Zhu on Apr 23, 2024, 10:57

Occasionally the MPI buffer for AMR load balancing exceeds the VRAM limit, leading to a segfault. This is a feature rather than a bug, but it would be nice if this could automatically trigger saving a restart (rst) file, so that one can restart with more GPUs at the point of failure.
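A rough sketch of what such a guard might look like, to make the request concrete. This is not AthenaK code: `LBAllocStatus`, `CheckLoadBalanceBuffer`, `safety_fraction`, and the `write_restart` callback are all hypothetical names, and the sketch assumes the caller already knows how many bytes the load-balancing buffer needs and how many bytes are currently free on the device.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical outcome of the pre-allocation check (not an AthenaK type).
enum class LBAllocStatus { kOk, kCheckpointed };

// Compare the size of the load-balancing buffer against the memory that is
// still free. If the buffer does not fit, call the caller-supplied restart
// writer instead of attempting the allocation.
//   required_bytes  : bytes the buffer would need (assumed known by the caller)
//   free_bytes      : bytes currently free on the device (assumed known)
//   safety_fraction : fraction of free memory we allow ourselves to use
//   write_restart   : hypothetical hook that dumps a restart file
inline LBAllocStatus CheckLoadBalanceBuffer(
    std::size_t required_bytes, std::size_t free_bytes, double safety_fraction,
    const std::function<void()>& write_restart) {
  const auto usable = static_cast<std::size_t>(
      safety_fraction * static_cast<double>(free_bytes));
  if (required_bytes <= usable) {
    return LBAllocStatus::kOk;  // safe to allocate and proceed
  }
  write_restart();              // save state before anything can fail
  return LBAllocStatus::kCheckpointed;
}
```

The idea is that the load-balancing step would call something like this right before allocating and stop cleanly when it gets `kCheckpointed` back. In a real MPI run the decision would presumably have to be agreed on by all ranks (for example with a reduction over a per-rank flag) so that every rank writes the restart file together. On CUDA the free-memory figure could come from something like `cudaMemGetInfo`, but how that would be wired into the code is left open here.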