Occasionally the MPI buffer for AMR load balancing exceeds the VRAM limit, leading to a segfault. This is a feature rather than a bug, but it would be nice if this could automatically trigger saving an rst file, so that one can restart with more GPUs from the point of failure.
In GitLab by @Hengrui_Zhu on Apr 23, 2024, 10:57