victorapm opened this issue 5 months ago.
Just looking at the first few lines of the gdb output, it looks like `arrayManipulation::emplace` is being passed a `nullptr`. Unfortunately, in the few frames above, this argument is optimized out, so you can't tell where it is introduced. Any chance this problem is running out of memory? I imagine `malloc` returning a `nullptr` might cause this exact problem.
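If `malloc` did return a `nullptr`, the crash would only appear later, wherever the pointer is first dereferenced, which matches a backtrace ending in `emplace`. Below is a minimal C++ sketch of catching such a failure at the allocation site instead. `checkedMalloc`, `CHECKED_MALLOC`, and `emplaceSketch` are hypothetical names used only for illustration. They are not GEOS or LvArray functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not part of GEOS/LvArray): report and abort at the
// allocation site if malloc fails, instead of letting a nullptr travel on.
void * checkedMalloc( std::size_t bytes, char const * file, int line )
{
  void * ptr = std::malloc( bytes );
  // malloc(0) may legitimately return nullptr, so only flag nonzero requests.
  if( ptr == nullptr && bytes != 0 )
  {
    std::fprintf( stderr, "malloc(%zu) returned nullptr at %s:%d\n", bytes, file, line );
    std::fflush( stderr );
    std::abort();
  }
  return ptr;
}

// Records the caller's file and line so the message points at the allocation.
#define CHECKED_MALLOC( bytes ) checkedMalloc( ( bytes ), __FILE__, __LINE__ )

// Hypothetical stand-in for an emplace-style routine that writes through `data`.
// The assert only exists when NDEBUG is not defined (e.g. a Debug build);
// in an optimized build a nullptr here is simply a segfault.
void emplaceSketch( int * data, std::size_t pos, int value )
{
  assert( data != nullptr );
  data[ pos ] = value;
}
```

The `assert` in `emplaceSketch` disappears when `NDEBUG` is defined. This is one reason a Debug build can report a clear failure where an optimized build produces only a bare segfault.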
Thanks, Ben!
It doesn't seem that this problem runs out of memory, since the high-water marks for its level5 version (coarser by a factor of 2 in each direction) look good:
```
Umpire DEVICE sum across ranks: 22211.7 G
Umpire DEVICE rank max: 11.4 GB
Umpire DEVICE::0 sum across ranks: 22211.7 G
Umpire DEVICE::0 rank max: 11.4 GB
Umpire HOST sum across ranks: 13830.5 G
Umpire HOST rank max: 6.9 GB
Umpire HYPRE_DEVICE sum across ranks: 12683.7 G
Umpire HYPRE_DEVICE rank max: 6.5 GB
Umpire PINNED sum across ranks: 1173.8 GB
Umpire PINNED rank max: 671.9 MB
```
That is, we have 64 GB available on the GPU and use only about 18 GB (11.4 GB DEVICE + 6.5 GB HYPRE_DEVICE rank max). But, anyway, it would probably be good to double-check that...
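As a quick check of that arithmetic, you can add up the per-rank peaks of the GPU-side allocators. This sum is an upper bound on any single rank's simultaneous device usage, because the individual peaks need not occur at the same time or on the same rank. Below is a small C++ sketch using the numbers above. It rests on two assumptions. First, `DEVICE::0` is treated as an alias of `DEVICE`, since its numbers are identical. Second, `PINNED` is treated as host memory.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One allocator's per-rank high-water mark (GB), taken from the Umpire report.
struct AllocatorPeak
{
  std::string name;
  double rankMaxGB;
  bool onGpu;
};

// Sums rank-max peaks of GPU-resident allocators. Peaks that may happen at
// different times (or on different ranks) add up to an upper bound on the
// device memory any one rank holds at once.
double deviceUpperBoundGB( std::vector< AllocatorPeak > const & peaks )
{
  double total = 0.0;
  for( AllocatorPeak const & p : peaks )
  {
    if( p.onGpu )
    {
      total += p.rankMaxGB;
    }
  }
  return total;
}

// Level5 numbers from the report. Assumption: DEVICE::0 duplicates DEVICE,
// so it is left out to avoid double counting; HOST and PINNED live on the host.
std::vector< AllocatorPeak > level5Peaks()
{
  return { { "DEVICE", 11.4, true },
           { "HYPRE_DEVICE", 6.5, true },
           { "HOST", 6.9, false },
           { "PINNED", 0.6719, false } };
}
```

`deviceUpperBoundGB( level5Peaks() )` comes out to about 17.9 GB, comfortably below 64 GB.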
I guess recompiling in full `Debug` mode, perhaps with `-DLVARRAY_BOUNDS_CHECK`, will give us more info.
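For context, a bounds-checking build makes each element access validate its pointer and index. The program then fails loudly at the faulty access instead of corrupting memory or segfaulting somewhere downstream, at the cost of extra work on every access. Below is a minimal C++ sketch of that pattern. `CheckedView` and the `SKETCH_BOUNDS_CHECK` toggle are hypothetical. They are not LvArray's actual implementation or option names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical compile-time toggle standing in for a bounds-check build option.
#define SKETCH_BOUNDS_CHECK 1

// Non-owning view over a raw buffer whose accessor can validate every access.
template< typename T >
class CheckedView
{
public:
  CheckedView( T * data, std::size_t size ): m_data( data ), m_size( size ) {}

  T & operator[]( std::size_t i ) const
  {
#if SKETCH_BOUNDS_CHECK
    // Checks are compiled in only when the toggle is on.
    if( m_data == nullptr )
    {
      throw std::logic_error( "CheckedView: null data pointer" );
    }
    if( i >= m_size )
    {
      throw std::out_of_range( "CheckedView: index " + std::to_string( i ) +
                               " >= size " + std::to_string( m_size ) );
    }
#endif
    return m_data[ i ];
  }

private:
  T * m_data;
  std::size_t m_size;
};
```

With the toggle set to 0, the checks compile away entirely. This is why such checking is normally confined to debugging builds.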
Yeah, after a tad more thought, I doubt it's an OOM error, since in those cases we usually see the error from Umpire directly. But if it's not terribly slow, running in full Debug mode will hopefully tell us more.
I think Debug mode will be horrendously slow for this case. Perhaps some more targeted debugging via `printf` statements is in order here? Maybe start with `insertNonZero` and everything "finer" than that. It will be a bit of a mess, but you stand a better chance than with a Debug run at 1/4 Frontier scale.
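One way to keep targeted `printf` debugging manageable at this scale is to print only when something looks wrong, such as a null pointer, so that healthy ranks and calls stay silent. Below is a minimal C++ sketch. `DEBUG_CHECK_PTR` and `insertNonZeroSketch` are hypothetical, and the real `insertNonZero` has a different signature.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical debug macro: prints file, line, function, and the offending
// expression only when the pointer is null, then flushes right away so the
// message survives a crash that follows immediately.
#define DEBUG_CHECK_PTR( ptr, row, col )                                   \
  do                                                                       \
  {                                                                        \
    if( ( ptr ) == nullptr )                                               \
    {                                                                      \
      std::fprintf( stderr, "[%s:%d %s] %s is null (row=%lld, col=%lld)\n", \
                    __FILE__, __LINE__, __func__, #ptr,                    \
                    static_cast< long long >( row ),                       \
                    static_cast< long long >( col ) );                     \
      std::fflush( stderr );                                               \
    }                                                                      \
  } while( false )

// Hypothetical stand-in for a sparsity-pattern insertion routine.
// Appends `col` to a row's column list if there is room.
bool insertNonZeroSketch( long long * columns, int & length, int capacity,
                          long long row, long long col )
{
  DEBUG_CHECK_PTR( columns, row, col );
  if( columns == nullptr || length >= capacity )
  {
    return false;
  }
  columns[ length++ ] = col;
  return true;
}
```

In an MPI run you would likely also print the rank. Keeping the output conditional prevents thousands of ranks from flooding the log.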
This issue arises for `compositionalMultiphaseFlow` at large scale, e.g., the level6 problem.
Error message on Frontier:
Inspecting the `core` file with `gdb`: