Open wrtobin opened 9 months ago
Just curious, the problem is using placement new? Things like
new ( dst ) T( *first );
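For readers unfamiliar with the construct, here is a minimal host-only sketch of the pattern being asked about. It is not LvArray's actual routine: `uninitializedCopySketch`, `destroySketch`, and `roundTripSketch` are hypothetical names, and nothing here exercises the HIP or CUDA toolchains. It only shows why placement new is needed when the destination is raw storage and `T` has a non-trivial copy constructor (for example `std::string`), since plain assignment into unconstructed memory would be undefined behavior.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical helper (not LvArray's API): copy-construct [first, last)
// into raw, uninitialized storage beginning at dst.
template< typename T >
void uninitializedCopySketch( T const * first, T const * last, T * dst )
{
  for( ; first != last; ++first, ++dst )
  {
    // Placement new: runs T's copy constructor at address dst
    // without allocating any memory.
    new ( dst ) T( *first );
  }
}

// Hypothetical helper: objects created with placement new must be
// destroyed explicitly before the raw storage is released.
template< typename T >
void destroySketch( T * ptr, std::size_t n )
{
  for( std::size_t i = 0; i < n; ++i )
  {
    ptr[ i ].~T();
  }
}

// Copies two strings into malloc'd storage and checks the result.
bool roundTripSketch()
{
  std::string const src[ 2 ] = { "alpha", "beta" };

  void * raw = std::malloc( 2 * sizeof( std::string ) );
  if( raw == nullptr )
  {
    return false;
  }

  std::string * dst = static_cast< std::string * >( raw );
  uninitializedCopySketch( src, src + 2, dst );

  bool const ok = ( dst[ 0 ] == "alpha" ) && ( dst[ 1 ] == "beta" );

  destroySketch( dst, 2 );
  std::free( raw );
  return ok;
}
```

For trivially copyable types, a plain `memcpy` or element-wise assignment gives the same result, which is presumably why bypassing placement new is only an option once non-trivial element types are out of the picture.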
We're getting a memory failure on lassen frontier in that routine so this branch is/was just to test if bypassing that usage happened to resolve the issue. So far seems like that wasn't the reason for failure, which is only occurring on one of our largest stretch problems that uses at least 1/4 of the machine IIRC.
This PR isn't intended to be merged unless we fully confirm this is both the issue and the only way to resolve it.
It's really more me not trusting the HIP compiler to handle syntax / usage that isn't super common (as I have experienced numerous times during the HIP port).
So this is a problem on Lassen with CUDA as well?
Yeah I would also suspect CUDA/HIP of not correctly implementing placement new, since I've only seen it a handful of times outside of LvArray.
Sorry, no, just Frontier/HIP. I was typing that while in a meeting where we were discussing other things on Lassen.
Phew, okay that makes me feel better. If it doesn't work on Lassen that's on me. But HIP...
This is WIP work on a memory corruption issue on Frontier for one of the finest-scale ECP problems.
We may decide to merge this after removing all string arrays in GEOS, which IIRC is one of the cases that requires the `new` usage currently in LvArray (though I might be misremembering).