GEOS-DEV / LvArray

Portable HPC Containers (C++)
BSD 3-Clause "New" or "Revised" License
47 stars 10 forks source link

Removing `new` operator on already allocated memory to test Frontier memory corruption issue. #313

Open wrtobin opened 7 months ago

wrtobin commented 7 months ago

This is WIP work on a memory corruption issue on Frontier for one of the finest-scale ECP problems.

We may decide to merge this after removing all string arrays in geos, which IIRC is one of the cases that requires the new usage currently in lvarray (though might be misremembering).

corbett5 commented 7 months ago

Just curious, the problem is using placement new? Things like

new ( dst ) T( *first );

https://github.com/GEOS-DEV/LvArray/blob/6057f598b005db6efd132f60c298ad9c827fe3cd/src/arrayManipulation.hpp#L185C5-L185C29

wrtobin commented 7 months ago

We're getting a memory failure on lassen frontier in that routine so this branch is/was just to test if bypassing that usage happened to resolve the issue. So far seems like that wasn't the reason for failure, which is only occurring on one of our largest stretch problems that uses at least 1/4 of the machine IIRC.

This PR isn't intended to be merged unless we fully confirm this is both the issue and the only way to resolve it.

wrtobin commented 7 months ago

Its really more me not trusting the hip compiler to handle syntax / usage that isn't super common (as I have experienced numerous times during the HIP port).

corbett5 commented 7 months ago

So this is a problem on Lassen with CUDA as well?

Yeah I would also suspect CUDA/HIP of not correctly implementing placement new, since I've only seen it a handful of times outside of LvArray.

wrtobin commented 7 months ago

Sorry no, just frontier/hip, I was typing that while in a meeting where we were discussing other things on lassen.

corbett5 commented 7 months ago

Phew, okay that makes me feel better. If it doesn't work on Lassen that's on me. But HIP...