GEOS-DEV / LvArray

Portable HPC Containers (C++)
BSD 3-Clause "New" or "Revised" License
47 stars 10 forks source link

Changes for Memory-Space Aware Packing #293

Closed wrtobin closed 1 year ago

wrtobin commented 1 year ago

Basically just adding a checkTouch function to determine whether the buffer has been touched in the current memory space so we can avoid unneeded host-packing when a buffer is moved back to host for some other read-only purpose.

corbett5 commented 1 year ago

When would you use this instead of just array.move( space, false )?

wrtobin commented 1 year ago

That would modify previousSpace if there was an actual move, right?

I still need to check if this fully fixes the issue I was seeing, which is that a pack was taking place on host due to a blueprint (or silo) output event triggering to move the data back to host on the first timestep prior to the collection event triggering.

The collection then happened on host, which should be fine, but uninitialized values were coming out for that collect, so I wanted to see if collecting from the device buffer instead fixed the issue.

Okay I just checked and this does fix the issue I was seeing with that first collection. Trying to think if possibly the bug is still in the collection operation, erroneously updating an offset incorrectly or something.

wrtobin commented 1 year ago

Hopefully this actually winds up not being needed and I can just throw this branch away, just need to figure out the root of the bug in that particular case.

corbett5 commented 1 year ago

That would modify previousSpace if there was an actual move, right?

Pretty sure previousSpace is only updated when the data is touched in a new space.

https://github.com/LLNL/CHAI/blob/d4a0a5700caed3858a8f285abc710cf7ac35fe59/src/chai/ManagedArray.inl#LL421C11-L421C11 https://github.com/LLNL/CHAI/blob/d4a0a5700caed3858a8f285abc710cf7ac35fe59/src/chai/ArrayManager.cpp#L204

wrtobin commented 1 year ago

Ah okay, I figured out the bug regardless and will be able to drop this branch once I am able to put together the fix in geos.

So if I have a buffer touched on device and invoke move(host, false) it will move the data to host, but not modify previousSpace. If I then subsequently -- in say a separate geos event that doesn't know if any other event has fired to move the data -- move(host, false) again there would again be data movement, since nothing in the underlying state is modified which would prevent the erroneous extra move. Basically I was hoping there was a mechanism to get the 'set of valid spaces' where the data is currently up to date (for a read-only operation), but looking at the chai record that probably isn't trivially recoverable.

klevzoff commented 1 year ago

IIUC chai clears the touch markers every time the data is physically copied, so a second move should not be occurring, since the data will be considered in-sync: https://github.com/LLNL/CHAI/blob/d4a0a5700caed3858a8f285abc710cf7ac35fe59/src/chai/ArrayManager.cpp#L274