LLNL / axom

CS infrastructure components for HPC applications
BSD 3-Clause "New" or "Revised" License

axom::Array constructor crash on CUDA. #1432

Closed BradWhitlock closed 1 month ago

BradWhitlock commented 1 month ago

Code like the following resulted in Array::Array trying to initialize elements of a device-allocated array using placement new on the host. The code SEGV'd.

using ExecSpace = axom::CUDA_EXEC<256>;
const int allocatorID = axom::execution_space<ExecSpace>::allocatorID();
axom::Array<int> arr(n, n, allocatorID);

This constructor calls initialize() with two arguments, so the third argument takes its default value of true, which means the elements are default-constructed. https://github.com/LLNL/axom/blob/70b360815ebed6a25e6ae369bc5efeaa58cacdbc/src/axom/core/Array.hpp#L1084

https://github.com/LLNL/axom/blob/70b360815ebed6a25e6ae369bc5efeaa58cacdbc/src/axom/core/Array.hpp#L1591
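
For context, here is a minimal sketch (not Axom's actual implementation) of what the default-construct path boils down to when it runs on the host: a loop of placement new over the underlying buffer. If that buffer came from a CUDA device allocator, the host-side dereference faults, which matches the SEGV above.

#include <new>                  // placement new
#include "axom/core/Types.hpp"  // axom::IndexType

// Hypothetical illustration of the failing pattern: data points at memory
// obtained from a CUDA device allocator, but the loop runs on the host.
template <typename T>
void defaultConstructOnHost(T* data, axom::IndexType num_elements)
{
  for(axom::IndexType i = 0; i < num_elements; ++i)
  {
    new(data + i) T();  // host dereference of a device pointer -> SEGV
  }
}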

I think the root of the problem could be that Array::m_executeOnGPU is not initialized anywhere. Valgrind was logging uninitialized memory in this area and m_executeOnGPU is probably the culprit.

Calling axom::Array(n, n, allocatorID) where allocatorID is a CUDA allocator should not cause a SEGV and it should initialize the data as needed on device.
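
As a rough sketch of what "initialize the data as needed on device" could look like (assuming axom::for_all and AXOM_LAMBDA are usable with the chosen execution space; this is illustrative, not a proposed patch), the element initialization would run in a device kernel instead of on the host:

#include "axom/core/Macros.hpp"                     // AXOM_LAMBDA
#include "axom/core/Types.hpp"                      // axom::IndexType
#include "axom/core/execution/execution_space.hpp"  // axom::CUDA_EXEC
#include "axom/core/execution/for_all.hpp"          // axom::for_all

using ExecSpace = axom::CUDA_EXEC<256>;

// Hypothetical device-side default construction for a buffer that lives in
// CUDA device memory; each element is written from a device kernel.
void defaultConstructOnDevice(int* data, axom::IndexType n)
{
  axom::for_all<ExecSpace>(n, AXOM_LAMBDA(axom::IndexType i) { data[i] = int(); });
}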

I was told that ATS might have some bearing here too.

zansel7{whitlocb}103: detect_ats
rzansel7     ATS detected

BradWhitlock commented 1 month ago

I'm trying a fix that sets m_executeOnGPU based on the memory space, inside Array::initialize, Array::initialize_from_other, and one constructor that calls neither of those methods.
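
For reference, a minimal sketch of how the flag could be derived from the allocator, assuming an Umpire-backed build (the helper name and the exact check are mine, not necessarily what the Axom fix will do):

#include "umpire/ResourceManager.hpp"

// Hypothetical helper: treat CUDA/HIP allocators as GPU-resident so Array can
// route construction/fill operations to the device.
static bool allocatorTargetsGPU(int allocatorID)
{
  auto& rm = umpire::ResourceManager::getInstance();
  const auto platform = rm.getAllocator(allocatorID).getPlatform();
  return platform == umpire::Platform::cuda || platform == umpire::Platform::hip;
}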

bmhan12 commented 1 month ago

As mentioned previously:

Link to documentation on enabling/disabling Address Translation Services (ATS) and checking whether it is enabled (Point 19): https://lc.llnl.gov/confluence/display/SIERRA/Quickstart+Guide

rhornung67 commented 1 month ago

Thanks @BradWhitlock. We can have @publixsubfan look into this to make sure other issues don't occur.

BradWhitlock commented 1 month ago

Update: I've had some trouble reproducing the crash on develop. The Array::m_executeOnGPU member is uninitialized, but it does not seem to matter much. When it fails in my branch, it looks like a bad optimization might be at work. I was getting the allocatorID to pass in from execution_space<ExecSpace>::allocatorID(), and in TotalView it appeared that the allocatorID was getting optimized out. If I make it "volatile" so the value cannot be optimized away, I can see it returns 3 and everything works normally. The code resembles:

void buildShapeMap(axom::ArrayView<axom::IndexType> &values, axom::ArrayView<axom::IndexType> &ids, int allocatorID)
{
  const axom::IndexType n = /* get the size */;
  values = axom::Array<axom::IndexType>(n, n, allocatorID);
  ids = axom::Array<axom::IndexType>(n, n, allocatorID);
  // Fill values, ids here
}
...

/*volatile*/ int allocatorID = axom::execution_space<ExecSpace>::allocatorID();
axom::Array<axom::IndexType> values, ids;
buildShapeMap(values, ids, allocatorID);

publixsubfan commented 1 month ago

Yes, I believe we need to initialize m_executeOnGPU to an appropriate default value. Good catch @BradWhitlock.

But is this happening with CUDA device-only memory? The value of that variable should be immaterial -- we should be passing through to special logic for that case.