Closed jeffhammond closed 3 years ago
@bjpalmer any thoughts on this? it is hard to use this in NWChem if it's in a branch...
Sorry, I've been swamped. I will try and get to it early next week.
How are you incorporating it into the build? We've been looking at doing some things with CUDA and GPU-aware MPI and have been toggling it with ENABLE_CUDA. We haven't added it to the autotools build.
It's enabled in the Autotools build with MA_ENABLE_CUDA_MEM (source-code macro) and --enable-cuda-mem (configure option). I made a minimal attempt to add it to the CMake build but likely didn't do enough.
This patch only changes how MA works. The goal is for all of NWChem's memory that is allocated with ma_push_get to be CUDA unified memory (UM), which should not require any awareness in the communication layer: in the worst case, the communication calls migrate the pages back to the CPU.
I've gotten no response from anyone likely to be affected by these changes, so I will merge them.
This is a minimal patch to allow MA to use cudaMallocManaged for the MA slab. This should have a huge impact on the performance of codes like NWChem that allocate data with MA and pass it to CUDA kernels. Unified memory from cudaMallocManaged is much more performance-portable than generic heap memory.
The configuration option is minimal and assumes the user will link libcudart.so manually, which is a reasonable assumption when GA is used as a library with CUDA code. The test programs require
LDFLAGS=-lcudart
to link. I did not test CMake because I do not use GA with CMake and do not know how to do so.