support cudaMallocManaged in MA

GlobalArrays / ga

Partitioned Global Address Space (PGAS) library for distributed arrays

http://hpc.pnl.gov/globalarrays/

support cudaMallocManaged in MA #210

Closed jeffhammond closed 3 years ago

jeffhammond commented 3 years ago

This is a minimal patch that allows MA to use cudaMallocManaged for the MA slab. This will have a huge impact on the performance of codes like NWChem that allocate data with MA and pass it to CUDA kernels. Unified memory allocated with cudaMallocManaged is much more performance-portable than generic heap memory.

The configuration option is minimal and assumes the user will link libcudart.so manually, which is a reasonable assumption when GA is used as a library with CUDA code. The test programs require LDFLAGS=-lcudart to link.

I did not test CMake because I do not use GA with CMake and do not know how to do so.

jeffhammond commented 3 years ago

@bjpalmer any thoughts on this? It is hard to use this in NWChem if it's in a branch...

bjpalmer commented 3 years ago

Sorry, I've been swamped. I will try and get to it early next week.

bjpalmer commented 3 years ago

How are you incorporating it into the build? We've been looking at doing some things with CUDA and GPU-aware MPI and have been toggling it with ENABLE_CUDA. We haven't added it to the Autotools build.

jeffhammond commented 3 years ago

It's enabled in the Autotools build with MA_ENABLE_CUDA_MEM (source code macro) and --enable-cuda-mem (configure option). I made a minimal attempt to add it to the CMake build but likely didn't do enough.
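To illustrate the kind of compile-time switch described here, the following is a minimal sketch, not the actual MA source. The helper names `slab_alloc` and `slab_free` are hypothetical; only the macro name MA_ENABLE_CUDA_MEM and the CUDA runtime calls `cudaMallocManaged` and `cudaFree` come from the thread or the CUDA runtime API. Without the macro, the sketch falls back to `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef MA_ENABLE_CUDA_MEM
#include <cuda_runtime_api.h>  /* C-compatible CUDA runtime header */
#endif

/* Hypothetical helper: allocate one slab of `bytes` bytes.
 * Returns NULL on failure in both configurations. */
static void *slab_alloc(size_t bytes)
{
    assert(bytes > 0);
#ifdef MA_ENABLE_CUDA_MEM
    void *p = NULL;
    /* Managed memory is addressable from both host and device.
     * In C the flags argument must be passed explicitly. */
    if (cudaMallocManaged(&p, bytes, cudaMemAttachGlobal) != cudaSuccess)
        return NULL;
    return p;
#else
    /* Ordinary host heap memory when CUDA support is not enabled. */
    return malloc(bytes);
#endif
}

/* Hypothetical helper: release a slab obtained from slab_alloc. */
static void slab_free(void *p)
{
#ifdef MA_ENABLE_CUDA_MEM
    cudaFree(p);
#else
    free(p);
#endif
}
```

Host code can read and write the returned pointer the same way in either configuration. When the macro is defined, the program also has to link against libcudart, as noted earlier in the thread.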

This patch only changes how MA works. The goal is for all NWChem memory allocated with ma_push_get to be CUDA unified memory (UM). That should not require any awareness in the communication layer, since in the worst case the communication calls will migrate the memory back to the CPU.
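The reason changing only the slab is enough can be sketched with a toy stack allocator: every pointer handed out by a push is carved from the one slab, so it has whatever memory type the slab has. This is an illustration under stated assumptions, not MA's implementation. The `toy_*` names are hypothetical, and `malloc` stands in for the slab allocation that a CUDA-enabled build would replace with `cudaMallocManaged`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy allocator: one slab plus a stack pointer. */
typedef struct {
    char  *base;  /* start of the slab */
    size_t size;  /* slab size in bytes */
    size_t top;   /* offset of the first free byte */
} toy_stack;

/* Allocate the slab once. malloc is a stand-in here; this is the
 * single place a managed-memory allocation would go instead. */
static int toy_init(toy_stack *s, size_t bytes)
{
    s->base = (char *)malloc(bytes);
    s->size = bytes;
    s->top  = 0;
    return s->base != NULL;
}

/* Push: hand out the next 16-byte-aligned chunk of the slab,
 * or NULL if the request does not fit. */
static void *toy_push(toy_stack *s, size_t bytes)
{
    size_t aligned = (bytes + 15u) & ~(size_t)15u;
    if (aligned < bytes || aligned > s->size - s->top)
        return NULL;
    void *p = s->base + s->top;
    s->top += aligned;
    return p;
}

/* True if p points into the slab, i.e. it shares the slab's memory type. */
static int toy_owns(const toy_stack *s, const void *p)
{
    uintptr_t lo = (uintptr_t)s->base;
    uintptr_t a  = (uintptr_t)p;
    return a >= lo && a < lo + s->size;
}

/* Pop: release the most recent allocation by resetting the stack top
 * to the start of that allocation. */
static int toy_pop(toy_stack *s, void *p)
{
    if (!toy_owns(s, p) || (size_t)((char *)p - s->base) >= s->top)
        return 0;
    s->top = (size_t)((char *)p - s->base);
    return 1;
}

static void toy_destroy(toy_stack *s)
{
    free(s->base);
    s->base = NULL;
    s->size = s->top = 0;
}
```

Because callers only ever see pointers returned by the push routine, none of them need to change when the slab's backing allocation changes.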

bjpalmer commented 3 years ago

I've gotten no response from anyone likely to be offended by these changes, so I will merge them.