desihub / gpu_specter

Scratch work for porting spectroperfectionism extractions to GPUs
BSD 3-Clause "New" or "Revised" License

Refactor to allow multiple ranks per GPU with MPS #42

Closed dmargala closed 4 years ago

dmargala commented 4 years ago

This PR makes some modifications to allow for multiple GPUs and multiple MPI ranks per GPU. There are essentially 5 different code execution paths intertwined together now (new paths in bold):

The code works with up to 4 ranks per GPU, although the performance benefit beyond 2 ranks per GPU is negligible.


To work around some memory errors that were occurring during MPI communication, I implemented gpu_specter.util.gather_ndarray, which gathers multidimensional NumPy arrays directly (without serialization) using the vector variant of the gather operation (Gatherv).
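For readers unfamiliar with the pattern, here is a minimal sketch of a buffer-based gather of this kind. It is not the code from this PR: the name `gather_ndarray_sketch` and its signature are hypothetical, and it assumes `comm` is an mpi4py communicator and that all ranks send arrays with the same dtype and the same trailing dimensions (only the first axis may differ in length).

```python
import numpy as np


def gather_ndarray_sketch(sendbuf, comm, root=0):
    """Gather per-rank arrays onto `root`, stacked along the first axis.

    Hypothetical helper for illustration; `comm` is assumed to be an
    mpi4py communicator (e.g. MPI.COMM_WORLD).
    """
    rank = comm.Get_rank()

    # Remember the original shape, then flatten to a contiguous 1-D buffer.
    sendbuf = np.ascontiguousarray(sendbuf)
    shape = sendbuf.shape
    flat = sendbuf.ravel()

    # Element counts are tiny, so the pickle-based lowercase gather is fine here.
    counts = comm.gather(flat.size, root=root)

    if rank == root:
        # Preallocate one flat receive buffer large enough for every rank's data.
        recvbuf = np.empty(sum(counts), dtype=flat.dtype)
        # Buffer-based Gatherv: data moves directly, without serialization.
        comm.Gatherv(sendbuf=flat, recvbuf=(recvbuf, counts), root=root)
        # Restore the trailing dimensions; the first axis is the concatenation.
        return recvbuf.reshape((-1,) + shape[1:])

    # Non-root ranks only send; the receive buffer is ignored there.
    comm.Gatherv(sendbuf=flat, recvbuf=None, root=root)
    return None
```

A script calling this with `comm = MPI.COMM_WORLD` (from `mpi4py`) would be launched under an MPI launcher such as `mpirun` or `srun`; the root rank receives the stacked array and every other rank receives `None`.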

dmargala commented 4 years ago

The most recent commit fixes the performance issue when running with 1 GPU + MPI. I also added a few comments to the code that determines the MPI/GPU division of labor and communication strategy.
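As a rough illustration of what a division of labor between ranks and GPUs can look like, the sketch below maps MPI ranks to devices round-robin. This is an assumption for illustration only: the function name `assign_gpu` is hypothetical, and the scheme actually used in this PR may differ.

```python
def assign_gpu(rank, n_gpus):
    """Map an MPI rank to a GPU round-robin (assumed scheme, not the PR's).

    Returns (device_id, slot), where `slot` counts how many lower-numbered
    ranks already share the same device.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    device_id = rank % n_gpus   # ranks 0..n_gpus-1 each land on a distinct GPU
    slot = rank // n_gpus       # 0 for the first rank on a device, 1 for the second, ...
    return device_id, slot


# Example: 8 ranks on 4 GPUs gives 2 ranks per GPU.
mapping = [assign_gpu(rank, 4) for rank in range(8)]
```

With more than one rank per device, the ranks that share a GPU rely on MPS to run their kernels concurrently, which is the configuration this PR targets.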