The main improvement in this PR comes from a new `AsyncIOComm` class that uses extra MPI ranks to interleave IO read/write operations between frames while extracting a whole exposure. A few improvements to other GPU functions and some refactoring of the spex command line program into a module function also came along for the ride.
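The idea of overlapping frame reads and writes with extraction can be illustrated without MPI. The snippet below is a minimal, hypothetical sketch, assuming a one-frame prefetch. Threads stand in for the dedicated IO ranks, and `run_pipeline`, `read_frame`, `extract_frame`, and `write_frame` are illustrative names only, not part of the `AsyncIOComm` API.

```python
import queue
import threading


def run_pipeline(frames, read_frame, extract_frame, write_frame):
    """Overlap frame IO with extraction (hypothetical sketch).

    A reader thread prefetches the next frame and a writer thread saves the
    previous result while the main thread extracts the current frame.
    Threads stand in for the extra MPI ranks used by AsyncIOComm.
    """
    to_compute = queue.Queue(maxsize=1)  # read at most one frame ahead
    to_write = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def reader():
        try:
            for frame in frames:
                to_compute.put((frame, read_frame(frame)))
        finally:
            to_compute.put(done)  # always unblock the compute loop

    def writer():
        while True:
            item = to_write.get()
            if item is done:
                break
            frame, result = item
            write_frame(frame, result)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()

    # The main thread plays the role of the compute (GPU) ranks.
    while True:
        item = to_compute.get()
        if item is done:
            break
        frame, data = item
        to_write.put((frame, extract_frame(data)))
    to_write.put(done)

    for t in threads:
        t.join()


# Usage with toy stand-ins for real frame IO and extraction.
written = {}
run_pipeline(
    range(5),
    read_frame=lambda i: i,
    extract_frame=lambda x: x * x,
    write_frame=lambda i, r: written.__setitem__(i, r),
)
print(written)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```

Bounding the read queue to one item keeps IO at most one frame ahead of extraction, so reading frame *i+1* and writing frame *i-1* can proceed while frame *i* is being extracted. The PR distributes this work across extra MPI ranks rather than threads.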
The tables below show before and after results for the 30-frame exposure extract script, run on a single node with 4 GPUs and 2 MPI ranks per GPU on corigpu (5 MPI ranks per GPU on DGX).
Before:
This PR:
Cori GPU commands:
DGX commands:
Unit tests are passing: