desihub / gpu_specter

Scratch work for porting spectroperfectionism extractions to GPUs
BSD 3-Clause "New" or "Revised" License

Implement async io #56

Closed: dmargala closed this 3 years ago

dmargala commented 3 years ago

The main improvement in this PR comes from a new AsyncIOComm class. It uses extra MPI ranks to interleave IO read/write operations between frames while a whole exposure is being extracted. A few other GPU functions were also improved, and the spex command-line program was refactored into a module function along the way.
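To illustrate why interleaving IO between frames helps, here is a toy, single-process model of the idea. It is a sketch, not gpu_specter's AsyncIOComm. Threads stand in for the dedicated IO ranks, and `read_frame`, `extract` and `write_result` are hypothetical placeholders that only sleep. A reader feeds frames in and a writer drains results, so the extraction loop does not wait on disk once the pipeline is full.

```python
import queue
import threading
import time


def read_frame(i):
    # Hypothetical stand-in for reading one frame from disk.
    time.sleep(0.01)
    return {"frame": i, "data": [i, i + 1, i + 2]}


def extract(frame):
    # Hypothetical stand-in for the GPU extraction of one frame.
    time.sleep(0.02)
    return {"frame": frame["frame"], "result": sum(frame["data"])}


def write_result(result, sink):
    # Hypothetical stand-in for writing the extracted spectra.
    time.sleep(0.01)
    sink.append(result)


def pipelined_extract(nframes):
    """Overlap reads, extraction and writes across frames.

    The reader thread plays the role of an IO rank that prefetches frames.
    The writer thread plays the role of an IO rank that writes results.
    Bounded queues keep at most two frames in flight per stage.
    """
    to_extract = queue.Queue(maxsize=2)
    to_write = queue.Queue(maxsize=2)
    written = []

    def reader():
        for i in range(nframes):
            to_extract.put(read_frame(i))
        to_extract.put(None)  # sentinel: no more frames

    def writer():
        while (result := to_write.get()) is not None:
            write_result(result, written)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    # The main thread acts as the compute rank.
    while (frame := to_extract.get()) is not None:
        to_write.put(extract(frame))
    to_write.put(None)  # tell the writer to stop
    for t in threads:
        t.join()
    return written


results = pipelined_extract(5)
print([r["frame"] for r in results])
```

In steady state the throughput of a pipeline like this is limited by its slowest stage rather than by the sum of read, extract and write times. That is the effect the extra IO ranks are meant to exploit.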

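The tables below show before and after results for the 30-frame exposure extraction script on a single node with 4 GPUs. The runs use 2 MPI ranks per GPU on corigpu and 5 MPI ranks per GPU on dgx. FPNH and FPGH are frames per node-hour and frames per GPU-hour. "Improvement" is the "Before" elapsed time divided by the new elapsed time on the same system.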

Before:

system    io      elapsed time (s)    FPNH      FPGH
corigpu   sync    422.5               255.64    63.91
dgx       sync    244.4               441.89    110.47

This PR:

system    io      elapsed time (s)    FPNH      FPGH      Improvement
corigpu   sync    362.9               297.57    74.39     1.16x
corigpu   async   329.4               327.86    81.97     1.28x
dgx       sync    222.0               486.39    121.60    1.10x
dgx       async   170.0               635.14    158.78    1.44x
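The derived columns appear to follow from the elapsed times: FPNH ≈ 30 × 3600 / elapsed, FPGH = FPNH / 4, and Improvement = before / after. The reported values match to within rounding of the elapsed times. The short sketch below recomputes them under those assumptions. The function names are illustrative and are not part of gpu_specter.

```python
SECONDS_PER_HOUR = 3600


def throughput(nframes, elapsed_sec, ngpus):
    """Return (frames per node-hour, frames per GPU-hour) for a single-node run."""
    fpnh = nframes * SECONDS_PER_HOUR / elapsed_sec
    return fpnh, fpnh / ngpus


def speedup(before_sec, after_sec):
    """Ratio of the old elapsed time to the new one (higher is better)."""
    return before_sec / after_sec


# Elapsed times copied from the tables above (30 frames, 4 GPUs, one node).
before = {"corigpu": 422.5, "dgx": 244.4}
after = {
    ("corigpu", "sync"): 362.9,
    ("corigpu", "async"): 329.4,
    ("dgx", "sync"): 222.0,
    ("dgx", "async"): 170.0,
}

for (system, io), elapsed in after.items():
    fpnh, fpgh = throughput(30, elapsed, ngpus=4)
    print(f"{system:8s} {io:5s} FPNH={fpnh:7.2f} FPGH={fpgh:6.2f} "
          f"improvement={speedup(before[system], elapsed):.2f}x")
```

Small differences in the second decimal place (for example, 255.62 recomputed versus 255.64 reported) are consistent with the elapsed times having been rounded before they were tabulated.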

Cori GPU commands:

time srun -n 8 -c 2 --cpu-bind=cores mps-wrapper desi-extract-exposure ${INDIR} ${JOBOUTDIR} $(date +%s) --night ${NIGHT} --expid ${EXPID} --gpu
time srun -n 10 -c 2 --cpu-bind=cores mps-wrapper desi-extract-exposure ${INDIR} ${JOBOUTDIR} $(date +%s) --night ${NIGHT} --expid ${EXPID} --gpu --async-io

DGX commands:

time srun -n 20 -c 2 --cpu-bind=cores mps-wrapper desi-extract-exposure ${INDIR} ${JOBOUTDIR} $(date +%s) --night ${NIGHT} --expid ${EXPID} --gpu
time srun -n 22 -c 2 --cpu-bind=cores mps-wrapper desi-extract-exposure ${INDIR} ${JOBOUTDIR} $(date +%s) --night ${NIGHT} --expid ${EXPID} --gpu --async-io
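In both sets of commands, the async runs request two more ranks than the sync runs (8 to 10 on Cori GPU, 20 to 22 on DGX). The extra ranks are presumably the IO ranks used by AsyncIOComm. The hypothetical helper below rebuilds these command lines from a GPU count and a ranks-per-GPU setting. It assumes `io_ranks=2` as observed above; this is not a documented gpu_specter parameter. The only flags used are the ones that already appear in the commands.

```python
def srun_command(ngpus, ranks_per_gpu, async_io=False, io_ranks=2):
    """Rebuild a desi-extract-exposure srun line like the ones above.

    Hypothetical helper: io_ranks=2 mirrors the extra ranks seen in the
    async commands and is an assumption, not a gpu_specter setting.
    """
    nranks = ngpus * ranks_per_gpu + (io_ranks if async_io else 0)
    args = [
        "srun", "-n", str(nranks), "-c", "2", "--cpu-bind=cores",
        "mps-wrapper", "desi-extract-exposure",
        "${INDIR}", "${JOBOUTDIR}", "$(date +%s)",
        "--night", "${NIGHT}", "--expid", "${EXPID}", "--gpu",
    ]
    if async_io:
        args.append("--async-io")
    # Plain join (not shlex.join) so the shell still expands ${...} and $(...).
    return " ".join(args)


print("time " + srun_command(ngpus=4, ranks_per_gpu=5, async_io=True))
```

Keeping the IO-rank count as a separate term makes it clear that async mode adds a fixed number of ranks on top of the per-GPU extraction ranks, rather than scaling with the number of GPUs.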

Unit tests are passing:

(gpu-specter-dev) dmargala@cgpu15:gpu_specter> srun -n 1 -c 2 --cpu-bind=cores python -m unittest gpu_specter.test.test_suite
.......................
----------------------------------------------------------------------
Ran 23 tests in 34.212s

OK