desihub / redrock

Redshift fitting for spectroperfectionism

rrdesi_mpi --gpu out of memory errors unless specifying --max-gpuprocs 4 #235

Closed sbailey closed 1 year ago

sbailey commented 1 year ago

Currently rrdesi_mpi --gpu ... with otherwise default arguments runs out of memory:

$> cd /global/cfs/cdirs/desi/spectro/redux/iron/tiles/cumulative/100/20210505
$> srun -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0 --cpu-bind=cores rrdesi_mpi -i coadd-0-100-thru20210505.fits -o $SCRATCH/blat.fits --gpu
...
Proc 32: cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 333,274,112 bytes (allocated so far: 1,974,255,616 bytes).

For this to work, you also have to set --max-gpuprocs 4 to match the number of GPUs on a Perlmutter node:

$> srun -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0 --cpu-bind=cores rrdesi_mpi -i coadd-0-100-thru20210505.fits -o $SCRATCH/blat.fits --gpu --max-gpuprocs 4

In the spirit of "the default arguments should do the recommended right thing", rrdesi_mpi should inspect the node to discover the number of available GPUs and auto-throttle --max-gpuprocs so that GPU memory is not overwhelmed (a possible sketch is below). It's unclear to me whether max-gpuprocs is a development leftover or whether we have a real use case (including future development) where we would want to set it to something other than the number of available GPUs, i.e. do we even need that option anymore?
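One way this could look (a minimal sketch, not redrock's actual implementation; the helper name and how it would be wired into rrdesi_mpi are assumptions, while cupy.cuda.runtime.getDeviceCount() is an existing CuPy call):

# Sketch only: default the GPU-process cap to the number of visible GPUs.
import cupy

def default_max_gpuprocs(requested=None):
    """Return the cap on GPU-enabled MPI ranks, defaulting to the GPU count."""
    ngpu = cupy.cuda.runtime.getDeviceCount()
    if requested is None:
        return ngpu                  # e.g. 4 on a Perlmutter GPU node
    return min(requested, ngpu)      # never exceed the available devices

On a Perlmutter GPU node this would return 4 by default, so only four of the 64 MPI ranks would allocate GPU memory, matching the manual --max-gpuprocs 4 workaround above.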

@craigwarner-ufastro @dmargala