desihub / gpu_specter

Scratch work for porting spectroperfectionism extractions to GPUs
BSD 3-Clause "New" or "Revised" License

Batch subbundle extraction on GPU #55

Closed dmargala closed 3 years ago

dmargala commented 3 years ago

This PR attempts to leverage more batch operations on the GPU by refactoring the extraction so that it operates on a stack of patches instead of one patch at a time. This is a little tricky because the patches on the CCD do not all have the same shape. The patches do stack cleanly once the projection and weight matrices are applied, though, so the heavy linear algebra operations (solve and eigh) can be performed in batch.
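The stacking idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it uses NumPy on the CPU so it runs anywhere, and the patch sizes, `nflux`, and the random projection matrices are all hypothetical stand-ins. The point is only that per-patch inputs with different pixel counts reduce to same-shaped `(nflux, nflux)` systems once the projection and weights are applied, and those can be stacked and passed to batched `solve` and `eigh` calls.

```python
import numpy as np

rng = np.random.default_rng(0)

nflux = 6                   # hypothetical: unknowns per patch, same for every patch
patch_npix = [40, 55, 48]   # hypothetical: pixel counts differ from patch to patch

ata_list, aty_list = [], []
for npix in patch_npix:
    A = rng.normal(size=(npix, nflux))      # stand-in projection matrix for one patch
    w = rng.uniform(0.5, 2.0, size=npix)    # stand-in per-pixel weights
    y = rng.normal(size=npix)               # stand-in pixel values
    # After applying projection and weights, every patch yields the same shapes.
    ata_list.append(A.T @ (w[:, None] * A)) # (nflux, nflux)
    aty_list.append(A.T @ (w * y))          # (nflux,)

ata = np.stack(ata_list)    # (npatch, nflux, nflux)
aty = np.stack(aty_list)    # (npatch, nflux)

# One batched call each instead of a Python loop over patches.
flux = np.linalg.solve(ata, aty[..., None])[..., 0]   # (npatch, nflux)
evals, evecs = np.linalg.eigh(ata)                    # (npatch, nflux), (npatch, nflux, nflux)
```

The batched results match what a per-patch loop would give; the benefit on a GPU is that many small systems are handled in a few large calls.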

The tables below show before and after results from the 30-frame exposure extract script, run on a single node with 4 GPUs, using 2 MPI ranks per GPU on corigpu and 5 MPI ranks per GPU on dgx.

Before:

| system  | elapsed time (sec) | FPNH   | FPGH  |
|---------|--------------------|--------|-------|
| corigpu | 876.9              | 123.16 | 30.79 |
| dgx     | 618.6              | 174.60 | 43.65 |

This PR:

| system  | elapsed time (sec) | FPNH   | FPGH   |
|---------|--------------------|--------|--------|
| corigpu | 399.3              | 270.45 | 67.61  |
| dgx     | 246.7              | 437.85 | 109.46 |
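For reading the tables above: the FPNH and FPGH columns are not defined in this description, but the numbers appear consistent with frames per node-hour (30 frames divided by elapsed time, scaled to an hour) and frames per GPU-hour (FPNH divided by the 4 GPUs on the node). That reading is an inference from the arithmetic, not something stated here, and the helper names below are made up for illustration. A quick check under that assumption:

```python
NFRAMES = 30          # from the description: 30-frame exposure
GPUS_PER_NODE = 4     # from the description: single node with 4 GPUs

def fpnh(elapsed_sec):
    """Assumed definition: frames per node-hour."""
    return NFRAMES * 3600.0 / elapsed_sec

def fpgh(elapsed_sec):
    """Assumed definition: frames per GPU-hour."""
    return fpnh(elapsed_sec) / GPUS_PER_NODE

# Elapsed times (sec) copied from the tables above.
before = {"corigpu": 876.9, "dgx": 618.6}
after = {"corigpu": 399.3, "dgx": 246.7}

for system in before:
    speedup = before[system] / after[system]
    print(f"{system}: FPNH {fpnh(before[system]):.2f} -> {fpnh(after[system]):.2f}, "
          f"FPGH {fpgh(before[system]):.2f} -> {fpgh(after[system]):.2f}, "
          f"speedup {speedup:.2f}x")
```

The recomputed values agree with the tabulated ones to within the rounding of the elapsed times, and the elapsed-time ratios come out to roughly 2.2x on corigpu and 2.5x on dgx.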