desihub / gpu_specter

Scratch work for porting spectroperfectionism extractions to GPUs
BSD 3-Clause "New" or "Revised" License

Enable gpu spex #41

Closed dmargala closed 4 years ago

dmargala commented 4 years ago

This PR enables the gpu code path in spex. The main missing piece was gpu_specter.extract.gpu.ex2d_padded which ended up being almost identical to the cpu version.

Tests

In the process of testing this PR, I discovered a bug in gpu_specter.extract.gpu.projection_matrix that wasn't surfaced by the existing unit tests. I added a test case that fails before the fix and now passes after the fix. When I finally tracked down the bug, I was a little surprised that the existing tests were passing before.

I've also verified that the cpu/compare_specter tests all still pass using the master desi environment on cori.

The output values from the gpu version are nearly identical to the output from the cpu/mpi version (~99.96% of pixels agree according to np.isclose).
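For reference, an agreement fraction like the one quoted above can be computed along these lines. This is only a sketch: the helper name `fraction_close`, the synthetic arrays, and the use of numpy's default `rtol`/`atol` are assumptions for illustration, not the actual comparison script or tolerances used for this PR.

```python
import numpy as np

def fraction_close(a, b, rtol=1e-5, atol=1e-8):
    """Return the fraction of elements where a and b agree under np.isclose.

    rtol/atol default to numpy's own defaults (an assumption here).
    """
    a = np.asarray(a)
    b = np.asarray(b)
    return float(np.mean(np.isclose(a, b, rtol=rtol, atol=atol)))

# Synthetic stand-ins for the cpu and gpu outputs (not real spex data).
rng = np.random.default_rng(0)
cpu_flux = rng.normal(size=(50, 200))
gpu_flux = cpu_flux.copy()
gpu_flux[0, :4] += 1.0  # perturb 4 of the 10000 values

print(f"{100 * fraction_close(cpu_flux, gpu_flux):.2f}% of pixels agree")
```

Reporting a fraction instead of a single pass/fail from np.allclose makes it easier to tell a handful of outlier pixels apart from a systematic difference.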

Performance

I've littered the code with timing statements to get a better sense of where time is being spent. The timing statements helped identify cp.linalg.lstsq as a hot spot in gpu_specter.extract.both.xp_deconvolve. Changing that to cp.linalg.solve brought the run time for a single frame using a single gpu down from ~10 min to ~4 min.
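To illustrate the lstsq to solve swap, here is a rough sketch of why the two are interchangeable when the system matrix is square and well conditioned. It uses numpy so it runs without a gpu; the matrix, sizes, and function names (`solve_lstsq`, `solve_direct`) are made up for the example and are not taken from xp_deconvolve. The assumption is that the same comparison carries over to cp.linalg.lstsq and cp.linalg.solve.

```python
import time
import numpy as np

def solve_lstsq(A, b):
    """Least-squares solve (SVD based); works for any shape but is slower."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

def solve_direct(A, b):
    """Direct solve (LU based); requires a square, non-singular A."""
    return np.linalg.solve(A, b)

# Hypothetical test system: a well-conditioned symmetric positive definite matrix.
rng = np.random.default_rng(0)
n = 200
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)
b = rng.normal(size=(n, 8))

for name, fn in [("lstsq", solve_lstsq), ("solve", solve_direct)]:
    t0 = time.perf_counter()
    x = fn(A, b)
    elapsed = time.perf_counter() - t0
    print(f"{name}: {elapsed * 1e3:.2f} ms, residual {np.abs(A @ x - b).max():.2e}")
```

When timing the gpu versions, keep in mind that CuPy launches kernels asynchronously, so a wall-clock timer needs a synchronization point (for example cp.cuda.Stream.null.synchronize()) before each reading.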

Overall, the main bottleneck still seems to be the eigh function. The figure below shows a screenshot from the Nsight profiling tool, zoomed in on the section of the timeline around the end of one bundle and the beginning of the next. There is a little data transfer between the device and host, but it's pretty minor compared to the ~235 calls to ex2d_patch per bundle.

[Figure: spex-gpu-bundle-boundary (Nsight timeline screenshot at a bundle boundary)]