desihub / gpu_specter

Scratch work for porting spectroperfectionism extractions to GPUs
BSD 3-Clause "New" or "Revised" License

gpu - eigh is current bottleneck #11

Closed: lastephey closed 1 year ago

lastephey commented 4 years ago

Mark filed this issue with CuPy:

https://github.com/cupy/cupy/issues/3154

It was closed because the problem appears to lie in cusparse rather than in CuPy itself.

Laurie will contact NVIDIA after GTC to find out where it should be re-opened.

Since eigh was the largest part of our runtime during the March 2020 hackathon, it's worth asking whether NVIDIA can make this function any faster.

lastephey commented 4 years ago

Thanks to @beckernick this is now an internal NVIDIA bug. Will post updates if I hear anything.

lastephey commented 4 years ago

Update: I recently investigated JAX eigh as a possible alternative to CuPy eigh.

JAX appears to compute eigh only in single precision, even when explicitly given an input of type float64.

When CuPy and JAX eigh were both compared in single precision, the benchmark timings were nearly identical. @sbailey @dmargala @rcthomas
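One possible explanation, not confirmed anywhere in this thread: JAX runs in 32-bit mode by default and downcasts float64 inputs unless the `jax_enable_x64` option is switched on. The sketch below is a minimal dtype check under that assumption. The test matrix and the variable names are made up for illustration, and it says nothing about how double-precision timings would compare with CuPy.

```python
import numpy as np
import jax

# Assumption: 64-bit mode is enabled at startup, before any JAX arrays exist.
# Without this line JAX keeps its default 32-bit mode.
jax.config.update("jax_enable_x64", True)

import jax.numpy as jnp

# Small symmetric float64 test matrix (illustrative only).
rng = np.random.default_rng(0)
a = rng.standard_normal((6, 6))
sym = a + a.T

# eigh returns eigenvalues w and eigenvectors v.
w, v = jnp.linalg.eigh(jnp.asarray(sym))

# Inspect the dtypes actually used for the result.
print(w.dtype, v.dtype)
```

If the printed dtypes are still float32, the option was probably set too late or is being overridden elsewhere in the environment.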

lastephey commented 4 years ago

Daniel should link his CuPy PR here; it speeds up eigh by batching the calls to cusolver.
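For anyone unfamiliar with the batching idea, here is a rough CPU-only sketch using NumPy as a stand-in. It is not Daniel's PR and does not touch cusolver; it only shows the difference between calling eigh once per matrix and once on a whole `(batch, n, n)` stack. The helper names `eigh_loop` and `eigh_batched` and the matrix sizes are hypothetical.

```python
import numpy as np

def eigh_loop(mats):
    """Decompose each symmetric matrix with its own eigh call."""
    vals, vecs = zip(*(np.linalg.eigh(m) for m in mats))
    return np.stack(vals), np.stack(vecs)

def eigh_batched(mats):
    """Decompose the whole (batch, n, n) stack with a single eigh call."""
    w, v = np.linalg.eigh(mats)
    return w, v

# 64 small symmetric matrices (sizes chosen only for illustration).
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 8, 8))
sym = a + a.transpose(0, 2, 1)

w_loop, v_loop = eigh_loop(sym)
w_batch, v_batch = eigh_batched(sym)

# Both approaches should agree on the eigenvalues.
print(np.allclose(w_loop, w_batch))
```

On a GPU the per-call launch overhead is what makes the looped version expensive for many small matrices, which is the motivation for batching.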

lastephey commented 1 year ago

Closing in the spirit of Closember.