seanlaw opened this issue 2 months ago
@seanlaw I'd be happy to give this one a shot, no promises as it looks a bit complex, but would love to try!
@joehiggi1758 Do you have access to an NVIDIA GPU for testing? Otherwise, it might be very painful to assess the performance of any code changes. If you do, then please proceed and let me know if you have any questions, or we can reach out to our collaborator at NVIDIA for help as well (I'm sure there are new features that we may be able to leverage).
Alternatively, you may be interested in this new issue #1031 and attempting to reproduce the work. It has less baggage than this current issue.
Hey @seanlaw hope you're having a great Saturday!
Unfortunately, I don't have access to a GPU, other than maybe a free subscription to Azure. I think starting with the NVIDIA contact is a better plan of attack! If I can help there in any way lmk, I'd love to learn more about GPUs!
I'll focus on #1031 for now; you're right that it does look like a better next issue for me!
Several years ago, we considered (see #266) adding a variant of GPU-STUMP that utilized cooperative groups and that would allow us to push the multiple kernel launches onto the device. Earlier work was concerned about:

- limited `cudatoolkit` support for cooperative groups
- older GPUs that lack cooperative group support
However, `cudatoolkit` support is much better now, and older GPUs that lack cooperative group support are likely end-of-life (and so the above concerns are likely a thing of the past). Additionally, `numba` has moved ahead many, many versions since our last attempt. Thus, we should reconsider adding this to STUMPY. PR #266 provides some clear code for how to proceed and had demonstrated a 12% speedup, which is great! See also the `numba` docs on cooperative groups.
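For context on what grid-wide synchronization buys us: in `numba`, a kernel can obtain `cuda.cg.this_grid()` and call `grid.sync()` so that every block waits at a device-side barrier. A multi-step computation that previously needed one kernel launch per step (with the host acting as the synchronization point) can then be fused into a single launch. Since not everyone in this thread has an NVIDIA GPU handy, here is a rough CPU-only analogy, with Python threads standing in for thread blocks and `threading.Barrier` standing in for `grid.sync()` (the names `fused_kernel`, `N_BLOCKS`, etc. are made up for illustration and are not STUMPY code):

```python
import threading

N_BLOCKS = 4   # one thread stands in for one CUDA thread block
N_STEPS = 3    # steps that would otherwise be separate kernel launches

data = [0, 1, 2, 3]                    # shared "global memory"
barrier = threading.Barrier(N_BLOCKS)  # stand-in for grid.sync()

def fused_kernel(block_id):
    # All N_STEPS run inside one "launch"; the barrier replaces the
    # host-side synchronization between successive kernel launches.
    for _ in range(N_STEPS):
        neighbor = data[(block_id + 1) % N_BLOCKS]  # read phase
        barrier.wait()  # all reads finish before any block writes
        data[block_id] += neighbor                  # write phase
        barrier.wait()  # all writes finish before next step's reads

threads = [threading.Thread(target=fused_kernel, args=(b,))
           for b in range(N_BLOCKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(data)  # deterministic thanks to the barriers
```

Without the two barriers, a fast "block" could overwrite `data[block_id]` before its neighbor has read the old value — exactly the hazard that forces separate kernel launches when grid-wide sync is unavailable on the device.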