flaport / sax

S + Autograd + XLA :: S-parameter based frequency domain circuit simulations and optimizations using JAX.
https://flaport.github.io/sax
Apache License 2.0

add cuda backend for KLU #35

Open · joamatab opened 6 months ago

joamatab commented 6 months ago

Inspired by

@jan-david-fischbach @Vivswan @flaport @bdice

flaport commented 6 months ago

Is there a CUDA-compatible sparse matrix solver we can use instead of KLU here?

joamatab commented 6 months ago

Yes, @bdice used CuPy for that.

Where can we find some benchmark code for it?

It would be great to compare CPU to GPU.

bdice commented 6 months ago

Hi @joamatab and @flaport -- first, thank you for your time. I met with @joamatab at PyCon as part of the Accelerated Python sprint. We discussed using a CUDA-based backend for this library. CuPy seemed like the easiest choice.

Here's a brief rundown of what this PR contains:

The new "cuda" backend requires CuPy, but it is not set as the default because it is not compatible with JAX JIT and thus cannot be used for optimization. I don't know how crucial JAX JIT / differentiable backends are for the problems you typically solve here.
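A minimal sketch of why a non-JAX solver cannot live inside `jax.jit` (using NumPy as a stand-in for the CuPy/cuSolver call; any solver that needs concrete arrays hits the same wall): the function works eagerly, but under `jit` its arguments are abstract tracers, and converting a tracer to a concrete array raises an error.

```python
import jax
import jax.numpy as jnp
import numpy as np


def solve_outside_jax(A, b):
    # Stand-in for a CuPy/cuSolver call: it needs concrete array values,
    # so it converts its inputs with np.asarray before solving.
    return np.linalg.solve(np.asarray(A), np.asarray(b))


A = jnp.eye(2)
b = jnp.ones(2)

x = solve_outside_jax(A, b)  # fine eagerly: jnp arrays convert to numpy

jit_failed = False
try:
    jax.jit(solve_outside_jax)(A, b)
except Exception:
    # Under jit, A and b are tracers; np.asarray on a tracer raises
    # (jax.errors.TracerArrayConversionError in current JAX).
    jit_failed = True
```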

The "cuda" backend uses CuPy, which calls into cuSolver. The cupyx.scipy.sparse.linalg.spsolve function does not support batched sparse solves. It seems like this is a common use case in sax -- but the only solution I have for now is to use a raw for loop, which may not be ideal for performance. There may be future CUDA libraries that serve this use case with a fully-batched solver, which would be able to provide further acceleration on batches of smaller sparse matrices.
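The loop approach described above can be sketched as follows. The helper name `batched_spsolve` is hypothetical, and the sketch is written against SciPy's API so it runs on CPU; `cupyx.scipy.sparse.linalg.spsolve` mirrors `scipy.sparse.linalg.spsolve`, so the GPU version is the same loop over CuPy sparse matrices.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def batched_spsolve(As, bs):
    """Solve A_i x_i = b_i for each system in a batch with a plain loop.

    Hypothetical helper: cuSolver (via cupyx.scipy.sparse.linalg.spsolve)
    has no batched entry point, so each system is solved one at a time.
    """
    return np.stack([spla.spsolve(A.tocsc(), b) for A, b in zip(As, bs)])


# A toy batch of trivially solvable systems: A_i = (i+1) * I, b_i = (i+1) * 1,
# so every solution vector is all ones.
As = [sp.eye(4, format="csc") * (i + 1.0) for i in range(3)]
bs = [np.full(4, i + 1.0) for i in range(3)]
xs = batched_spsolve(As, bs)
```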

The main use case I would see for this backend is to enable sparse solves when you have a small number of very large sparse matrices. For a performance evaluation, I would try this with a very large sparse matrix. I wasn't able to find an example for benchmarking this in the repository, so I haven't pursued that any further.
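A single-large-matrix benchmark along those lines could be sketched like this, shown here on the CPU side with SciPy (matrix size and density are arbitrary assumptions, and `bench_cpu` is a hypothetical name): the GPU comparison point would be the same code with `cupyx.scipy.sparse` substituted for `scipy.sparse`.

```python
import time

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def bench_cpu(n=2000, density=1e-3, seed=0):
    """Time one large sparse solve on CPU via scipy.sparse.linalg.spsolve.

    For the GPU side, build the same matrix with cupyx.scipy.sparse and
    call cupyx.scipy.sparse.linalg.spsolve; the API is the same.
    """
    rng = np.random.default_rng(seed)
    A = sp.random(n, n, density=density, random_state=rng, format="csc")
    A = A + sp.eye(n, format="csc") * n  # diagonally dominant => nonsingular
    b = rng.standard_normal(n)

    t0 = time.perf_counter()
    x = spla.spsolve(A, b)
    elapsed = time.perf_counter() - t0
    residual = np.linalg.norm(A @ x - b)
    return elapsed, residual


elapsed, residual = bench_cpu()
```

On the GPU, remember to synchronize (e.g. `cupy.cuda.Stream.null.synchronize()`) before reading the timer, or the measured time will exclude the asynchronous kernel work.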

I also ported some of the test/example notebooks into proper tests. Running the quick start notebook caught some errors in the CUDA backend, which were easily fixable but were not covered by the existing tests. I also expanded the tests to compare the CUDA and KLU backends for the sample data provided in a test notebook.

It was great to meet @joamatab and I hope this is helpful -- I won't be able to commit significantly more time here, except to address some PR reviews. If you try it out and see good (or bad) performance, please let me know! I am interested in seeing how it performs on large sparse matrices. Please feel free to give it a try. If you try it and find it's not worth adding for any reason, I won't be offended if you close the PR. It was fun to learn about this solver and the problems you're using it for!

Best wishes to you, and thanks for maintaining this as an open-source project!

flaport commented 6 months ago

Hi @bdice, thank you so much for your contribution. Adding a CUDA backend has been something I've wanted for a long time! I'm currently in the middle of a big move, so I won't have much time to review this week, but rest assured this is one of the first things on my todo list for next week! I'll also add a benchmarking suite for future reference. I'm interested to see where it lands :)