exanauts / CUDSS.jl

MIT License
18 stars 1 forks source link

Package management and type piracy: should this be a part of CUDA.jl? #31

Open ChrisRackauckas opened 5 months ago

ChrisRackauckas commented 5 months ago

There are a few issues with the package given its separation from CUDA.jl:

  1. https://github.com/exanauts/CUDSS.jl/blob/v0.1.3/src/generic.jl#L15 these linear algebra overloads are type piracy. Also, it's kind of odd that CuSPARSE lu doesn't "just work", when this fixes it to work more like a regular sparse array. I would think users would be better served if there wasn't a special package to add on the side that fixed missing linear algebra functions.
  2. The ergonomics of having it separate are also hard to deal with. For example, with LinearSolve.jl I'd like to make it so that if a user uses a sparse matrix then the extension should allow for a sparse LU factorization. But LinearSolve's CUDA.jl extension cannot define the code for making sparse LUs work, because it needs this package, but the user only needs CUDA.jl for the sparse matrix, and they are given no indication that another package is needed. What I can do is add an error message that throws an error in the LinearSolve CUDA extension saying that they also need to add CUDSS.jl, but given that packages are now required to correct for issues that the package manager should be able to identify, that's a bit of a red flag that something is going on here.

So, since this is just the sparse factorization of a sparse matrix defined in CUDA.jl, it's very odd that this one function is not in CUDA.jl.

CC @maleadt

amontoison commented 5 months ago

@ChrisRackauckas

  1. lu, cholesky and ldlt of a CuSparseMatrixCSR are not defined in CUDA.jl. We don't have any sparse factorization in CUSPARSE. CUSPARSE only provides sparse matrix products and sparse triangular solves.

  2. cuDSS is not part of the CUDA toolkit. It's still a standalone package in preview. I did a package CUDSS_jll.jl because NVIDIA doesn't provide the library like the other CUDA libraries. cuDSS has a different release system than CUDA toolkit and NVIDIA can break the API at each release. It will be harder to maintain in CUDA.jl because we can't do new release when we want. What not just add CUDSS.jl as a dependency of LinearSolve.jl? It will solve all your issues.

ChrisRackauckas commented 5 months ago
  1. Yes, that is the definition of type piracy and what the whole issue is about.
  2. "What not just add CUDSS.jl as a dependency of LinearSolve.jl?" Not even CUDA is a dependency of LinearSolve.jl. Obviously it makes no sense to make CUDSS a dependency when CUDA isn't even a dependency, and when most people don't have a CUDA-compatible card. It will not solve all of the issues, it introduces many many more. Giving hundreds of thousands of people an extra JLL when most don't have a GPU isn't an option, but having it with whatever ships the CUDA sparse array is.

I did a package CUDSS_jll.jl because NVIDIA doesn't provide the library like the other CUDA libraries.

Yes, a separate JLL makes sense. But the question is whether a fundamental type piracy can be fixed. Its fundamental because users who define CuSparseMatrixCSR are missing a core action, but there is no error message or warning from CUDA.jl that to fix part of the linear algebra interface to allow the user to know what's going on. Since CUDA.jl defines the rest of the linear algebra interface, leaving off this one function causes downstream usage issues. It's a red flag that clearly should get resolved somehow. Maybe for now it's fine as a separate package while CUDSS is still being developed but why wouldn't it it be a part of CUDA's SparseArray dispatches by default in the long run? Or if not, what's the plan to fix the type piracy?

amontoison commented 5 months ago
  1. "What not just add CUDSS.jl as a dependency of LinearSolve.jl?" Not even CUDA is a dependency of LinearSolve.jl. Obviously it makes no sense to make CUDSS a dependency when CUDA isn't even a dependency, and when most people don't have a CUDA-compatible card.

I was talking about the LinearSolve's CUDA.jl extension.

In the end, it depends on what NVIDIA plans to do with cuDSS. If it's merged with CUSOLVER and added as a standalone package in CUDA, we should move this interface to CUDA.jl. Note that these factorizations still have bugs, and we frequently interact with NVIDIA about them. I don't think it should be used by everyone without caution.

If we don't have "official" sparse factorizations in CUDA, there isn't much we can do in the linear algebra interface of CUDA.jl. Implementing a generic GPU fallback for these cases would be challenging. We also lack a generic CPU implementation in the Julia module SparseArrays...

maleadt commented 5 months ago

CUDA.jl is already way to heavyweight, I don't think adding more libraries is a good way forwards. In fact, we've been doing the inverse, moving stuff like cuDNN and cuTENSOR out and restricting CUDA.jl to only what the CUDA toolkit provides.

ChrisRackauckas commented 5 months ago

What about a separate CUDASparseArrays that is complete with the sparse arrays defintions and solvers? That would make it like SparseArrays which as SuiteSparse to match it?

I was talking about the LinearSolve's CUDA.jl extension.

Extensions cannot have dependencies in Julia v1.10. And if it was added as a requirement to the extension that would break a lot of existing code.