JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

CuSparseMatrix - CuMatrix multiplication not working: giving Scalar Indexing #2072

Open lgravina1997 opened 1 year ago

lgravina1997 commented 1 year ago

Multiplying a CuSparseMatrixCSC by a CuArray falls back to scalar indexing.

To reproduce:

    using CUDA, SparseArrays
    CUDA.allowscalar(false)
    A = cu(sparse([1,2,3], [1,2,3], [1,2,3]))
    B = cu(rand(3,1))
    C = A*B

or

    using CUDA, SparseArrays, LinearAlgebra
    CUDA.allowscalar(false)
    A = cu(sparse([1,2,3], [1,2,3], [1,2,3]))
    B = cu(rand(3,1))
    C = similar(B)
    mul!(C, A, B)

Both calls hit the same scalar-indexing error, of course.

Version info

Details on Julia:

Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700K
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, alderlake)
  Threads: 21 on 20 virtual cores
Environment:
  JULIA_NUM_THREADS = auto
CUDA runtime 12.1, artifact installation
CUDA driver 12.0
NVIDIA driver 525.125.6

CUDA libraries: 
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+525.125.6

Julia packages: 
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0

Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3070 (sm_86, 6.158 GiB / 8.000 GiB available)

amontoison commented 1 year ago

@dkarrasch Is it possible that you removed the associated dispatch with https://github.com/JuliaGPU/CUDA.jl/pull/1904? We should call this routine.

dkarrasch commented 1 year ago

I'm not sure. There's https://github.com/JuliaGPU/CUDA.jl/blob/c97bc77e4d1c8f36051079dbf12dd3c9bec75eb4/lib/cusparse/interfaces.jl#L73-L77

so we would need the stacktrace to see how dispatch goes and where it deviates from the expected path. It could be that I missed some VERSION-dependent branching, though.
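
For instance (a sketch, to be run in the failing session; @which comes from InteractiveUtils, which the REPL loads by default):

    julia> using LinearAlgebra, InteractiveUtils

    julia> @which mul!(C, A, B)   # reports the mul! method selected for these argument types

Comparing that against the interfaces.jl method above would show where dispatch deviates.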

dkarrasch commented 1 year ago

I played with it a little, but it seems like it should go through LinearAlgebra.generic_matmatmul! on v1.9 as well, for which we do have the above method, so we really need the stacktrace (for both calls) to see where it leaves the expected path. I can't test it on a GPU locally, unfortunately.

amontoison commented 1 year ago

@dkarrasch @lgravina1997 I just noticed that sparse([1,2,3], [1,2,3], [1,2,3]) is a sparse matrix with integer coefficients. It's expected that the products fall back to scalar indexing: Int is not a BlasFloat type.
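
A minimal float-valued sketch of the same MWE, which should then take the CUSPARSE path (untested here; cu converts the Float64 values to Float32, which is a BlasFloat):

    using CUDA, SparseArrays
    CUDA.allowscalar(false)

    A = cu(sparse([1, 2, 3], [1, 2, 3], [1.0, 2.0, 3.0]))  # Float32 on the device
    B = cu(rand(3, 1))
    C = A * B   # dispatches to the CUSPARSE wrapper rather than the generic fallback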

dkarrasch commented 1 year ago

True. So, to confirm: for float element types everything works as expected, @lgravina1997?

stmorgenstern commented 1 year ago

I'm running into this same error while trying to speed up some expensive Jacobian calculations. Here's my MWE and full stack trace:

    using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra
    N = 20
    CUDA.allowscalar(false)
    A = cu(sparse(I(N^3)))
    B = cu(sparse(I(N^3)))
    C = cu(spzeros(N^3, N^3))
    mul!(C, A, B)

Stacktrace:

ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:35
 [2] assertscalar(op::String)
   @ GPUArraysCore C:\Users\Sam\.julia\packages\GPUArraysCore\uOYfN\src\GPUArraysCore.jl:103
 [3] getindex(xs::CuArray{Int32, 1, CUDA.Mem.DeviceBuffer}, I::Int64)
   @ GPUArrays C:\Users\Sam\.julia\packages\GPUArrays\EZkix\src\host\indexing.jl:9
 [4] getindex(A::CuSparseMatrixCSC{Bool, Int32}, i0::Int64, i1::Int64)
   @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\array.jl:310
 [5] _generic_matmatmul!(C::CuSparseMatrixCSC{Float32, Int32}, tA::Char, tB::Char, A::CuSparseMatrixCSC{Bool, Int32}, B::CuSparseMatrixCSC{Bool, Int32}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:876
 [6] generic_matmatmul!(C::CuSparseMatrixCSC{Float32, Int32}, tA::Char, tB::Char, A::CuSparseMatrixCSC{Bool, Int32}, B::CuSparseMatrixCSC{Bool, Int32}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:844
 [7] mul!
   @ C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:303 [inlined]
 [8] mul!(C::CuSparseMatrixCSC{Float32, Int32}, A::CuSparseMatrixCSC{Bool, Int32}, B::CuSparseMatrixCSC{Bool, Int32})
   @ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:276
 [9] top-level scope
   @ REPL[8]:1

Julia version info:

Julia Version 1.9.2
Commit e4ee485e90 (2023-07-05 09:39 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 36 on 32 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 32

CUDA version info:

CUDA runtime 12.2, artifact installation
CUDA driver 12.1
NVIDIA driver 531.29.0

CUDA libraries:
- CUBLAS: 12.2.5
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.2
- CUSPARSE: 12.1.2
- CUPTI: 20.0.0
- NVML: 12.0.0+531.29

Julia packages:
- CUDA: 5.0.0
- CUDA_Driver_jll: 0.6.0+3
- CUDA_Runtime_jll: 0.9.2+0

Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 4070 Ti (sm_89, 9.035 GiB / 11.994 GiB available)

stmorgenstern commented 1 year ago

Also, while looking into this, I noticed that the mul! tests in the CUSPARSE test suite may not be covering the CuSparseMatrixCSC * CuSparseMatrixCSC case explicitly, letting this error slip through the cracks.
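
A hypothetical sketch of such a test (illustrative only, not the actual test-suite code):

    using Test, CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra
    CUDA.allowscalar(false)

    A = sprand(Float32, 10, 10, 0.3)
    B = sprand(Float32, 10, 10, 0.3)
    dA = CuSparseMatrixCSC(A)
    dB = CuSparseMatrixCSC(B)
    dC = CuSparseMatrixCSC(spzeros(Float32, 10, 10))

    mul!(dC, dA, dB)                   # must stay on the GPU under allowscalar(false)
    @test SparseMatrixCSC(dC) ≈ A * B  # round-trip to the host and compare with the CPU result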

dkarrasch commented 1 year ago

Same error, same cause: https://github.com/JuliaGPU/CUDA.jl/issues/2072#issuecomment-1710634918

What happens if you turn your Is into 1.0Is?

stmorgenstern commented 1 year ago

> Same error, same cause: #2072 (comment)
>
> What happens if you turn your Is into 1.0Is?

    using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra
    N = 20
    CUDA.allowscalar(false)
    A = cu(sparse(1.0I(N^3)))
    B = cu(sparse(1.0I(N^3)))
    C = cu(spzeros(N^3, N^3))

julia> mul!(C, A, B)
8000×8000 CuSparseMatrixCSC{Float32, Int32} with 8000 stored entries:
Error showing value of type CuSparseMatrixCSC{Float32, Int32}:
ERROR: ArgumentError: 1 == colptr[8000] > colptr[8001] == 0
Stacktrace:
  [1] (::SparseArrays.var"#throwmonotonic#3")(ckp::Int32, ck::Int32, k::Int64)
    @ SparseArrays C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\SparseArrays\src\sparsematrix.jl:141
  [2] sparse_check
    @ C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\SparseArrays\src\sparsematrix.jl:148 [inlined]
  [3] SparseMatrixCSC(m::Int64, n::Int64, colptr::Vector{Int32}, rowval::Vector{Int32}, nzval::Vector{Float32})
    @ SparseArrays C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\SparseArrays\src\sparsematrix.jl:38
  [4] SparseMatrixCSC(x::CuSparseMatrixCSC{Float32, Int32})
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\array.jl:403
  [5] show(io::IOContext{Base.TTY}, mime::MIME{Symbol("text/plain")}, S::CuSparseMatrixCSC{Float32, Int32})
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\array.jl:540
  [6] (::REPL.var"#55#56"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:276
  [7] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:557
  [8] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:262
  [9] display(d::REPL.REPLDisplay, x::Any)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:281
 [10] display(x::Any)
    @ Base.Multimedia .\multimedia.jl:340
 [11] #invokelatest#2
    @ .\essentials.jl:816 [inlined]
 [12] invokelatest
    @ .\essentials.jl:813 [inlined]
 [13] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:305
 [14] (::REPL.var"#57#58"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:287
 [15] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:557
 [16] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:285
 [17] (::REPL.var"#do_respond#80"{Bool, Bool, REPL.var"#93#103"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:899
 [18] (::VSCodeServer.var"#101#104"{REPL.var"#do_respond#80"{Bool, Bool, REPL.var"#93#103"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt}})(mi::REPL.LineEdit.MIState, buf::IOBuffer, ok::Bool)
    @ VSCodeServer c:\Users\Sam\.vscode\extensions\julialang.language-julia-1.54.2\scripts\packages\VSCodeServer\src\repl.jl:122
 [19] #invokelatest#2
    @ .\essentials.jl:816 [inlined]
 [20] invokelatest
    @ .\essentials.jl:813 [inlined]
 [21] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\LineEdit.jl:2647
 [22] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:1300
 [23] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL .\task.jl:514

Actually, I believe this is just a display error. The calculation seems to be fine if I'm reading it correctly.
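
One way to sanity-check the result without going through show (a sketch, assuming CuSparseMatrixCSC's internal colPtr field):

    colptr = Array(C.colPtr)          # copy the raw column-pointer buffer to the host
    @show colptr[end-1] colptr[end]   # the pair that sparse_check complains about
    @show nnz(C)                      # stored-entry count, independent of show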

dkarrasch commented 1 year ago

But this is just an error in the show method; mul! itself doesn't throw.

stmorgenstern commented 1 year ago

> But this is just an error in the show method; mul! itself doesn't throw.

Yup, you're right, I read it too hastily. It's a bit strange, though. I ran a couple of other tests closer to what I'm using in my actual Jacobian calculation, and adding the 1.0 in front seems to fix things, at least at first glance.

amontoison commented 1 year ago

A, B, and C must have the same type. I don't understand why the result is CuSparseMatrixCSC{Float32, Int32}; it should be a double-precision sparse matrix. Can you check that all your matrices are CuSparseMatrixCSC{Float64, Int32}?

dkarrasch commented 1 year ago

I'd guess so. The specialized mul! methods are restricted to BlasFloat eltypes; anything else falls back to whatever catches it: GPUArrays.jl, LinearAlgebra, SparseArrays.
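
That restriction is easy to check against the MWE above (a sketch; BlasFloat is LinearAlgebra's unexported union of Float32, Float64, ComplexF32, and ComplexF64):

    julia> using LinearAlgebra

    julia> eltype(A)   # Bool for A = cu(sparse(I(N^3)))
    Bool

    julia> eltype(A) <: LinearAlgebra.BlasFloat
    false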

stmorgenstern commented 1 year ago

Here's the type check from the example with 1.0I:

    julia> typeof(C), typeof(A), typeof(B)
    (CuSparseMatrixCSC{Float32, Int32}, CuSparseMatrixCSC{Float32, Int32}, CuSparseMatrixCSC{Float32, Int32})
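
The Float32 here comes from cu itself, which demotes Float64 storage by default. A sketch of keeping double precision, assuming the direct conversion constructor:

    # build the device matrix directly instead of going through cu()
    A64 = CuSparseMatrixCSC(sparse(1.0I(8000)))   # keeps Float64 values
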
amontoison commented 1 year ago

Do you have the same error with the CuSparseMatrixCSR format? What is the version of your CUDA toolkit?

stmorgenstern commented 1 year ago

Multiplication seems to be fine with the CuSparseMatrixCSR format in my related case. My CUDA toolkit is version 12.1 (full version info is above). However, I went back and ran the original code from this issue and hit some very weird behavior:

    using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra
    CUDA.allowscalar(false)
    A = cu(sparse([1, 2, 3], [1, 2, 3], [1.0, 2.0, 3.0]))
    B = cu(rand(3, 1))
    C = similar(B)

    typeof(A), typeof(B), typeof(C)
    # (CuSparseMatrixCSC{Float32, Int32}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    mul!(C, A, B)
ERROR: Out of GPU memory trying to allocate 127.995 TiB
Effective GPU memory usage: 10.99% (1.318 GiB/11.994 GiB)
Memory pool usage: 64 bytes (32.000 MiB reserved)

Stacktrace:
  [1] macro expansion
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:443 [inlined]
  [2] macro expansion
    @ .\timing.jl:393 [inlined]
  [3] #_alloc#996
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:431 [inlined]
  [4] _alloc
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:427 [inlined]
  [5] #alloc#995
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:417 [inlined]
  [6] alloc
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:411 [inlined]
  [7] CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64})
    @ CUDA C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\array.jl:74
  [8] CuArray
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\array.jl:136 [inlined]
  [9] CuArray
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\array.jl:149 [inlined]
 [10] with_workspace(f::CUDA.CUSPARSE.var"#1340#1342"{Float32, Char, Bool, Bool, CUDA.CUSPARSE.cusparseSpMMAlg_t, CUDA.CUSPARSE.CuDenseMatrixDescriptor, CUDA.CUSPARSE.CuDenseMatrixDescriptor}, eltyp::Type{UInt8}, size::CUDA.CUSPARSE.var"#bufferSize#1341"{Float32, Char, Bool, Bool, CUDA.CUSPARSE.cusparseSpMMAlg_t, CUDA.CUSPARSE.CuDenseMatrixDescriptor, CUDA.CUSPARSE.CuDenseMatrixDescriptor}, fallback::Nothing; keep::Bool)
    @ CUDA.APIUtils C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:67
 [11] with_workspace
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:58 [inlined]
 [12] #with_workspace#1
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:55 [inlined]
 [13] with_workspace (repeats 2 times)
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:55 [inlined]
 [14] mm!(transa::Char, transb::Char, alpha::Bool, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, beta::Bool, C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, index::Char, algo::CUDA.CUSPARSE.cusparseSpMMAlg_t)
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\generic.jl:237
 [15] mm!
    @ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\generic.jl:197 [inlined]
 [16] mm_wrapper(transa::Char, transb::Char, alpha::Bool, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, beta::Bool, C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\interfaces.jl:46
 [17] generic_matmatmul!(C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, tA::Char, tB::Char, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
    @ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\interfaces.jl:76
 [18] mul!
    @ C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:303 [inlined]
 [19] mul!(C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:276
 [20] top-level scope
    @ REPL[8]:1
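
Given that the CSR format behaves in the related case above, a possible workaround sketch for this MWE (untested here; assumes the CuSparseMatrixCSR conversion constructor) is to switch storage formats before multiplying:

    using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra
    CUDA.allowscalar(false)

    # same data in CSR storage; Float32 throughout so the BlasFloat path applies
    A = CuSparseMatrixCSR(sparse([1, 2, 3], [1, 2, 3], Float32[1, 2, 3]))
    B = CUDA.rand(Float32, 3, 1)
    C = similar(B)
    mul!(C, A, B)   # goes through the CSR SpMM path instead of the failing CSC one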