Open lgravina1997 opened 1 year ago
@dkarrasch Is it possible that you removed the associated dispatch with https://github.com/JuliaGPU/CUDA.jl/pull/1904? We should call this routine.
I'm not sure. There's https://github.com/JuliaGPU/CUDA.jl/blob/c97bc77e4d1c8f36051079dbf12dd3c9bec75eb4/lib/cusparse/interfaces.jl#L73-L77
so we would need the stacktrace to see how dispatch goes and where it deviates from the expected path. It could be that I missed some VERSION-dependent branching, though.
I played with it a little locally, but it seems like it should go through LinearAlgebra.generic_matmatmul! also on v1.9, for which we do have the above method, so we really need the stacktrace (for both calls) to see where it leaves the expected path. I can't test it on a GPU locally, unfortunately.
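In the meantime, a quick way to see at least the top-level dispatch (a sketch, assuming a session where InteractiveUtils provides @which; the deeper path still needs the stacktrace):

using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra
using InteractiveUtils: @which
A = CuSparseMatrixCSC(sparse(1.0I(4)))
B = CuArray(rand(4, 4))
C = similar(B)
@which mul!(C, A, B)   # prints the selected mul! method and its source location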
@dkarrasch @lgravina1997
I just noticed that sparse([1,2,3], [1,2,3], [1,2,3]) is a sparse matrix with integer coefficients. It's expected that the products hit scalar indexing: Int is not a BlasFloat type.
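(A quick check of the eltype restriction, nothing CUDA-specific:)

using LinearAlgebra
LinearAlgebra.BlasFloat             # the four BLAS-supported float types
Int <: LinearAlgebra.BlasFloat      # false -> misses the specialized CUSPARSE methods
Float32 <: LinearAlgebra.BlasFloat  # true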
True. So, to confirm, for float types everything works as expected @lgravina1997?
Running into this same error currently while trying to speed up some expensive Jacobian calculations. Here's my MWE and full stack trace:
using CUDA,CUDA.CUSPARSE,SparseArrays,LinearAlgebra
N = 20
CUDA.allowscalar(false)
A = cu(sparse(I(N^3)))
B = cu(sparse(I(N^3)))
C = cu(spzeros(N^3,N^3))
mul!(C,A,B)
Stacktrace:
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:35
[2] assertscalar(op::String)
@ GPUArraysCore C:\Users\Sam\.julia\packages\GPUArraysCore\uOYfN\src\GPUArraysCore.jl:103
[3] getindex(xs::CuArray{Int32, 1, CUDA.Mem.DeviceBuffer}, I::Int64)
@ GPUArrays C:\Users\Sam\.julia\packages\GPUArrays\EZkix\src\host\indexing.jl:9
[4] getindex(A::CuSparseMatrixCSC{Bool, Int32}, i0::Int64, i1::Int64)
@ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\array.jl:310
[5] _generic_matmatmul!(C::CuSparseMatrixCSC{Float32, Int32}, tA::Char, tB::Char, A::CuSparseMatrixCSC{Bool, Int32}, B::CuSparseMatrixCSC{Bool, Int32}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
@ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:876
[6] generic_matmatmul!(C::CuSparseMatrixCSC{Float32, Int32}, tA::Char, tB::Char, A::CuSparseMatrixCSC{Bool, Int32}, B::CuSparseMatrixCSC{Bool, Int32}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
@ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:844
[7] mul!
@ C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:303 [inlined]
[8] mul!(C::CuSparseMatrixCSC{Float32, Int32}, A::CuSparseMatrixCSC{Bool, Int32}, B::CuSparseMatrixCSC{Bool, Int32})
@ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:276
[9] top-level scope
@ REPL[8]:1
julia version info:
Julia Version 1.9.2
Commit e4ee485e90 (2023-07-05 09:39 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 32 × AMD Ryzen 9 7950X 16-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
Threads: 36 on 32 virtual cores
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS = 32
CUDA version info
CUDA runtime 12.2, artifact installation
CUDA driver 12.1
NVIDIA driver 531.29.0
CUDA libraries:
- CUBLAS: 12.2.5
- CURAND: 10.3.3
- CUFFT: 11.0.8
- CUSOLVER: 11.5.2
- CUSPARSE: 12.1.2
- CUPTI: 20.0.0
- NVML: 12.0.0+531.29
Julia packages:
- CUDA: 5.0.0
- CUDA_Driver_jll: 0.6.0+3
- CUDA_Runtime_jll: 0.9.2+0
Toolchain:
- Julia: 1.9.2
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: NVIDIA GeForce RTX 4070 Ti (sm_89, 9.035 GiB / 11.994 GiB available)
Also, while looking into this, I noticed that the mul! tests in the CUSPARSE test suite may not cover the CuSparseMatrixCSC * CuSparseMatrixCSC case explicitly, which would let this error slip through the cracks.
Same error, same cause: https://github.com/JuliaGPU/CUDA.jl/issues/2072#issuecomment-1710634918
What happens if you turn your I's into 1.0I's?
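(I(n) is a Bool-valued identity, which is why the eltype matters; a quick check:)

using SparseArrays, LinearAlgebra
eltype(sparse(I(4)))     # Bool    -> not a BlasFloat, generic fallback
eltype(sparse(1.0I(4)))  # Float64 -> eligible for the specialized path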
using CUDA,CUDA.CUSPARSE,SparseArrays,LinearAlgebra
N = 20
CUDA.allowscalar(false)
A = cu(sparse(1.0I(N^3)))
B = cu(sparse(1.0I(N^3)))
C = cu(spzeros(N^3,N^3))
mul!(C,A,B)
julia> mul!(C,A,B)
8000×8000 CuSparseMatrixCSC{Float32, Int32} with 8000 stored entries:
Error showing value of type CuSparseMatrixCSC{Float32, Int32}:
ERROR: ArgumentError: 1 == colptr[8000] > colptr[8001] == 0
Stacktrace:
[1] (::SparseArrays.var"#throwmonotonic#3")(ckp::Int32, ck::Int32, k::Int64)
@ SparseArrays C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\SparseArrays\src\sparsematrix.jl:141
[2] sparse_check
@ C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\SparseArrays\src\sparsematrix.jl:148 [inlined]
[3] SparseMatrixCSC(m::Int64, n::Int64, colptr::Vector{Int32}, rowval::Vector{Int32}, nzval::Vector{Float32})
@ SparseArrays C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\SparseArrays\src\sparsematrix.jl:38
[4] SparseMatrixCSC(x::CuSparseMatrixCSC{Float32, Int32})
@ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\array.jl:403
[5] show(io::IOContext{Base.TTY}, mime::MIME{Symbol("text/plain")}, S::CuSparseMatrixCSC{Float32, Int32})
@ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\array.jl:540
[6] (::REPL.var"#55#56"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:276
[7] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:557
[8] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:262
[9] display(d::REPL.REPLDisplay, x::Any)
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:281
[10] display(x::Any)
@ Base.Multimedia .\multimedia.jl:340
[11] #invokelatest#2
@ .\essentials.jl:816 [inlined]
[12] invokelatest
@ .\essentials.jl:813 [inlined]
[13] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:305
[14] (::REPL.var"#57#58"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:287
[15] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:557
[16] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:285
[17] (::REPL.var"#do_respond#80"{Bool, Bool, REPL.var"#93#103"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:899
[18] (::VSCodeServer.var"#101#104"{REPL.var"#do_respond#80"{Bool, Bool, REPL.var"#93#103"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt}})(mi::REPL.LineEdit.MIState, buf::IOBuffer, ok::Bool)
@ VSCodeServer c:\Users\Sam\.vscode\extensions\julialang.language-julia-1.54.2\scripts\packages\VSCodeServer\src\repl.jl:122
[19] #invokelatest#2
@ .\essentials.jl:816 [inlined]
[20] invokelatest
@ .\essentials.jl:813 [inlined]
[21] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
@ REPL.LineEdit C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\LineEdit.jl:2647
[22] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
@ REPL C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\REPL\src\REPL.jl:1300
[23] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
@ REPL .\task.jl:514
Actually, I believe this is just a display error. The calculation seems to be fine if I'm reading it correctly.
But this is just an error in the show method; mul! doesn't throw.
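One way to double-check that mul! itself produced sensible output is to look at the raw device buffers rather than show (a sketch; colPtr/rowVal/nzVal are CUDA.jl-internal field names):

Array(C.nzVal)    # stored values on the host; 8000 ones are expected for this identity product
Array(C.colPtr)   # column pointers; the ArgumentError above complains about the last entry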
Yup, you're right, I read it too hastily. It's a bit strange. I ran a couple of other tests closer to what I'm using in my actual Jacobian calculation, and adding the 1.0 in front seems to fix things, at least at first glance.
A, B and C must have the same type. I don't understand why the result is CuSparseMatrixCSC{Float32, Int32}; it should be a double-precision sparse matrix. Can you check that all your matrices are CuSparseMatrixCSC{Float64, Int32}?
I'd guess so. The specialized mul! methods are restricted to BlasFloat eltypes, and otherwise fall back to something else: GPUArrays.jl, LinearAlgebra, SparseArrays, whatever catches it.
Here's the type check from the example with 1.0I
julia> typeof(C),typeof(A),typeof(B)
(CuSparseMatrixCSC{Float32, Int32}, CuSparseMatrixCSC{Float32, Int32}, CuSparseMatrixCSC{Float32, Int32})
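Note that cu(...) converts Float64 arrays to Float32 by default, which would explain the Float32 above; to keep double precision one can construct the device matrices directly (a sketch):

using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra
A64 = CuSparseMatrixCSC(sparse(1.0I(4)))  # no cu(), stays Float64
eltype(A64)                               # Float64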
Do you have the same error with the CuSparseMatrixCSR format?
What is the version of your CUDA toolkit?
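(For reference, CUDA.versioninfo() prints the toolkit, driver, and library versions; the report quoted earlier in this thread is its output.)

using CUDA
CUDA.versioninfo()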
Multiplication seems to be fine with the CuSparseMatrixCSR format in my related case. My CUDA toolkit is version 12.1 (full version info is above). However, I went back and ran the original code from this issue and found it produced some very weird behavior:
using CUDA,CUDA.CUSPARSE,SparseArrays,LinearAlgebra
CUDA.allowscalar(false)
A = cu(sparse([1,2,3], [1,2,3], [1.,2.,3.]))  # integer indices, Float64 values
B = cu(rand(3,1))
C = similar(B)
typeof(A),typeof(B),typeof(C) #(CuSparseMatrixCSC{Float32, Int32}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
mul!(C, A, B)
ERROR: Out of GPU memory trying to allocate 127.995 TiB
Effective GPU memory usage: 10.99% (1.318 GiB/11.994 GiB)
Memory pool usage: 64 bytes (32.000 MiB reserved)
Stacktrace:
[1] macro expansion
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:443 [inlined]
[2] macro expansion
@ .\timing.jl:393 [inlined]
[3] #_alloc#996
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:431 [inlined]
[4] _alloc
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:427 [inlined]
[5] #alloc#995
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:417 [inlined]
[6] alloc
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\pool.jl:411 [inlined]
[7] CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64})
@ CUDA C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\array.jl:74
[8] CuArray
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\array.jl:136 [inlined]
[9] CuArray
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\src\array.jl:149 [inlined]
[10] with_workspace(f::CUDA.CUSPARSE.var"#1340#1342"{Float32, Char, Bool, Bool, CUDA.CUSPARSE.cusparseSpMMAlg_t, CUDA.CUSPARSE.CuDenseMatrixDescriptor, CUDA.CUSPARSE.CuDenseMatrixDescriptor}, eltyp::Type{UInt8}, size::CUDA.CUSPARSE.var"#bufferSize#1341"{Float32, Char, Bool, Bool, CUDA.CUSPARSE.cusparseSpMMAlg_t, CUDA.CUSPARSE.CuDenseMatrixDescriptor, CUDA.CUSPARSE.CuDenseMatrixDescriptor}, fallback::Nothing; keep::Bool)
@ CUDA.APIUtils C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:67
[11] with_workspace
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:58 [inlined]
[12] #with_workspace#1
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:55 [inlined]
[13] with_workspace (repeats 2 times)
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\utils\call.jl:55 [inlined]
[14] mm!(transa::Char, transb::Char, alpha::Bool, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, beta::Bool, C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, index::Char, algo::CUDA.CUSPARSE.cusparseSpMMAlg_t)
@ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\generic.jl:237
[15] mm!
@ C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\generic.jl:197 [inlined]
[16] mm_wrapper(transa::Char, transb::Char, alpha::Bool, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, beta::Bool, C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
@ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\interfaces.jl:46
[17] generic_matmatmul!(C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, tA::Char, tB::Char, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
@ CUDA.CUSPARSE C:\Users\Sam\.julia\packages\CUDA\nbRJk\lib\cusparse\interfaces.jl:76
[18] mul!
@ C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:303 [inlined]
[19] mul!(C::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, A::CuSparseMatrixCSC{Float32, Int32}, B::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
@ LinearAlgebra C:\Users\Sam\AppData\Local\Programs\julia-1.9.2\share\julia\stdlib\v1.9\LinearAlgebra\src\matmul.jl:276
[20] top-level scope
@ REPL[8]:1
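Since the CSR format reportedly works here, a hedged workaround for the sparse-sparse product from earlier in the thread is to build the matrices as CuSparseMatrixCSR from the start (a sketch; depending on the CUDA.jl version, * or mul! may be the one that's wired up):

using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra
N = 20
A = CuSparseMatrixCSR(sparse(1.0I(N^3)))
B = CuSparseMatrixCSR(sparse(1.0I(N^3)))
C = A * B   # CSR * CSR, avoiding the CSC fallback that scalar-indexes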
Multiplying a CuSparseMatrixCSC with a CuArray gives scalar indexing.
To reproduce, see the sketch below; both variants give the same problem, of course.
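A minimal sketch of the kind of code that triggers it, reconstructed from the discussion above (the exact original snippets are an assumption; note the Int coefficients):

using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra
A = cu(sparse([1,2,3], [1,2,3], [1,2,3]))  # Int coefficients, as noted above
B = cu(rand(3, 1))
C = similar(B)
mul!(C, A, B)  # one variant
A * B          # the other; both end in scalar indexing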