JuliaSmoothOptimizers / Krylov.jl

A Julia Basket of Hand-Picked Krylov Methods
Other
349 stars 51 forks source link

GPU scalar indexing when using LsmrSolver (ERROR: LoadError: Scalar indexing is disallowed.) #764

Closed behinger closed 1 year ago

behinger commented 1 year ago

Hey! we (@jschepers) and myself stumbled over an ERROR: LoadError: Scalar indexing is disallowed. error when accessing the output of a solver.

Easiest visible in this MWE:

using Random, LinearAlgebra, CUDA, Krylov

Random.seed!(0)
n = 64
A = Matrix(Symmetric(rand(n, n)))
cA = CuArray(A)

b = ones(n)
cb = CuArray(b)

CUDA.allowscalar(false) # to get an error, not a warning

out = ones(n,2)
cx, cstats = lsmr(cA, cb);
out[:,1] = cx # this works

lsmr_solver = Krylov.LsmrSolver(size(cA)..., CuVector{Float64})
Krylov.solve!(lsmr_solver, cA, cb; history=true)

out[:,2] = lsmr_solver.x # this throws an error
out[:,2] = Vector(lsmr_solver.x) # this works

Why does accessing the CuArray lsmr_solver.x directly result in such an error in the second case, but not when being outputted as cx in the first example via cx,cstat = lsmr(cA,cB)?

Cheers, Judith & Benedikt

amontoison commented 1 year ago

Hi @behinger!

I have an error in both cases here:

julia> out[:,1] = cx # this works
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/HaQcr/src/GPUArraysCore.jl:103
  [3] getindex
    @ ~/.julia/packages/GPUArrays/6STCb/src/host/indexing.jl:9 [inlined]
  [4] iterate
    @ ./abstractarray.jl:1167 [inlined]
  [5] iterate
    @ ./abstractarray.jl:1165 [inlined]
  [6] _unsafe_setindex!(::IndexLinear, ::Matrix{Float64}, ::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, ::Base.Slice{Base.OneTo{Int64}}, ::Int64)
    @ Base ./multidimensional.jl:940
  [7] _setindex!
    @ ./multidimensional.jl:930 [inlined]
  [8] setindex!(::Matrix{Float64}, ::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, ::Function, ::Int64)
    @ Base ./abstractarray.jl:1344
  [9] top-level scope
    @ REPL[11]:1
 [10] top-level scope
    @ ~/git/CUDA.jl/src/initialization.jl:163

julia> lsmr_solver = Krylov.LsmrSolver(size(cA)..., CuVector{Float64})
┌──────────────────┬───────────────────┬──────────────────┐
│        LsmrSolver│          nrows: 64│         ncols: 64│
├──────────────────┼───────────────────┼──────────────────┤
│Precision: Float64│  Architecture: GPU│Storage: 3.618 KiB│
├──────────────────┼───────────────────┼──────────────────┤
│         Attribute│               Type│              Size│
├──────────────────┼───────────────────┼──────────────────┤
│                 m│              Int64│           8 bytes│
│                 n│              Int64│           8 bytes│
│                 x│CuArray{Float64, 1}│         512 bytes│
│                Nv│CuArray{Float64, 1}│         512 bytes│
│               Aᴴu│CuArray{Float64, 1}│         512 bytes│
│                 h│CuArray{Float64, 1}│         512 bytes│
│              hbar│CuArray{Float64, 1}│         512 bytes│
│                Mu│CuArray{Float64, 1}│         512 bytes│
│                Av│CuArray{Float64, 1}│         512 bytes│
│           err_vec│    Vector{Float64}│          40 bytes│
│             stats│ LsmrStats{Float64}│          65 bytes│
└──────────────────┴───────────────────┴──────────────────┘

LsmrStats
 niter: 0
 solved: false
 inconsistent: false
 residuals: []
 Aresiduals: []
 residual: 0.0
 Aresidual: 0.0
 κ₂(A): 0.0
 ‖A‖F: 0.0
 xNorm: 0.0
 timer: 0.00μs
 status: unknown

julia> Krylov.solve!(lsmr_solver, cA, cb; history=true)
┌──────────────────┬───────────────────┬──────────────────┐
│        LsmrSolver│          nrows: 64│         ncols: 64│
├──────────────────┼───────────────────┼──────────────────┤
│Precision: Float64│  Architecture: GPU│Storage: 5.240 KiB│
├──────────────────┼───────────────────┼──────────────────┤
│         Attribute│               Type│              Size│
├──────────────────┼───────────────────┼──────────────────┤
│                 m│              Int64│           8 bytes│
│                 n│              Int64│           8 bytes│
│                 x│CuArray{Float64, 1}│         512 bytes│
│                Nv│CuArray{Float64, 1}│         512 bytes│
│               Aᴴu│CuArray{Float64, 1}│         512 bytes│
│                 h│CuArray{Float64, 1}│         512 bytes│
│              hbar│CuArray{Float64, 1}│         512 bytes│
│                Mu│CuArray{Float64, 1}│         512 bytes│
│                Av│CuArray{Float64, 1}│         512 bytes│
│           err_vec│    Vector{Float64}│          40 bytes│
│             stats│ LsmrStats{Float64}│         1.686 KiB│
└──────────────────┴───────────────────┴──────────────────┘

LsmrStats
 niter: 101
 solved: true
 inconsistent: true
 residuals: [ 8.0e+00  6.8e-01  4.5e-01 ...  2.0e-01  2.0e-01  2.0e-01 ]
 Aresiduals: [ 2.5e+02  1.6e+00  7.0e-01 ...  1.4e-04  1.4e-04  1.4e-04 ]
 residual: 0.19902642812751492
 Aresidual: 0.00013911918604509325
 κ₂(A): 32937.386487626245
 ‖A‖F: 125.32881873355007
 xNorm: 1.4031718348848208
 timer: 93.31ms
 status: truncated forward error small enough

julia> out[:,2] = lsmr_solver.x # this throws an error
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/HaQcr/src/GPUArraysCore.jl:103
  [3] getindex
    @ ~/.julia/packages/GPUArrays/6STCb/src/host/indexing.jl:9 [inlined]
  [4] iterate
    @ ./abstractarray.jl:1167 [inlined]
  [5] iterate
    @ ./abstractarray.jl:1165 [inlined]
  [6] _unsafe_setindex!(::IndexLinear, ::Matrix{Float64}, ::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, ::Base.Slice{Base.OneTo{Int64}}, ::Int64)
    @ Base ./multidimensional.jl:940
  [7] _setindex!
    @ ./multidimensional.jl:930 [inlined]
  [8] setindex!(::Matrix{Float64}, ::CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}, ::Function, ::Int64)
    @ Base ./abstractarray.jl:1344
  [9] top-level scope
    @ REPL[14]:1
 [10] top-level scope
    @ ~/git/CUDA.jl/src/initialization.jl:163
behinger commented 1 year ago

well, that is a surprise and much more saner than what we got. Indeed, we just now tried on a fresh REPL and we got the same result. Maybe we had a weird out-of-order problem, but we thought we checked on a fresh repl. Sorry & thanks!

It now makes much more sense to us also, CUDA warns us, that moving a CuArray into a normal Array is an operation moving memory from GPU to CPU, which you dont want to do typically.

PS: The GPU Krylov moved our fit from 75s to 2s, so that is "nice". Thanks a thousand times!!

amontoison commented 1 year ago

PS: The GPU Krylov moved our fit from 75s to 2s, so that is "nice". Thanks a thousand times!!

Great!!! May I ask you the field of your application and the size of the least-squares problem? It's always a pleasure to have this kind of feedbacks on my thesis project. 🙂

behinger commented 1 year ago

yeah sure! It is not yet there, but will be in https://github.com/unfoldtoolbox/unfold.jl/ (currently switching everything to extensions). The problems we solve are typically ~3 million rows x 10-40k columns, sparsity around 0.1%-1%, with a scaled toeplitz/diagnoal kind of summation/concatination we typically have 20-30 subjects with each 30-120 electrodes, for each we solve the same problem (PS: I don't think lsmr allows for hot-start, but maybe I simply missed it)

PPS: I dont use preconditioner right now because I dont have any background in that field :-D