Jutho / KrylovKit.jl

Krylov methods for linear problems, eigenvalues, singular values and matrix functions

GPU Issue #35

Closed (rveltz closed this issue 1 year ago)

rveltz commented 4 years ago

Hi,

I am still on KrylovKit 0.4.2. Somehow, this used to work, but it does not anymore:

using KrylovKit
using CuArrays # version 0.2.2
CuArrays.allowscalar(false)  # forbid scalar indexing on the GPU

# Extend the vector operations KrylovKit needs to CuArray:
import LinearAlgebra: mul!, axpby!
mul!(x::CuArray, y::CuArray, α::T) where {T <: Number} = (x .= α .* y)
mul!(x::CuArray, y::T, α::CuArray) where {T <: Number} = (x .= α .* y)
axpby!(a::T, X::CuArray, b::T, Y::CuArray) where {T <: Number} = (Y .= a .* X .+ b .* Y)

# The linear map: the identity operator
L = x -> x

vals, vec, info = KrylovKit.eigsolve(L, CuArray{Float64}(rand(Float64, 1024)), 10, :LM, tol = 1e-10, maxiter = 40, verbosity = 2, issymmetric = true, krylovdim = max(40, 10 + 10))

It returns the following error; I'd say the IndexStyle handling is off. Do you have any idea what's wrong?

Thank you,

Best

Romain

julia> vals, vec, info = KrylovKit.eigsolve(L, CuArray{Float64}(rand(Float64, 1024)), 10, :LM, tol = 1e-10, maxiter = 40, verbosity = 2, issymmetric = true, krylovdim = max(40, 10 + 10))
[ Info: Lanczos eigsolve in iter 1: 10 values converged, normres = (6.31e-17, 3.49e-17, 2.82e-19, 8.34e-17, 9.53e-17, 9.82e-17, 9.20e-17, 7.74e-17, 5.59e-17, 2.93e-17)
ERROR: TaskFailedException:
scalar getindex is disallowed
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] assertscalar(::String) at /user/rveltz/home/.julia/packages/GPUArrays/JqOUg/src/host/indexing.jl:41
 [3] getindex(::CuArray{Float64,1,Nothing}, ::Int64) at /user/rveltz/home/.julia/packages/GPUArrays/JqOUg/src/host/indexing.jl:96
 [4] macro expansion at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/orthonormal.jl:125 [inlined]
 [5] macro expansion at ./simdloop.jl:77 [inlined]
 [6] unproject_linear_kernel!(::CuArray{Float64,1,Nothing}, ::KrylovKit.OrthonormalBasis{CuArray{Float64,1,Nothing}}, ::SubArray{Float64,1,Array{Float64,2},Tuple{UnitRange{Int64},Int64},true}, ::UnitRange{Int64}, ::Int64, ::Int64, ::Base.OneTo{Int64}) at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/orthonormal.jl:124
 [7] macro expansion at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/orthonormal.jl:116 [inlined]
 [8] (::KrylovKit.var"#36#threadsfor_fun#11"{Int64,Int64,Base.OneTo{Int64},Int64,CuArray{Float64,1,Nothing},SubArray{Float64,1,Array{Float64,2},Tuple{UnitRange{Int64},Int64},true},KrylovKit.OrthonormalBasis{CuArray{Float64,1,Nothing}},Int64,StepRange{Int64,Int64}})(::Bool) at ./threadingconstructs.jl:61
 [9] (::KrylovKit.var"#36#threadsfor_fun#11"{Int64,Int64,Base.OneTo{Int64},Int64,CuArray{Float64,1,Nothing},SubArray{Float64,1,Array{Float64,2},Tuple{UnitRange{Int64},Int64},true},KrylovKit.OrthonormalBasis{CuArray{Float64,1,Nothing}},Int64,StepRange{Int64,Int64}})() at ./threadingconstructs.jl:28
Stacktrace:
 [1] wait(::Task) at ./task.jl:251
 [2] macro expansion at ./threadingconstructs.jl:69 [inlined]
 [3] unproject_linear_multithreaded!(::CuArray{Float64,1,Nothing}, ::KrylovKit.OrthonormalBasis{CuArray{Float64,1,Nothing}}, ::SubArray{Float64,1,Array{Float64,2},Tuple{UnitRange{Int64},Int64},true}, ::Int64, ::Int64, ::Base.OneTo{Int64}) at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/orthonormal.jl:115
 [4] unproject!(::CuArray{Float64,1,Nothing}, ::KrylovKit.OrthonormalBasis{CuArray{Float64,1,Nothing}}, ::SubArray{Float64,1,Array{Float64,2},Tuple{UnitRange{Int64},Int64},true}, ::Int64, ::Int64, ::Base.OneTo{Int64}) at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/orthonormal.jl:89
 [5] unproject! at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/orthonormal.jl:88 [inlined]
 [6] mul! at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/orthonormal.jl:58 [inlined]
 [7] * at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/orthonormal.jl:56 [inlined]
 [8] #44 at ./none:0 [inlined]
 [9] iterate at ./generator.jl:47 [inlined]
 [10] collect(::Base.Generator{KrylovKit.ColumnIterator{SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false},Base.OneTo{Int64}},KrylovKit.var"#44#47"{KrylovKit.OrthonormalBasis{CuArray{Float64,1,Nothing}}}}) at ./array.jl:622
 [11] eigsolve(::Function, ::CuArray{Float64,1,Nothing}, ::Int64, ::Symbol, ::Lanczos{ModifiedGramSchmidt2,Float64}) at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/eigsolve/lanczos.jl:146
 [12] #eigsolve#41(::Base.Iterators.Pairs{Symbol,Real,NTuple{5,Symbol},NamedTuple{(:tol, :maxiter, :verbosity, :issymmetric, :krylovdim),Tuple{Float64,Int64,Int64,Bool,Int64}}}, ::typeof(eigsolve), ::Function, ::CuArray{Float64,1,Nothing}, ::Int64, ::Symbol) at /user/rveltz/home/.julia/packages/KrylovKit/oJ59b/src/eigsolve/eigsolve.jl:163
 [13] (::KrylovKit.var"#kw##eigsolve")(::NamedTuple{(:tol, :maxiter, :verbosity, :issymmetric, :krylovdim),Tuple{Float64,Int64,Int64,Bool,Int64}}, ::typeof(eigsolve), ::Function, ::CuArray{Float64,1,Nothing}, ::Int64, ::Symbol) at ./none:0
 [14] top-level scope at none:0
Jutho commented 2 years ago

My apologies for not responding sooner to this; I must have completely forgotten about it. I assume this is still an issue. It is caused by my logic for deciding when to use my own multithreaded implementation of a particular operation (namely, forming a linear combination of vectors). Basically, if the vectors are actual AbstractArray subtypes, I try to apply a multithreaded implementation with explicit for loops and indexing, instead of using the minimal vector interface. Indeed, I currently have this logic:

    if y isa AbstractArray && IndexStyle(y) isa IndexLinear && Threads.nthreads() > 1
        return unproject_linear_multithreaded!(y, b, x, α, β, r)
    end

Clearly, I want to exclude CuArrays from this. However, I now wonder how I could do that without actually having to depend on CUDA.jl in the first place.
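
One direction (a hypothetical sketch, not necessarily what KrylovKit ends up doing; the trait name fast_scalar_indexing is invented here) would be to gate the fast path on a small overloadable trait, so that a lightweight glue layer can opt GPU arrays out without KrylovKit taking a hard dependency on CUDA.jl:

    # Hypothetical trait: is scalar getindex on this array type cheap?
    # Ordinary Arrays qualify; GPU array packages would opt out.
    fast_scalar_indexing(::AbstractArray) = true

    # A conditional glue layer (e.g. via Requires.jl) could define
    #     fast_scalar_indexing(::CuArray) = false
    # and the guard above would become
    #     if y isa AbstractArray && fast_scalar_indexing(y) &&
    #             IndexStyle(y) isa IndexLinear && Threads.nthreads() > 1
    #         return unproject_linear_multithreaded!(y, b, x, α, β, r)
    #     end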

rveltz commented 2 years ago

It's been a while; I had forgotten about this. I just tested with CUDA.jl and KrylovKit 0.5.3, and I see a similar problem.

It is a serious problem, because you basically have the only iterative eigensolver that works for non-Hermitian operators on the GPU...

Why don't you use a call like

function linsolve(...; use_multi_thread = y isa AbstractArray && IndexStyle(y) isa IndexLinear && Threads.nthreads() > 1)
    ...
    if use_multi_thread
        return unproject_linear_multithreaded!(y, b, x, α, β, r)
    else
        # generic single-threaded fallback
    end
end

That way, we can manually override this behaviour.

Jutho commented 2 years ago

Note that, by the nature of the if clause, you will not run into this problem if you run a single-threaded Julia instance (e.g. started with JULIA_NUM_THREADS=1). But that may not be an acceptable workaround.
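
To check which path a given starting vector will take, one can evaluate the quoted condition directly (a quick diagnostic mirroring the guard above, not part of KrylovKit's API; CUDA.jl stands in for the older CuArrays):

    using CUDA
    y = CuArray{Float64}(rand(Float64, 1024))
    # Same condition as the guard in unproject!; in a single-threaded Julia
    # session this is false, so the GPU-safe minimal vector interface is used.
    takes_threaded_path = y isa AbstractArray &&
        IndexStyle(y) isa IndexLinear && Threads.nthreads() > 1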

rveltz commented 2 years ago

Sure! That's why I did not bother you about it earlier. My research relies a lot on eigsolve.

Jutho commented 2 years ago

Thanks for the suggestion. That would not be entirely straightforward to implement, as that keyword argument would need to penetrate from the high-level methods such as eigsolve and linsolve down to a low-level method. What would perhaps be useful is a package-wide flag to control whether or not KrylovKit should attempt to use multithreading, similar to what I have in Strided.jl.
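
For what it is worth, a minimal sketch of such a package-wide switch (the names _use_threads, enable_threads and disable_threads are assumptions here, modelled on Strided.jl):

    # Module-level flag controlling the multithreaded fast path.
    const _use_threads = Ref(true)

    enable_threads() = (_use_threads[] = true; nothing)
    disable_threads() = (_use_threads[] = false; nothing)

    # The low-level guard would additionally check the flag:
    #     if _use_threads[] && y isa AbstractArray &&
    #             IndexStyle(y) isa IndexLinear && Threads.nthreads() > 1

A GPU user could then call disable_threads() once at startup, without a keyword argument having to penetrate from eigsolve or linsolve down to the low-level methods.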

MasonProtter commented 2 years ago

I'm also running into this problem; it would be great if there were a way for this to work nicely on the GPU.

@rveltz My current workaround is to write

using CUDA, CUDA.CUSPARSE, KrylovKit

local A::CuSparseMatrixCSC   # the sparse GPU matrix, defined elsewhere
v0 = rand(N)                 # regular CPU vector of the problem size N
vgpu = cu(v0)                # preallocated GPU buffer
vals, vecs, info = eigsolve(v0, args...) do v
    copyto!(vgpu, v)   # copy the CPU iterate onto the GPU
    collect(A * vgpu)  # apply the operator on the GPU, return a plain Array
end

where A is a sparse GPU matrix. The idea is simply to give KrylovKit a regular vector to work with, and then, every time it applies the operator, copy that data to the GPU, do the GPU operations there, and return a regular array at the end.

This encapsulates all the GPU elements, so KrylovKit never even needs to know you're doing GPU operations.

Jutho commented 1 year ago

My apologies for the long silence and delay. This should now be fixed thanks to https://github.com/Jutho/KrylovKit.jl/pull/61, which will be available in the upcoming KrylovKit.jl 0.6. I hope to also have a version 0.7 soonish, featuring AD support for linsolve and eigsolve.