JuliaGPU / GPUArrays.jl

Reusable array functionality for Julia's various GPU backends.
MIT License
321 stars 79 forks source link

Vectorized getindex of wrapped array falls back to scalar indexing #509

Open nomadbl opened 9 months ago

nomadbl commented 9 months ago

I wasn't sure how much this issue overlaps with existing ones, so apologies if this is a duplicate. I constantly get into the aforementioned scalar indexing issue when working with transpose. I opened a new PR (https://github.com/JuliaLang/julia/pull/52626) on base Julia intended to fix the problem, so I wanted to raise awareness and get feedback from the experts here as well.

copying the MWE here to demonstrate the error (which is avoided after the PR changes):

julia> using LinearAlgebra, JLArrays
julia> JLArrays.allowscalar(false)
julia> x = rand(Float32, 2,2) |> jl ; y = transpose(x);
julia> y[:,1]
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.

If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] errorscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:155
  [3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:128
  [4] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:116
  [5] getindex
    @ ~/.julia/packages/GPUArrays/EoKy0/src/host/indexing.jl:48 [inlined]
  [6] scalar_getindex
    @ ~/.julia/packages/GPUArrays/EoKy0/src/host/indexing.jl:34 [inlined]
  [7] _getindex
    @ ~/.julia/packages/GPUArrays/EoKy0/src/host/indexing.jl:17 [inlined]
  [8] getindex
    @ ~/.julia/packages/GPUArrays/EoKy0/src/host/indexing.jl:15 [inlined]
  [9] getindex
    @ ~/julia/usr/share/julia/stdlib/v1.11/LinearAlgebra/src/adjtrans.jl:334 [inlined]
 [10] macro expansion
    @ ./multidimensional.jl:921 [inlined]
 [11] macro expansion
    @ ./cartesian.jl:64 [inlined]
 [12] _unsafe_getindex!
    @ ./multidimensional.jl:916 [inlined]
 [13] _unsafe_getindex(::IndexCartesian, ::Transpose{Float32, JLArray{Float32, 2}}, ::Base.Slice{Base.OneTo{Int64}}, ::Int64)
    @ Base ./multidimensional.jl:907
 [14] _getindex
    @ Base ./multidimensional.jl:893 [inlined]
 [15] getindex(::Transpose{Float32, JLArray{Float32, 2}}, ::Function, ::Int64)
    @ Base ./abstractarray.jl:1314
 [16] top-level scope
    @ REPL[10]:1

In particular, I want to hear if there's something I'm missing like some memory contiguous requirements etc. Cheers

maleadt commented 9 months ago

In particular, I want to hear if there's something I'm missing like some memory contiguous requirements etc.

Not really; support for wrapped arrays in Julia just... isn't great. We have functionality for scalar and vectorized indexing implemented here, https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/host/indexing.jl, where the vectorized_getindex functionality in principle supports any kind of array-like GPU input (i.e. not just a JLArray, but also its transpose). However, to make Julia pick up those definitions, we'd need to extend the getindex definition to cover any wrapped AbstractGPUArrays, which the type system just isn't great at. We have an Adapt.jl-based AnyGPUArray type, but that's a horrible union which significantly slows down method deserialization, only covers one level of wrapping, and introduces lots of ambiguities. As such, I'm wary to just extend all our functions from ::AbstractGPUArray to ::AnyGPUArray. This probably needs a more fundamental fix, maybe something like https://github.com/JuliaLang/julia/issues/51910.

nomadbl commented 9 months ago

Ok thanks for your response. For now it looks like the PR I linked will at least alleviate some issues but a more general fix would be nice at some point. Perhaps it is also a matter of time and accumulated experience that's needed...