JuliaGPU / CUDA.jl

CUDA programming in Julia.
https://juliagpu.org/cuda/

`view(::CuArray, ...)` is sometimes type-unstable: #2526

Open · charleskawczynski opened this issue 3 weeks ago

charleskawczynski commented 3 weeks ago

Here is a reproducer:

using CUDA, JET

a = CUDA.zeros(Float32, 2, 2);
view(a, 1, :); # make sure this doesn't error
view(a, :, 1); # make sure this doesn't error
@test_opt view(a, :, 1); # fails
@test_opt view(a, 1, :); # passes

Here is the error output:

julia> @test_opt view(a, :, 1); # fails
JET-test failed at REPL[61]:1
  Expression: #= REPL[61]:1 =# JET.@test_opt view(a, :, 1)
stmt = :($(Expr(:method, :(Base.getproperty(CUDA, :method_table)), %J15, CodeInfo(
1 ─      nothing
│   @ /home/charliek/.julia/packages/CUDA/2kjXI/src/device/array.jl:81 within `none`
└──      goto #3 if not $(Expr(:boundscheck))
2 ─      checkbounds(A, index)
    @ /home/charliek/.julia/packages/CUDA/2kjXI/src/device/array.jl:82 within `none`
3 ┄ %4 = Base.getproperty(Base, :isbitsunion)
│   %5 = (%4)($(Expr(:static_parameter, 1)))
└──      goto #5 if not %5
    @ /home/charliek/.julia/packages/CUDA/2kjXI/src/device/array.jl:83 within `none`
4 ─ %7 = arrayref_union(A, index)
└──      return %7
    @ /home/charliek/.julia/packages/CUDA/2kjXI/src/device/array.jl:85 within `none`
5 ─ %9 = arrayref_bits(A, index)
└──      return %9
))))
  ═════ 1 possible error found ═════
  ┌ view(::CuArray{Float32, 2, CUDA.DeviceMemory}, ::Colon, ::Int64) @ GPUArrays /home/charliek/.julia/packages/GPUArrays/qt4ax/src/host/base.jl:310
  │┌ unsafe_view(A::CuArray{Float32, 2, CUDA.DeviceMemory}, I::Tuple{Base.Slice{…}, Int64}, ::GPUArrays.Contiguous) @ GPUArrays /home/charliek/.julia/packages/GPUArrays/qt4ax/src/host/base.jl:314
  ││┌ unsafe_contiguous_view(a::CuArray{Float32, 2, CUDA.DeviceMemory}, I::Tuple{Base.Slice{…}, Int64}, dims::Tuple{Int64}) @ GPUArrays /home/charliek/.julia/packages/GPUArrays/qt4ax/src/host/base.jl:319
  │││┌ derive(::Type{Float32}, a::CuArray{Float32, 2, CUDA.DeviceMemory}, dims::Tuple{Int64}, offset::Int64) @ CUDA /home/charliek/.julia/packages/CUDA/2kjXI/src/array.jl:799
  ││││┌ kwcall(::@NamedTuple{…}, ::Type{…}, data::GPUArrays.DataRef{…}, dims::Tuple{…}) @ CUDA /home/charliek/.julia/packages/CUDA/2kjXI/src/array.jl:79
  │││││┌ (CuArray{Float32, 1})(data::GPUArrays.DataRef{CUDA.Managed{…}}, dims::Tuple{Int64}; maxsize::Int64, offset::Int64) @ CUDA /home/charliek/.julia/packages/CUDA/2kjXI/src/array.jl:83
  ││││││┌ finalizer(f::typeof(CUDA.unsafe_free!), o::CuArray{Float32, 1, CUDA.DeviceMemory}) @ Base ./gcutils.jl:87
  │││││││┌ unsafe_free!(xs::CuArray{Float32, 1, CUDA.DeviceMemory}) @ CUDA /home/charliek/.julia/packages/CUDA/2kjXI/src/array.jl:94
  ││││││││┌ unsafe_free!(::GPUArrays.DataRef{CUDA.Managed{CUDA.DeviceMemory}}) @ GPUArrays /home/charliek/.julia/packages/GPUArrays/qt4ax/src/host/abstractarray.jl:91
  │││││││││┌ release(::GPUArrays.RefCounted{CUDA.Managed{CUDA.DeviceMemory}}) @ GPUArrays /home/charliek/.julia/packages/GPUArrays/qt4ax/src/host/abstractarray.jl:42
  ││││││││││ runtime dispatch detected: %24::Any(%25::CUDA.Managed{CUDA.DeviceMemory})::Any
  │││││││││└────────────────────

ERROR: There was an error during testing

I think this is because a view that indexes the last dimension with an integer (like `view(a, :, 1)`) can (efficiently) return a regular `CuArray` instead of a `SubArray`, but it's not clear to me why that would lead to a JET failure.
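The difference in wrapper types is easy to check at the REPL. A minimal sketch, assuming a CUDA-capable device (exact type parameters may differ across versions):

```julia
using CUDA

a = CUDA.zeros(Float32, 2, 2)

# Indexing the last dimension with an integer selects a contiguous column, so
# GPUArrays takes the `unsafe_contiguous_view` path from the trace above and
# derives a fresh 1-d CuArray that aliases the parent's memory.
typeof(view(a, :, 1))   # e.g. CuArray{Float32, 1, CUDA.DeviceMemory}

# A row view is non-contiguous and falls back to a plain SubArray wrapper.
typeof(view(a, 1, :))   # e.g. SubArray{Float32, 1, CuArray{Float32, 2, ...}, ...}
```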

maleadt commented 3 weeks ago

Is this a problem? It's not always worthwhile to accept excessive specialization just to get rid of a type instability. Especially in the scope of GPU operations, where everything is a multi-µs call anyway, dynamic dispatch is pretty fast.
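The dispatch JET flags above is a call through a value inferred as `Any` inside `GPUArrays.release`. A toy sketch of that general pattern, and of what removing it through specialization would look like (not actual GPUArrays code, just an illustration of the tradeoff):

```julia
# A callable stored in an `Any`-typed field can only be invoked via runtime
# dispatch, which is exactly the kind of call JET reports.
struct LooseRef
    finalizer::Any
    obj::Any
end
release(r::LooseRef) = r.finalizer(r.obj)   # runtime dispatch

# Parameterizing the fields makes the call statically dispatched, at the cost
# of compiling one `release` specialization per finalizer/object combination.
struct TightRef{F,T}
    finalizer::F
    obj::T
end
release(r::TightRef) = r.finalizer(r.obj)   # inferable
```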

charleskawczynski commented 3 weeks ago

I haven't measured it, and I suspect it's not. I was just surprised that it's not type stable. That's a fair point about specialization, but does that mean that it must be type unstable?

maleadt commented 3 weeks ago

No, with sufficient inlining / constant propagation / effects annotations the code could be made inferrable.
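A toy illustration of the constant-propagation part, unrelated to the CUDA.jl code paths above:

```julia
# Viewed in isolation, `pick` has a small Union return type.
pick(contiguous::Bool) = contiguous ? (1,) : (1, 2)

# With a literal argument, constant propagation typically resolves the branch,
# so `caller()` infers as Tuple{Int64} rather than the Union.
caller() = pick(true)
```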

huiyuxie commented 5 days ago

> It's not always worthwhile to accept excessive specialization just to get rid of a type instability. Especially in the scope of GPU operations, where everything is a multi-µs call anyway, dynamic dispatch is pretty fast.

True, and where (and how) do you draw the line, @maleadt, when trading off specialization against dynamic dispatch on the GPU 🤔? On the CPU, making everything type-stable seems like a good default, but on the GPU I can only rely on benchmarks, which is sometimes tedious and time-consuming.
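A rough way to put numbers on that tradeoff, sketched here assuming a CUDA device and BenchmarkTools.jl (timings are machine-dependent, not measurements from this issue):

```julia
using CUDA, BenchmarkTools

a = CUDA.zeros(Float32, 2, 2)

# Host-side cost of constructing the type-unstable view, dynamic dispatch included:
@btime view($a, :, 1);

# For scale: a trivial synchronized GPU operation, typically a few microseconds.
@btime CUDA.@sync fill!($a, 0f0);
```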