CarloLucibello opened this issue 3 years ago
Where presumably you define something like:
julia> y = Flux.onehotbatch([1,2,3,1,2], 1:3)
3×5 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
1 ⋅ ⋅ 1 ⋅
⋅ 1 ⋅ ⋅ 1
⋅ ⋅ 1 ⋅ ⋅
julia> gpu(y)
3×5 OneHotMatrix(::CuArray{UInt32, 1, CUDA.Mem.DeviceBuffer}) with eltype Bool:
1 ⋅ ⋅ 1 ⋅
⋅ 1 ⋅ ⋅ 1
⋅ ⋅ 1 ⋅ ⋅
The gradient of sum seems not to notice that the eltype of this is Bool? Normally Zygote treats an array with Bool eltype as non-differentiable, so the gradient should just be nothing.
Sometimes CuArray bypasses that rule, because of this (intended to avoid Fill, I think): https://github.com/FluxML/Zygote.jl/blob/master/src/lib/broadcast.jl#L269
julia> gradient(sum, randn(3).>0)
(nothing,)
julia> gradient(sum, cu(randn(3).>0))
(Bool[1, 1, 1],)
julia> gpu(y) isa CUDA.AbstractGPUArray  # hence this rule shouldn't apply to gpu(y)
false
julia> gradient(sum, gpu(y)) # still fails
ERROR: Scalar indexing is disallowed.
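For comparison, materialising into a plain CuArray does take the CuArray path above. This is just a sketch for contrast, not a proposed fix, and cu(collect(y)) is simply my shortest route to a dense Bool array on the GPU:

julia> y_dense = cu(collect(y))  # dense 3×5 CuArray{Bool}, no OneHotMatrix wrapper

julia> gradient(sum, y_dense)    # should now hit the CuArray branch and return Bool ones, like the cu(randn(3).>0) example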
If we pick a function whose gradient isn't zero on the CPU (because not every operation projects), then I presume that scalar indexing is unavoidable:
julia> gradient(y -> sum(y[:,1:3] .+ y[:, 1:3]'), y)
([2 2 … 0 0; 2 2 … 0 0; 2 2 … 0 0],)
julia> gradient(y -> sum(y[:,1:3] .+ y[:, 1:3]'), gpu(y))
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:33
[2] assertscalar(op::String)
@ GPUArrays ~/.julia/packages/GPUArrays/UBzTm/src/host/indexing.jl:53
[3] getindex(::CuArray{Int64, 2, CUDA.Mem.DeviceBuffer}, ::Int64, ::Int64)
@ GPUArrays ~/.julia/packages/GPUArrays/UBzTm/src/host/indexing.jl:86
[4] getindex
@ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:179 [inlined]
[5] _getindex
@ ./abstractarray.jl:1265 [inlined]
[6] getindex
@ ./abstractarray.jl:1221 [inlined]
[7] _broadcast_getindex
@ ./broadcast.jl:636 [inlined]
[8] _getindex
@ ./broadcast.jl:667 [inlined]
[9] _getindex
@ ./broadcast.jl:666 [inlined]
[10] _broadcast_getindex
@ ./broadcast.jl:642 [inlined]
[11] getindex
@ ./broadcast.jl:597 [inlined]
[12] macro expansion
@ ./broadcast.jl:1005 [inlined]
[13] macro expansion
@ ./simdloop.jl:77 [inlined]
[14] copyto!
@ ./broadcast.jl:1004 [inlined]
[15] copyto!
@ ./broadcast.jl:957 [inlined]
[16] materialize!
@ ./broadcast.jl:915 [inlined]
[17] materialize!
@ ./broadcast.jl:912 [inlined]
[18] (::Zygote.var"#430#432"{2, Bool, Flux.OneHotArray{UInt32, 3, 1, 2, CuArray{UInt32, 1, CUDA.Mem.DeviceBuffer}}, Tuple{Colon, UnitRange{Int64}}})(dy::LinearAlgebra.Adjoint{Int64, CuArray{Int64, 2, CUDA.Mem.DeviceBuffer}})
@ Zygote ~/.julia/packages/Zygote/nsu1Y/src/lib/array.jl:39
[19] #2309#back
@ ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59 [inlined]
[20] Pullback
@ ./REPL[28]:1 [inlined]
[21] (::Zygote.var"#52#53"{typeof(∂(#23))})(Δ::Int64)
@ Zygote ~/.julia/packages/Zygote/nsu1Y/src/compiler/interface.jl:41
[22] gradient(f::Function, args::Flux.OneHotArray{UInt32, 3, 1, 2, CuArray{UInt32, 1, CUDA.Mem.DeviceBuffer}})
@ Zygote ~/.julia/packages/Zygote/nsu1Y/src/compiler/interface.jl:76
You can then annotate that piece of user code with CUDA.@allowscalar
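i.e. something like this untested sketch, which should let the pullback fall back to (slow) scalar indexing instead of throwing:

julia> using CUDA

julia> CUDA.@allowscalar gradient(y -> sum(y[:,1:3] .+ y[:, 1:3]'), gpu(y))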
This came up when trying to do semi-supervised learning with GNNs.
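For reference, the failing pattern looks roughly like the sketch below (all names are made up, and a Dense layer stands in for the actual GNN model): the labels are a OneHotMatrix living on the GPU and the loss only slices out the labelled nodes, so the backward pass goes through the same getindex pullback as in the stacktrace above.

using Flux, CUDA

x = gpu(randn(Float32, 4, 5))                    # node features (hypothetical sizes)
y = gpu(Flux.onehotbatch([1, 2, 3, 1, 2], 1:3))  # one-hot labels on the GPU
labelled = 1:3                                   # only these nodes carry labels
model = gpu(Dense(4, 3))                         # stand-in for the GNN

loss(m) = Flux.logitcrossentropy(m(x)[:, labelled], y[:, labelled])

gradient(loss, model)  # I'd expect the pullback of y[:, labelled] to hit the same scalar-indexing error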