CarloLucibello / GraphNeuralNetworks.jl

Graph Neural Networks in Julia
https://carlolucibello.github.io/GraphNeuralNetworks.jl/dev/
MIT License

View arrays on GPU cause scalar indexing error #349

Closed. bicycle1885 closed this issue 11 months ago.

bicycle1885 commented 11 months ago

The following snippet causes a scalar indexing error on the GPU. It is reproducible on both the latest release and the master branch.

using CUDA
using Flux
using GraphNeuralNetworks
CUDA.allowscalar(false)

g = rand_graph(128, 512)
xi = randn(Float32, 4, 128)

# Both of the following lines are needed to reproduce the bug.
g, xi = gpu(g), gpu(xi)
xi = xj = view(xi, axes(xi)...)

# ERROR: LoadError: Scalar indexing is disallowed.
apply_edges((xi, xj, e) -> 0, g; xi, xj = xi)

kenta@lizzle:~/tmp$ julia -q
(tmp) pkg> st
Status `~/tmp/Project.toml`
⌅ [052768ef] CUDA v4.4.1
  [587475ba] Flux v0.14.6
  [cffab07f] GraphNeuralNetworks v0.6.14 `https://github.com/CarloLucibello/GraphNeuralNetworks.jl.git#master`
⌃ [02a925ec] cuDNN v1.1.1
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`

julia> 
kenta@lizzle:~/tmp$ julia gnn.jl 
ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] assertscalar(op::String)
    @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
  [3] getindex
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:9 [inlined]
  [4] getindex
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/indexing.jl:30 [inlined]
  [5] gather!(dst::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, src::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, false}, idx::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer})
    @ NNlib ~/.julia/packages/NNlib/5iRSB/src/gather.jl:107
  [6] gather
    @ ~/.julia/packages/NNlib/5iRSB/src/gather.jl:46 [inlined]
  [7] _gather(x::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, false}, i::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer})
    @ GraphNeuralNetworks.GNNGraphs ~/.julia/packages/GraphNeuralNetworks/gujpQ/src/GNNGraphs/gatherscatter.jl:4
  [8] apply_edges(f::var"#3#4", g::GNNGraph{Tuple{CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, Nothing}}, xi::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, false}, xj::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, false}, e::Nothing)
    @ GraphNeuralNetworks ~/.julia/packages/GraphNeuralNetworks/gujpQ/src/msgpass.jl:146
  [9] apply_edges(f::Function, g::GNNGraph{Tuple{CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, Nothing}}; xi::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, false}, xj::SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, false}, e::Nothing)
    @ GraphNeuralNetworks ~/.julia/packages/GraphNeuralNetworks/gujpQ/src/msgpass.jl:139
 [10] top-level scope
    @ ~/tmp/gnn.jl:14
in expression starting at /home/kenta/tmp/gnn.jl:14

I think this is a regression, because it started happening with GraphNeuralNetworks.jl 0.6.8 (see below). I'm not sure which package actually causes the error, but I noticed it when I updated GraphNeuralNetworks.jl.

kenta@lizzle:~/tmp$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.3 (2023-08-24)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(tmp) pkg> add GraphNeuralNetworks@0.6.7
   Resolving package versions...
    Updating `~/tmp/Project.toml`
⌅ [587475ba] ↓ Flux v0.14.6 ⇒ v0.13.17
⌃ [cffab07f] ~ GraphNeuralNetworks v0.6.14 `https://github.com/CarloLucibello/GraphNeuralNetworks.jl.git#master` ⇒ v0.6.7
    Updating `~/tmp/Manifest.toml`
⌅ [587475ba] ↓ Flux v0.14.6 ⇒ v0.13.17
⌃ [cffab07f] ~ GraphNeuralNetworks v0.6.14 `https://github.com/CarloLucibello/GraphNeuralNetworks.jl.git#master` ⇒ v0.6.7
⌅ [872c559c] ↓ NNlib v0.9.7 ⇒ v0.8.21
  [a00861dc] + NNlibCUDA v0.2.7
⌅ [3bd65402] ↓ Optimisers v0.3.1 ⇒ v0.2.20
        Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m`
Precompiling project...
  7 dependencies successfully precompiled in 16 seconds. 118 already precompiled.

(tmp) pkg> 
kenta@lizzle:~/tmp$ julia gnn.jl  # this works
kenta@lizzle:~/tmp$ julia
(tmp) pkg> add GraphNeuralNetworks@0.6.8
   Resolving package versions...
   Installed GraphNeuralNetworks ─ v0.6.8
    Updating `~/tmp/Project.toml`
  [587475ba] ↑ Flux v0.13.17 ⇒ v0.14.6
⌃ [cffab07f] ↑ GraphNeuralNetworks v0.6.7 ⇒ v0.6.8
    Updating `~/tmp/Manifest.toml`
  [587475ba] ↑ Flux v0.13.17 ⇒ v0.14.6
⌃ [cffab07f] ↑ GraphNeuralNetworks v0.6.7 ⇒ v0.6.8
  [872c559c] ↑ NNlib v0.8.21 ⇒ v0.9.7
  [a00861dc] - NNlibCUDA v0.2.7
  [3bd65402] ↑ Optimisers v0.2.20 ⇒ v0.3.1
        Info Packages marked with ⌃ have new versions available and may be upgradable.
Precompiling project...
  10 dependencies successfully precompiled in 22 seconds. 118 already precompiled.

(tmp) pkg> 
kenta@lizzle:~/tmp$ julia gnn.jl 
ERROR: LoadError: Scalar indexing is disallowed.

CarloLucibello commented 11 months ago

The problem is not in this repo; it comes from the gather! implementation in NNlib. The view xi is of type

julia> typeof(xi)
SubArray{Float32, 2, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, false}

for which we have

julia> xi isa AnyCuArray
true

julia> xi isa AbstractGPUArray
false

julia> xi isa AnyGPUArray
true

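so, because the view is not an AbstractGPUArray, we are not hitting the specialization at https://github.com/FluxML/NNlib.jl/blob/607de4b8fec751e1079d2822ac950028bb819c1c/src/gather.jl#L112 and fall back to the generic gather! method, which indexes element by element.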

I think this can be solved by relaxing the signature to AnyGPUArray in NNlib. I'll try and see what happens.
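
To illustrate the dispatch miss described above without needing a GPU, here is a minimal sketch using mock types. Every name in it (MockGPUArray, AnyMockGPUArray, which_method, which_method_relaxed) is hypothetical and stands in for the real CUDA.jl/GPUArrays/NNlib definitions. It is not the actual NNlib code, only an analogy for how a view can miss a method defined on the array type and hit it once the signature is relaxed to a union that includes views.

```julia
# Hypothetical stand-ins; none of these names exist in NNlib, CUDA.jl, or GPUArrays.jl.

# Plays the role of a concrete device array type such as CuArray.
struct MockGPUArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(x::MockGPUArray) = size(x.data)
Base.getindex(x::MockGPUArray, i::Int...) = x.data[i...]

# Plays the role of a wrapper-aware union such as AnyGPUArray:
# either the array itself or a view (SubArray) whose parent is the array.
const AnyMockGPUArray{T,N} = Union{MockGPUArray{T,N}, SubArray{T,N,<:MockGPUArray}}

# Generic fallback, standing in for the element-by-element method.
which_method(x::AbstractArray) = :generic_fallback
# Specialization restricted to the array type itself, so views miss it.
which_method(x::MockGPUArray) = :gpu_specialization

# Same pair of methods, but with the specialized signature relaxed to the union.
which_method_relaxed(x::AbstractArray) = :generic_fallback
which_method_relaxed(x::AnyMockGPUArray) = :gpu_specialization

x = MockGPUArray(randn(Float32, 4, 128))
v = view(x, axes(x)...)        # a SubArray wrapping the mock array

which_method(x)                # → :gpu_specialization
which_method(v)                # → :generic_fallback (the view misses the specialization)
which_method_relaxed(v)        # → :gpu_specialization
```

In this analogy, the second call corresponds to what the stack trace shows (the view lands in the generic gather! method), and the third corresponds to the proposed fix of relaxing the signature. Whether the real change is this simple depends on the actual kernel code in NNlib.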