FluxML / Zygote.jl

https://fluxml.ai/Zygote.jl/

ERROR: MethodError: no method matching +(::Base.RefValue{Any}, ::NamedTuple{(:contents,), ...}) #1317

Closed · CarloLucibello closed 1 year ago

CarloLucibello commented 1 year ago

A recent update to Zygote or ChainRules has caused a test failure in GNN.jl. I had a hard time reducing the example, e.g. the same code wrapped in a function triggers the error, so I'll just post the opaque example below in the hope that someone can decipher it. This is on

Zygote v0.6.49
ChainRulesCore v1.15.6

The gradient of the NNConv layer now fails. The layer involves NNlib.batched_mul, which seems to be a crucial ingredient for triggering the error.
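For reference, NNlib.batched_mul multiplies two 3-d arrays slice by slice along the third (batch) dimension; a quick sketch of its semantics (the array names here are just illustrative):

using NNlib
A = rand(2, 3, 4);                      # batch of four 2×3 matrices
B = rand(3, 5, 4);                      # batch of four 3×5 matrices
C = NNlib.batched_mul(A, B);            # size (2, 5, 4)
C[:, :, 1] ≈ A[:, :, 1] * B[:, :, 1]    # each slice is an ordinary matmul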

Reproducer:

julia> using GraphNeuralNetworks, Zygote, Flux

julia> n, m  = 10, 20;

julia> ndim = 2;

julia> edim = 10;

julia> g = rand_graph(n, m);

julia> x = rand(ndim, n);

julia> e = rand(edim, m);

julia> nn = Dense(edim, ndim * ndim);

julia> l = NNConv(ndim => ndim, nn, tanh);

julia> gradient(x -> sum(l(g, x, e)), x)
ERROR: MethodError: no method matching +(::Base.RefValue{Any}, ::NamedTuple{(:contents,), Tuple{FillArrays.Fill{Float64, 3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::Union{InitialValues.NonspecificInitialValue, InitialValues.SpecificInitialValue{typeof(+)}}, ::Any) at ~/.julia/packages/InitialValues/OWP8V/src/InitialValues.jl:154
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/C73ay/src/tangent_arithmetic.jl:146
  ...
Stacktrace:
  [1] accum(x::Base.RefValue{Any}, y::NamedTuple{(:contents,), Tuple{FillArrays.Fill{Float64, 3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}}})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/lib.jl:17
  [2] accum(x::Base.RefValue{Any}, y::Nothing, zs::NamedTuple{(:contents,), Tuple{FillArrays.Fill{Float64, 3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}}})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/lib/lib.jl:22
  [3] Pullback
    @ ~/.julia/dev/GraphNeuralNetworks/src/layers/conv.jl:700 [inlined]
  [4] (::typeof(∂(λ)))(Δ::FillArrays.Fill{Float64, 2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
  [5] Pullback
    @ ./REPL[64]:1 [inlined]
  [6] (::typeof(∂(#224)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface2.jl:0
  [7] (::Zygote.var"#60#61"{typeof(∂(#224))})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:45
  [8] gradient(f::Function, args::Matrix{Float64})
    @ Zygote ~/.julia/packages/Zygote/dABKa/src/compiler/interface.jl:97
  [9] top-level scope
    @ REPL[64]:1
 [10] top-level scope
    @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52
CarloLucibello commented 1 year ago

The regression first appeared in Zygote v0.6.44.

ToucheSir commented 1 year ago

Looks like https://github.com/FluxML/Zygote.jl/issues/1290#issuecomment-1221431340. If I had to guess, https://github.com/CarloLucibello/GraphNeuralNetworks.jl/blob/7f16b69a32f99710e7ad702ce7c7edf4af8d15f7/src/layers/conv.jl#L702 is boxing a captured variable?
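For context, "boxing" here means that Julia lowers a local variable that is both captured by a closure and reassigned into a Core.Box; Zygote can then see the box's gradient as a Base.RefValue on one path and a NamedTuple{(:contents,)} on another, and accum has no method to combine the two. A hypothetical minimal pattern (not the actual conv.jl code, and it may not reproduce this exact MethodError):

function boxed(x)
    m = x .+ 1              # `m` is reassigned below, so the closure
    f = z -> sum(m .* z)    # captures a Core.Box instead of a plain value
    m = m .* 2              # reassignment after capture forces the box
    return f(x)
end
# gradient(boxed, rand(3)) can exercise the same accum(::RefValue, ::NamedTuple) path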

CarloLucibello commented 1 year ago

Looks like the problem is indeed due to boxing, and this can be considered a duplicate of #290. I'll fix things on the GNN.jl side by avoiding closures. Thanks for looking into it, @ToucheSir.
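One way to avoid the box is simply never to reassign a variable that a closure captures; a sketch of the idea (hypothetical names, not the actual GNN.jl patch):

function unboxed(x)
    m = x .+ 1
    m2 = m .* 2             # fresh name instead of reassigning `m`
    f = z -> sum(m2 .* z)   # captures a plain value, so no Core.Box
    return f(x)
end
# Equivalently, `f = let m = m; z -> sum(m .* z); end` pins the binding explicitly.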