Open YichengDWu opened 2 years ago
It boils down to generalizing Zygote.seed
to NamedTuple
julia> Zygote.seed(ps,Val(12))
ERROR: MethodError: no method matching size(::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
Closest candidates are:
size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:567
size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}, ::Integer) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:566
size(::Union{LinearAlgebra.Cholesky, LinearAlgebra.CholeskyPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\cholesky.jl:494
...
Stacktrace:
[1] seed(x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, ::Val{12}, offset::Int64) (repeats 2 times)
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:7
[2] top-level scope
@ REPL[160]:1
[3] top-level scope
@ C:\Users\Luffy\.julia\packages\CUDA\tTK8Y\src\initialization.jl:52
So this is another bug i guess
julia> function f(x, bias)
jac = Zygote.jacobian(x->x.^3, x)[1]
return jac * x .+ bias
end
f (generic function with 1 method)
julia> x,bias = rand(3),rand(3)
([0.2279638899624825, 0.6476786632858718, 0.13745627655377346], [0.051516386842686224, 0.6842360463718182, 0.22031281411507742])
julia> Zygote.gradient(b -> sum(f(x,b)), rand(3))
ERROR: Mutating arrays is not supported -- called copyto!(SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)
Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
https://fluxml.ai/Zygote.jl/dev/limitations.html#Array-mutation-1
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:33
[2] _throw_mutation_error(f::Function, args::SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true})
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:70
[3] (::Zygote.var"#448#449"{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}})(#unused#::Nothing)
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:85
[4] (::Zygote.var"#2506#back#450"{Zygote.var"#448#449"{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}})(Δ::Nothing)
@ Zygote C:\Users\Luffy\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:67
[5] Pullback
@ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:183 [inlined]
[6] (::typeof(∂(_gradcopy!)))(Δ::Nothing)
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
[7] Pullback
@ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:165 [inlined]
[8] (::typeof(∂(withjacobian)))(Δ::NamedTuple{(:val, :grad), Tuple{Nothing, Tuple{Matrix{Float64}}}})
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
[9] (::Zygote.var"#216#217"{Tuple{Tuple{Nothing}, Tuple{Nothing}}, typeof(∂(withjacobian))})(Δ::NamedTuple{(:val, :grad), Tuple{Nothing, Tuple{Matrix{Float64}}}})
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\lib.jl:207
[10] #1909#back
@ C:\Users\Luffy\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:67 [inlined]
[11] Pullback
@ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:140 [inlined]
[12] (::typeof(∂(jacobian)))(Δ::Tuple{Matrix{Float64}})
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
[13] Pullback
@ .\REPL[20]:2 [inlined]
[14] (::typeof(∂(f)))(Δ::FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}})
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
[15] Pullback
@ .\REPL[22]:1 [inlined]
[16] (::typeof(∂(#23)))(Δ::Float64)
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
[17] (::Zygote.var"#60#61"{typeof(∂(#23))})(Δ::Float64)
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:41
[18] gradient(f::Function, args::Vector{Float64})
@ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:76
[19] top-level scope
@ REPL[22]:1
[20] top-level scope
@ C:\Users\Luffy\.julia\packages\CUDA\DfvRa\src\initialization.jl:52
Zygote's jacobian function isn't Zygote-differentiable. There's no major barrier to making it so, someone just has to do it. I think the most relevant issue for this is https://github.com/FluxML/Zygote.jl/issues/953 . That's pure-Zygote, unlike hessian
which involves ForwardDiff... maybe this issue can be about that alone?
Mathematically, I did not differentiate the jacobian itself. The Jacobian here should be treated as a constant. This could be a dumb question but why can't Zygote tell from the input of Zygote.jacobian
to avoid redundant differentiation?
Zygote does not know this, unfortunately. It works backwards from the final result, in complete ignorance of which branches of the expression tree ultimately lead to gradients you do or do not want.
More specifically, Zygote will run the AD transform on any function which doesn't have an existing rrule
/@adjoint
(N.B: @nograd
and @non_differentiable
generate these automatically), and generate an output which is used in the primal computation. For your particular case, ChainRulesCore provides a manual stop gradient operation via https://juliadiff.org/ChainRulesCore.jl/stable/api.html#ChainRulesCore.@ignore_derivatives and co.
Ok, I get it now. What also confuses me is the error message here. It Mutating arrays is not supported
not no adjoint for Zygote.jacobian
. Quite misleading to me.
Just because Zygote fails on a function due to it using mutation does not mean the solution is to write an adjoint for it. Alternatives may include rewriting the offending bits to not use mutation, using Buffer
, marking the function or a function higher up the call stack @non_differentiable
, writing a custom rule for a function higher up the stack, etc.
In essence, It looks like the same kind of failure as the reverse on forward on reverse given that they have similar error messages.
For forward on forward on reverse, after adding a method to Zygote.seed
it is working fine https://github.com/jonniedie/ComponentArrays.jl/issues/149
no adjoint for Zygote.jacobian. Quite misleading
The reason for this is that looking inside unknown functions is literally what it does for a living. It doesn't stop at the call to Zygote.jacobian
because this is nothing special; it hopes to successfully figure out what happens inside. And it fails because the function makes an array & writes into it.
Many failures thus have the same error message. Zygote over ForwardDiff is its own ball of problems besides mutation.
Ah you're right, just a little whining
I have a piece of code that involves multiple errors and it took me a long time to isolate each one 🥲
Oh I know. My only advice is to start small and add things...
I've marked the jacobian posts "duplicate" since these are exactly #1268 now.
Should I change the title of this issue to "hessian
is not differentiable"?
There are two hessian functions, and they are very short: https://github.com/FluxML/Zygote.jl/blob/7f2b16987dc4eab236aabd889444cabdcc6f9caf/src/lib/grad.jl#L74-L89
The all-zygote one would in principal be differentiable, if #1268 were solved.
julia> function f3(x, ps)
hess = Zygote.hessian_reverse(x->sum(abs2, x.^3), x)
return hess * x .+ ps.bias
end
f3 (generic function with 1 method)
julia> Zygote.gradient(p -> sum(f3(x,p)), ps)
ERROR: Mutating arrays is not supported -- called copyto!(SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, ...)
However, 3rd order derivatives using Zygote are unlikely to ever be a good idea. I think the tests contain one like this, and it is very very slow:
julia> @time @eval sin'''(1.0)
120.620770 seconds (93.61 M allocations: 5.603 GiB, 13.76% gc time, 99.81% compilation time)
-0.5403023058681398
Sure it's slow. Forward over forward over reverse is more reasonable. So this goes back to supporting Zygote.seed(x::NamedTuple,...)
Reverse on forward on reverse:
Forward on forward on reverse
It would be great to support the second mode. Looks like it won't take too much to achieve that. If I change
ps
to a vector it can work smoothly.