FluxML / Zygote.jl

21st century AD
https://fluxml.ai/Zygote.jl/
Other
1.48k stars 213 forks source link

nested gradient with hessian #1264

Open YichengDWu opened 2 years ago

YichengDWu commented 2 years ago

Reverse on forward on reverse:

julia> function f1(x, ps)  # [edit: renamed not to clash]
       hess = Zygote.hessian(x->sum(x.^3), x)
       return hess * x .+ ps.bias
       end
f1 (generic function with 1 method)

julia> x = rand(3)
3-element Vector{Float64}:
 0.9750274052932691
 0.015723824729741764
 0.9792305251283961

julia> ps = (;bias = rand(3))
(bias = [0.6184575461887033, 0.24789621977449272, 0.5451996227584986],)

julia> f1(x,ps)
3-element Vector{Float64}:
 6.322528192626253
 0.24937965175928256
 6.298554150817905

julia> Zygote.gradient(p -> sum(f1(x,p)), ps)
ERROR: Mutating arrays is not supported -- called setindex!(Matrix{Float64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/dev/limitations.html#Array-mutation-1

Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:33
  [2] _throw_mutation_error(f::Function, args::Matrix{Float64})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:70
  [3] (::Zygote.var"#444#445"{Matrix{Float64}})(#unused#::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:82
  [4] (::Zygote.var"#2496#back#446"{Zygote.var"#444#445"{Matrix{Float64}}})(Δ::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:67
  [5] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:31 [inlined]
  [6] (::typeof(∂(forward_jacobian)))(Δ::Tuple{Nothing, Matrix{Float64}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
  [7] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:44 [inlined]
  [8] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:43 [inlined]
  [9] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:76 [inlined]
 [10] (::typeof(∂(hessian_dual)))(Δ::Matrix{Float64})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [11] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:74 [inlined]
 [12] Pullback
    @ .\REPL[119]:2 [inlined]
 [13] (::typeof(∂(f)))(Δ::FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [14] Pullback
    @ .\REPL[123]:1 [inlined]
 [15] (::typeof(∂(#97)))(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [16] (::Zygote.var"#60#61"{typeof(∂(#97))})(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:41
 [17] gradient(f::Function, args::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:76
 [18] top-level scope
    @ REPL[123]:1

Forward on forward on reverse

julia> Zygote.forward_jacobian(p -> sum(f1(x,p)), ps)
ERROR: MethodError: no method matching size(::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
Closest candidates are:
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:567
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}, ::Integer) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:566
  size(::Union{LinearAlgebra.Cholesky, LinearAlgebra.CholeskyPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\cholesky.jl:494
  ...
Stacktrace:
 [1] seed(x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, ::Val{1}, offset::Int64) (repeats 2 times)
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:7
 [2] forward_jacobian(f::var"#99#100", x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, #unused#::Val{1})
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:29
 [3] forward_jacobian(f::Function, x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}; chunk_threshold::Int64)
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:44
 [4] forward_jacobian(f::Function, x::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:43
 [5] top-level scope
   @ REPL[124]:1

It would be great to support the second mode. Looks like it won't take too much to achieve that. If I change ps to a vector it can work smoothly.

julia> function f2(x, bias)
       hess = Zygote.hessian(x->sum(x.^3), x)
       return hess * x .+ bias
       end
f2 (generic function with 1 method)

julia> Zygote.forward_jacobian(p -> sum(f2(x,p)), rand(3))
(13.320223875130782, [1.0; 1.0; 1.0;;])
YichengDWu commented 2 years ago

It boils down to generalizing Zygote.seed to NamedTuple

julia> Zygote.seed(ps,Val(12))
ERROR: MethodError: no method matching size(::NamedTuple{(:bias,), Tuple{Vector{Float64}}})
Closest candidates are:
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:567
  size(::Union{LinearAlgebra.QR, LinearAlgebra.QRCompactWY, LinearAlgebra.QRPivoted}, ::Integer) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\qr.jl:566
  size(::Union{LinearAlgebra.Cholesky, LinearAlgebra.CholeskyPivoted}) at C:\Users\Luffy\AppData\Local\Programs\Julia-1.7.2\share\julia\stdlib\v1.7\LinearAlgebra\src\cholesky.jl:494
  ...
Stacktrace:
 [1] seed(x::NamedTuple{(:bias,), Tuple{Vector{Float64}}}, ::Val{12}, offset::Int64) (repeats 2 times)
   @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\forward.jl:7
 [2] top-level scope
   @ REPL[160]:1
 [3] top-level scope
   @ C:\Users\Luffy\.julia\packages\CUDA\tTK8Y\src\initialization.jl:52
YichengDWu commented 2 years ago

So this is another bug i guess

julia> function f(x, bias)
              jac = Zygote.jacobian(x->x.^3, x)[1]
              return jac * x .+ bias
              end
f (generic function with 1 method)

julia> x,bias = rand(3),rand(3)
([0.2279638899624825, 0.6476786632858718, 0.13745627655377346], [0.051516386842686224, 0.6842360463718182, 0.22031281411507742])

julia> Zygote.gradient(b -> sum(f(x,b)), rand(3))
ERROR: Mutating arrays is not supported -- called copyto!(SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/dev/limitations.html#Array-mutation-1

Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:33
  [2] _throw_mutation_error(f::Function, args::SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:70
  [3] (::Zygote.var"#448#449"{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}})(#unused#::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\array.jl:85
  [4] (::Zygote.var"#2506#back#450"{Zygote.var"#448#449"{SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}}})(Δ::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:67
  [5] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:183 [inlined]
  [6] (::typeof(∂(_gradcopy!)))(Δ::Nothing)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
  [7] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:165 [inlined]
  [8] (::typeof(∂(withjacobian)))(Δ::NamedTuple{(:val, :grad), Tuple{Nothing, Tuple{Matrix{Float64}}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
  [9] (::Zygote.var"#216#217"{Tuple{Tuple{Nothing}, Tuple{Nothing}}, typeof(∂(withjacobian))})(Δ::NamedTuple{(:val, :grad), Tuple{Nothing, Tuple{Matrix{Float64}}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\lib.jl:207
 [10] #1909#back
    @ C:\Users\Luffy\.julia\packages\ZygoteRules\AIbCs\src\adjoint.jl:67 [inlined]
 [11] Pullback
    @ C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\lib\grad.jl:140 [inlined]
 [12] (::typeof(∂(jacobian)))(Δ::Tuple{Matrix{Float64}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [13] Pullback
    @ .\REPL[20]:2 [inlined]
 [14] (::typeof(∂(f)))(Δ::FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [15] Pullback
    @ .\REPL[22]:1 [inlined]
 [16] (::typeof(∂(#23)))(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface2.jl:0
 [17] (::Zygote.var"#60#61"{typeof(∂(#23))})(Δ::Float64)
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:41
 [18] gradient(f::Function, args::Vector{Float64})
    @ Zygote C:\Users\Luffy\.julia\packages\Zygote\IoW2g\src\compiler\interface.jl:76
 [19] top-level scope
    @ REPL[22]:1
 [20] top-level scope
    @ C:\Users\Luffy\.julia\packages\CUDA\DfvRa\src\initialization.jl:52
mcabbott commented 2 years ago

Zygote's jacobian function isn't Zygote-differentiable. There's no major barrier to making it so, someone just has to do it. I think the most relevant issue for this is https://github.com/FluxML/Zygote.jl/issues/953 . That's pure-Zygote, unlike hessian which involves ForwardDiff... maybe this issue can be about that alone?

YichengDWu commented 2 years ago

Mathematically, I did not differentiate the jacobian itself. The Jacobian here should be treated as a constant. This could be a dumb question but why can't Zygote tell from the input of Zygote.jacobian to avoid redundant differentiation?

mcabbott commented 2 years ago

Zygote does not know this, unfortunately. It works backwards from the final result, in complete ignorance of which branches of the expression tree ultimately lead to gradients you do or do not want.

ToucheSir commented 2 years ago

More specifically, Zygote will run the AD transform on any function which doesn't have an existing rrule/@adjoint (N.B: @nograd and @non_differentiable generate these automatically), and generate an output which is used in the primal computation. For your particular case, ChainRulesCore provides a manual stop gradient operation via https://juliadiff.org/ChainRulesCore.jl/stable/api.html#ChainRulesCore.@ignore_derivatives and co.

YichengDWu commented 2 years ago

Ok, I get it now. What also confuses me is the error message here. It Mutating arrays is not supported not no adjoint for Zygote.jacobian. Quite misleading to me.

ToucheSir commented 2 years ago

Just because Zygote fails on a function due to it using mutation does not mean the solution is to write an adjoint for it. Alternatives may include rewriting the offending bits to not use mutation, using Buffer, marking the function or a function higher up the call stack @non_differentiable, writing a custom rule for a function higher up the stack, etc.

YichengDWu commented 2 years ago

In essence, It looks like the same kind of failure as the reverse on forward on reverse given that they have similar error messages.

For forward on forward on reverse, after adding a method to Zygote.seed it is working fine https://github.com/jonniedie/ComponentArrays.jl/issues/149

mcabbott commented 2 years ago

no adjoint for Zygote.jacobian. Quite misleading

The reason for this is that looking inside unknown functions is literally what it does for a living. It doesn't stop at the call to Zygote.jacobian because this is nothing special; it hopes to successfully figure out what happens inside. And it fails because the function makes an array & writes into it.

Many failures thus have the same error message. Zygote over ForwardDiff is its own ball of problems besides mutation.

YichengDWu commented 2 years ago

Ah you're right, just a little whining

YichengDWu commented 2 years ago

I have a piece of code that involves multiple errors and it took me a long time to isolate each one 🥲

mcabbott commented 2 years ago

Oh I know. My only advice is to start small and add things...

I've marked the jacobian posts "duplicate" since these are exactly #1268 now.

YichengDWu commented 2 years ago

Should I change the title of this issue to "hessian is not differentiable"?

mcabbott commented 2 years ago

There are two hessian functions, and they are very short: https://github.com/FluxML/Zygote.jl/blob/7f2b16987dc4eab236aabd889444cabdcc6f9caf/src/lib/grad.jl#L74-L89

The all-zygote one would in principal be differentiable, if #1268 were solved.

julia> function f3(x, ps)
         hess = Zygote.hessian_reverse(x->sum(abs2, x.^3), x)
         return hess * x .+ ps.bias
       end
f3 (generic function with 1 method)

julia> Zygote.gradient(p -> sum(f3(x,p)), ps)
ERROR: Mutating arrays is not supported -- called copyto!(SubArray{Float64, 1, Matrix{Float64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, ...)

However, 3rd order derivatives using Zygote are unlikely to ever be a good idea. I think the tests contain one like this, and it is very very slow:

julia> @time @eval sin'''(1.0)
120.620770 seconds (93.61 M allocations: 5.603 GiB, 13.76% gc time, 99.81% compilation time)
-0.5403023058681398
YichengDWu commented 2 years ago

Sure it's slow. Forward over forward over reverse is more reasonable. So this goes back to supporting Zygote.seed(x::NamedTuple,...)