JuliaDecisionFocusedLearning / ImplicitDifferentiation.jl

Automatic differentiation of implicit functions
https://juliadecisionfocusedlearning.github.io/ImplicitDifferentiation.jl/
MIT License

Differentiating Implicit function inside implicit function with Enzyme #153

Open benjaminfaber opened 3 weeks ago

benjaminfaber commented 3 weeks ago

I've run into an issue where I want to compute the gradient of an implicit function that itself depends on another implicit function. I can do the operation successfully with ForwardDiff; however, I would like to use a backend that is faster, like Enzyme. When I run the MWE below, I run into a segmentation fault, which I think is related to the fact that one needs to use autodiff_deferred, but it looks like ADTypes doesn't yet have a backend option for autodiff_deferred. Should I open an issue there, or is this something that can be implemented in ImplicitDifferentiation/DifferentiationInterface?

Tagging @just-walk

using ImplicitDifferentiation, Optim, Enzyme, ForwardDiff

function f1(x, method)
    f(y) = sum(abs2, y .^ 2 .- x)
    y0 = ones(eltype(x), size(x))
    result = optimize(f, y0, method)
    return Optim.minimizer(result)
end;

function c1(x, y, method)
    ∇₂f = @. 4 * (y^2 - x) * y
    return ∇₂f
end;

implicit_f1 = ImplicitFunction(f1, c1)

function f2(x, method)
    z = implicit_f1(x, method)
    f(y) = sum(abs2, y .^ 2 .- z .* x)
    y0 = ones(eltype(x), size(x))
    result = optimize(f, y0, method)
    return Optim.minimizer(result)
end

function c2(x, y, method)
    z = implicit_f1(x, method)
    ∇₂f = @. 4 * (y^2 - z * x) * y
    return ∇₂f
end

implicit_f2 = ImplicitFunction(f2, c2)

x = [4., 9.]

dx = ([1., 0.], [0., 1.])

# Works
df1 = Enzyme.autodiff(Enzyme.Forward, implicit_f1, BatchDuplicatedNoNeed, BatchDuplicated(x, dx), Const(LBFGS()))[1]

┌ Warning: Using fallback BLAS replacements for (["dasum_64_"]), performance may be degraded
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/Y4hSX/src/utils.jl:59
(var"1" = (var"1" = [0.25000000000806183, 0.0], var"2" = [0.0, 0.16666666667288246]),)

df2 = ForwardDiff.jacobian(_x -> implicit_f2(_x, LBFGS()), x)
2×2 Matrix{Float64}:
 0.53033  0.0
 0.0      0.433013

# Does not work

df2 = autodiff(Forward, implicit_f2, BatchDuplicatedNoNeed, BatchDuplicated(x, dx), Const(LBFGS()))[1]
[32468] signal (11.1): Segmentation fault
in expression starting at /home/bfaber/TempPkg/src/diff_test.jl:44
gc_mark_obj16 at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:1894 [inlined]
gc_mark_outrefs at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2654 [inlined]
gc_mark_loop_serial_ at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2697
gc_mark_loop_serial at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2720
gc_mark_loop at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:2901 [inlined]
_jl_gc_collect at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:3234
ijl_gc_collect at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:3531
maybe_collect at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined]
jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:1300
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:1357 [inlined]
jl_gc_alloc_ at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia_internal.h:476 [inlined]
jl_gc_alloc at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gc.c:3583
_new_array_ at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/array.c:134 [inlined]
_new_array at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/array.c:198 [inlined]
ijl_alloc_array_1d at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/array.c:436
Array at ./boot.jl:477 [inlined]
.
.
.
gdalle commented 3 weeks ago

Indeed, ADTypes has no option for choosing between autodiff and autodiff_deferred with Enzyme, partly because this choice should become unnecessary in the medium term (according to @wsmoses). In the meantime, DifferentiationInterface allows the switch internally with the function DifferentiationInterface.nested(backend). You can try setting conditions_x_backend and conditions_y_backend to DifferentiationInterface.nested(AutoEnzyme()) in your inner ImplicitFunction to automatically use autodiff_deferred.
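
For concreteness, here is a minimal sketch of what that could look like for the MWE above, assuming the keyword arguments and the DifferentiationInterface.nested helper described here (untested, so treat it as a starting point rather than a definitive fix):

using ImplicitDifferentiation
using ADTypes: AutoEnzyme
import DifferentiationInterface as DI

# Differentiate the conditions of the inner ImplicitFunction with the "nested"
# flavour of Enzyme, which switches to autodiff_deferred internally.
nested_enzyme = DI.nested(AutoEnzyme())

implicit_f1 = ImplicitFunction(f1, c1;
    conditions_x_backend = nested_enzyme,
    conditions_y_backend = nested_enzyme,
)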

I should also note that at the moment, Enzyme is only supported in forward mode by ImplicitDifferentiation (see #35). If your input is very high-dimensional (how big are we talking?), you may be better served by a reverse mode backend. Can you maybe give a mathematical description of the real function(s) you're trying to differentiate?
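
For reference, the reverse-mode call pattern would look something like the sketch below, assuming Zygote as the outer backend (ImplicitFunction supports ChainRules-based reverse mode); note that the conditions c2 would still need an inner backend able to differentiate through implicit_f1, as discussed above.

using Zygote

# Reverse-mode sketch: gradient of a scalar loss of the implicit solution,
# appropriate when x is high-dimensional and the target is a scalar.
loss(x) = sum(abs2, implicit_f2(x, LBFGS()))
grad = Zygote.gradient(loss, x)[1]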

wsmoses commented 3 weeks ago

That bug is a GC error, not the result of deferred.

I fixed two such bugs in Julia proper for the 1.10.5 release; see if that helps, and if not, open an MWE issue on Enzyme proper.

wsmoses commented 3 weeks ago

Deferred isn't needed here, since the custom rule used is calling autodiff. Deferred is only required when doing higher-order AD, e.g. doing autodiff of autodiff.
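
For readers following along, that pattern looks roughly like the toy sketch below (exact signatures may differ between Enzyme versions):

using Enzyme

h(x) = x^3

# Inner derivative: because this call is itself differentiated by Enzyme below,
# it uses autodiff_deferred rather than autodiff.
dh(x) = Enzyme.autodiff_deferred(Enzyme.Forward, h, Duplicated(x, one(x)))[1]

# Outer derivative of the inner derivative, i.e. the second derivative of h.
d2h = Enzyme.autodiff(Enzyme.Forward, dh, Duplicated(2.0, 1.0))[1]  # ≈ 12.0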

gdalle commented 3 weeks ago

I think higher order is what's being done here: the conditions c2 involve autodiff of implicit_f1

wsmoses commented 3 weeks ago

Anything in a custom rule is considered top level (aka does not have the deferred problem).

Of course if the code does call autodiff of autodiff elsewhere there may be a deferred issue.

Nevertheless, the error log is a GC issue, not a deferred error, so any deferred use won't fix it.

benjaminfaber commented 3 weeks ago

Thank you both for the clarification, and for being willing to put up with questions from a relative newcomer to AD. To answer your question @gdalle, the intended application is a plasma physics/fusion energy optimization problem where we can have 300+ parameters in our state vector. The gradient needed is ultimately that of a scalar optimization target, so reverse mode is the way we want to go. The problem I'm currently trying to tackle has three nested optimization problems (A, B, C), where B depends on A and C depends on both B and A. I can use NonlinearSolve and NonlinearLeastSquaresProblem; however, I was not having any luck calculating the sensitivities that way, whereas I have been able to (albeit slowly) with ForwardDiff and ImplicitDifferentiation, so I've been pursuing that route. I may be doing things incorrectly with NonlinearSolve, but we also have the ultimate goal of using a legacy Fortran application to do the forward solve in the implicit problem, so we'll need ImplicitDifferentiation for that.

I'll see what happens when I use a reverse-mode differentiator and whether that has any issues.

gdalle commented 3 weeks ago

If you have any mathematical formulation of the nested optimization problems that you're free to share, I'd love to take a look!

benjaminfaber commented 3 weeks ago

Sure, I'd be happy to give a little mathematical background on the problem. We are trying to optimize the shape of the magnetic field used in magnetic confinement fusion concepts, which can be represented efficiently in cylindrical coordinates with a non-standard Fourier transformation:

R(r,\theta,\zeta) = \sum\limits_{m,n}^{M,N} R_{m,n}^c(r)\cos\left(m\theta + n\zeta\right) + R_{m,n}^s(r) \sin\left(m\theta + n\zeta\right),
Z(r,\theta,\zeta) = \sum\limits_{m,n}^{M,N} Z_{m,n}^c(r)\cos\left(m\theta + n\zeta\right) + Z_{m,n}^s(r) \sin\left(m\theta + n\zeta\right),
\phi = -\zeta

Derived quantities, like the Jacobian of the coordinate transformation, also have the same Fourier representation. I'm using Optimization.jl to find the Fourier modes of the derived quantities that satisfy their constraint conditions. For example, the Jacobian satisfies the following condition:

\frac{\partial}{\partial\theta}\left(\frac{1 + \left(\frac{\partial R}{\partial \zeta}\right)^2 + \left(\frac{\partial Z}{\partial \zeta}\right)^2 + \iota \left(\frac{\partial R}{\partial \theta} \frac{\partial R}{\partial \zeta} + \frac{\partial Z}{\partial \theta}\frac{\partial Z}{\partial \zeta}\right)}{\sqrt{g}}\right) + \frac{\partial}{\partial\zeta}\left(\frac{\frac{\partial R}{\partial \theta}\frac{\partial R}{\partial \zeta} + \frac{\partial Z}{\partial \theta}\frac{\partial Z}{\partial \zeta} + \iota \left(\left(\frac{\partial R}{\partial \theta}\right)^2 + \left(\frac{\partial Z}{\partial \theta}\right)^2\right)}{\sqrt{g}}\right) = 0.

Other quantities then depend on this Jacobian, so I've found it straightforward to solve for the Fourier modes of those quantities as another optimization problem whose optimality condition I can write down easily and use with ImplicitDifferentiation to compute sensitivities; that is where I ran into this problem initially.

All that being said, I've found that a workaround is to solve for all of the dependent quantities simultaneously, so that I only need to make one call to compute the derivative of an ImplicitFunction and don't currently need to deal with this problem.
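
In terms of the MWE above, that workaround amounts to something like the rough sketch below (not the actual application code): perform both solves inside one forward function and stack their optimality conditions, so only one ImplicitFunction ever gets differentiated.

function f12(x, method)
    z = f1(x, method)                        # inner solve, no implicit wrapper needed here
    f(y) = sum(abs2, y .^ 2 .- z .* x)
    y0 = ones(eltype(x), size(x))
    result = optimize(f, y0, method)
    return vcat(z, Optim.minimizer(result))  # stack both solutions
end

function c12(x, y, method)
    n = length(x)
    z, w = y[1:n], y[(n + 1):end]
    ∇f1 = @. 4 * (z^2 - x) * z               # optimality condition of the inner problem
    ∇f2 = @. 4 * (w^2 - z * x) * w           # optimality condition of the outer problem
    return vcat(∇f1, ∇f2)
end

implicit_f12 = ImplicitFunction(f12, c12)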

gdalle commented 2 weeks ago

Thanks, glad you found a workaround! We'll revisit this when Julia 1.10.5 comes out.