control-toolbox / CTBase.jl

Fundamentals of the control-toolbox ecosystem
http://control-toolbox.org/CTBase.jl/
MIT License

AD: change the backend transparently #25

Open · jbcaillau opened 1 year ago

jbcaillau commented 1 year ago
ocots commented 1 year ago

This is more a discussion than an issue, no? Should it be transferred elsewhere?

jbcaillau commented 1 year ago

It is an issue. Now in CTBase.jl, FWIW

ocots commented 1 year ago

I am not convinced that this is an issue. It is more a wish :-)

jbcaillau commented 1 year ago

OK, move it!

jbcaillau commented 1 year ago

We should move from ForwardDiff to AbstractDifferentiation

jbcaillau commented 10 months ago

See also: FastDifferentiation.jl

ocots commented 4 months ago

Check also DifferentiationInterface.jl.

gdalle commented 3 months ago

Friendly ping from the creator of DifferentiationInterface: I'm available to help you make the transition if you want me to :)

ocots commented 3 months ago

Hi @gdalle! This would be great. Thanks. I propose first to post here how we use AD. @jbcaillau and @PierreMartinon, please complete.

https://github.com/control-toolbox/CTBase.jl/blob/b73f56d6447e1967bf56c59f49dc8bcdb09dd71c/src/utils.jl#L76

# scalar case: derivative of f at x, computed with ForwardDiff
function ctgradient(f::Function, x::ctNumber)
    return ForwardDiff.derivative(x -> f(x), x)
end

https://github.com/control-toolbox/CTBase.jl/blob/b73f56d6447e1967bf56c59f49dc8bcdb09dd71c/src/utils.jl#L101

# scalar case: wrap x in a one-element vector so that ForwardDiff.jacobian applies
function ctjacobian(f::Function, x::ctNumber)
    return ForwardDiff.jacobian(x -> f(x[1]), [x])
end

https://github.com/control-toolbox/CTBase.jl/blob/b73f56d6447e1967bf56c59f49dc8bcdb09dd71c/src/differential_geometry.jl#L95

https://github.com/control-toolbox/CTFlows.jl/blob/fff879627ccec8d3252694ae2ad27252522d676f/src/hamiltonian.jl#L61

function rhs(h::AbstractHamiltonian)
    function rhs!(dz::DCoTangent, z::CoTangent, v::Variable, t::Time)
        n      = size(z, 1) ÷ 2
        foo(z) = h(t, z[rg(1,n)], z[rg(n+1,2n)], v)
        dh     = ctgradient(foo, z)    # gradient of H with respect to (x, p)
        dz[1:n]    =  dh[n+1:2n]       # ẋ =  ∂H/∂p
        dz[n+1:2n] = -dh[1:n]          # ṗ = -∂H/∂x
    end
    return rhs!
end

https://github.com/control-toolbox/CTDirect.jl/blob/60edc0c8be071bba860db12c768f46f29e482592/src/solve.jl#L40

    # call NLP problem constructor
    docp.nlp = ADNLPModel!(x -> DOCP_objective(x, docp), 
                    x0,
                    docp.var_l, docp.var_u, 
                    (c, x) -> DOCP_constraints!(c, x, docp), 
                    docp.con_l, docp.con_u, 
                    backend = :optimized)
gdalle commented 3 months ago

Thanks for the links, I'll take a look but I already have a few questions.

Why do you call the derivative the gradient? What is this ctNumber that you use?

Are the derivative and Jacobian the only operators you need? What are the typical input and output dimensionalities for the Jacobian? Depending on the answer, you may want to parametrize with different AD backends for the derivative (forward mode always) and the Jacobian (forward mode for large input and small output, reverse mode for small input and large output, otherwise unclear).
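For illustration, here is a minimal sketch of that kind of per-operator parametrization with DifferentiationInterface.jl (my_jacobian and the size threshold are made up for the example, not CTBase code):

# Illustrative only: pick the AD backend per operator / per problem size.
using DifferentiationInterface
using ADTypes: AutoForwardDiff, AutoReverseDiff
import ForwardDiff, ReverseDiff

# scalar derivative: forward mode is always the right choice
d = derivative(x -> sin(x), AutoForwardDiff(), 1.0)

# Jacobian: choose the mode from the input dimension (threshold m is arbitrary here)
function my_jacobian(f, x::AbstractVector, m::Integer)
    backend = length(x) <= m ? AutoForwardDiff() : AutoReverseDiff()
    return jacobian(f, backend, x)
end

J = my_jacobian(x -> [x[1]^2, x[1] * x[2]], [1.0, 2.0], 2)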

Do you take derivatives or Jacobians of the same function several times, but with different input vectors? If so, you will hugely benefit from a preparation mechanism like the one that is implemented in DifferentiationInterface.
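And a minimal sketch of the preparation mechanism (illustrative; the exact position of the preparation object in the call may differ between DifferentiationInterface versions):

using DifferentiationInterface
using ADTypes: AutoForwardDiff
import ForwardDiff

f(x) = sum(abs2, x)
backend = AutoForwardDiff()

# prepare once on a typical input ...
prep = prepare_gradient(f, backend, zeros(10))

# ... then reuse the preparation for many inputs of the same size
for _ in 1:100
    g = gradient(f, prep, backend, rand(10))
end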

gdalle commented 3 months ago

As for ADNLPModels.jl, they are also considering a switch to DifferentiationInterface, but it might be slightly slower.

jbcaillau commented 3 months ago

@gdalle Thanks for the PR and comments

> Why do you call the derivative the gradient? What is this ctNumber that you use?

ctNumber = Real. We want to deal more or less uniformly with reals and one-dimensional vectors; that is why the case where the variable is a single real is dealt with explicitly.
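A minimal sketch of that dispatch, for illustration (the AbstractVector method here just stands in for the actual code in utils.jl linked above):

import ForwardDiff

const ctNumber = Real

# scalar variable: plain derivative
ctgradient(f::Function, x::ctNumber) = ForwardDiff.derivative(f, x)

# vector variable: usual gradient
ctgradient(f::Function, x::AbstractVector) = ForwardDiff.gradient(f, x)

ctgradient(t -> t^2, 1.0)       # 2.0
ctgradient(v -> v[1]^2, [1.0])  # [2.0]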

> Are the derivative and Jacobian the only operators you need? What are the typical input and output dimensionalities for the Jacobian? Depending on the answer, you may want to parametrize with different AD backends for the derivative (forward mode always) and the Jacobian (forward mode for large input and small output, reverse mode for small input and large output, otherwise unclear).

Dimensions < 1e2, e.g. to build the right-hand side of a Hamiltonian system.

> Do you take derivatives or Jacobians of the same function several times, but with different input vectors? If so, you will hugely benefit from a preparation mechanism like the one that is implemented in DifferentiationInterface.

✅ to be tested elsewhere (see also this comment)

ocots commented 3 months ago

I think that for CTBase.jl a first step has been taken. Do we close this issue? We will see next how to handle this in CTFlows.jl.

gdalle commented 2 months ago

To me this is not yet done: #141 added a backend kwarg to ctgradient and the like, but this kwarg is not passed down from further up the chain. As a result, users cannot change the AD backend, even though package developers can through the __auto() function.

jbcaillau commented 2 months ago

@gdalle check this PR

gdalle commented 2 months ago

To clarify, even with this PR, you're currently doing something like

function solve_control_problem(f)
    # ...
    for i in 1:n
        x -= gradient(f, x)
    end
    # ...
end

function gradient(f, x, backend=default_backend())
    # ...
end

And for users who only care about high-level interfaces, and who never call gradient directly, the following seems better to me:

function solve_control_problem(f, backend)
    # ...
    for i in 1:n
        x -= gradient(f, x, backend)
    end
    # ...
end

function gradient(f, x, backend=default_backend())
    # ...
end

But you know best if that's relevant in your case or not.

ocots commented 2 months ago

We totally agree with you. The second choice is better.

But actually, the function that solves optimal control problems is not in the CTBase.jl package.

Besides, our function

function gradient(f, x, backend=default_backend())
    # ...
end

is not used in the function that solves optimal control problems. It is used, for instance, in the CTFlows.jl package here. I agree that there I will have to add a kwarg for the AD backend.
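For illustration, a hypothetical sketch of that change to the rhs constructor shown above (assuming the kwarg from #141 is literally named backend and defaults to __auto()):

function rhs(h::AbstractHamiltonian; backend = __auto())
    function rhs!(dz::DCoTangent, z::CoTangent, v::Variable, t::Time)
        n      = size(z, 1) ÷ 2
        foo(z) = h(t, z[rg(1,n)], z[rg(n+1,2n)], v)
        dh     = ctgradient(foo, z; backend = backend)  # hypothetical: forward the kwarg
        dz[1:n]    =  dh[n+1:2n]
        dz[n+1:2n] = -dh[1:n]
    end
    return rhs!
end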

As for solving optimal control problems, we go through ADNLPModels.jl, and again we want the user to be able to choose the AD backend.

jbcaillau commented 2 months ago

@gdalle Agreed, thanks for the feedback. Actually, there is now a setter that allows the user / dev to change the backend (globally and dynamically); it is also easy to add an optional kwarg to allow this anywhere it makes sense (solvers, etc.). We leave this issue open for further testing, e.g. for cases requiring a change of backend between first-order and second-order derivative computations.
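For reference, a minimal sketch of such a global, dynamic setter (all names here are illustrative, not the actual CTBase API):

import DifferentiationInterface as DI
import ForwardDiff
using ADTypes: AbstractADType, AutoForwardDiff

# module-level default backend, mutable at runtime
const DEFAULT_BACKEND = Ref{AbstractADType}(AutoForwardDiff())

set_backend!(b::AbstractADType) = (DEFAULT_BACKEND[] = b; nothing)
get_backend() = DEFAULT_BACKEND[]

# helpers default to the global backend but still accept an explicit one
my_gradient(f, x; backend = get_backend()) = DI.gradient(f, backend, x)

With this pattern, changing the default via set_backend! affects every helper that falls back on get_backend(), while an explicit backend kwarg still wins locally.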

On a side note: check this upcoming talk at JuliaCon 2024 (we'll also be around)

gdalle commented 2 months ago

Thanks for pointing out ADOLC.jl, we're already on the ball ;) See https://github.com/TimSiebert1/ADOLC.jl/issues/7 to track progress.