jbcaillau opened this issue 1 year ago
This is more a discussion than an issue, no? Should it be transferred elsewhere?
It is an issue. Now in CTBase.jl, FWIW.
I am not convinced that this is an issue. It is more a wish :-)
OK, move it!
We should move from ForwardDiff.jl to AbstractDifferentiation.jl.
See also: FastDifferentiation.jl
Check also DifferentiationInterface.jl.
Friendly ping from the creator of DifferentiationInterface: I'm available to help you make the transition if you want me to :)
Hi @gdalle! This would be great. Thanks. I propose first to post here how we use AD. @jbcaillau and @PierreMartinon, please complete.
We use the ForwardDiff.jl package in CTBase.jl:

```julia
function ctgradient(f::Function, x::ctNumber)
    return ForwardDiff.derivative(x -> f(x), x)
end

function ctjacobian(f::Function, x::ctNumber)
    return ForwardDiff.jacobian(x -> f(x[1]), [x])   # wrap the scalar into a 1-vector
end
```

CTFlows.jl package:
```julia
function rhs(h::AbstractHamiltonian)
    function rhs!(dz::DCoTangent, z::CoTangent, v::Variable, t::Time)
        n = size(z, 1) ÷ 2                               # z = (x, p): state and costate
        foo(z) = h(t, z[rg(1, n)], z[rg(n + 1, 2n)], v)  # Hamiltonian as a function of z
        dh = ctgradient(foo, z)
        dz[1:n] = dh[n+1:2n]                             # ẋ =  ∂H/∂p
        dz[n+1:2n] = -dh[1:n]                            # ṗ = -∂H/∂x
    end
    return rhs!
end
```
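For context, `rhs!` assembles the symplectic vector field ż = (∂H/∂p, -∂H/∂x). Below is a self-contained toy version of the same computation with plain types; the Hamiltonian `H` and `toy_rhs!` are illustrative stand-ins, not CTFlows API:

```julia
import ForwardDiff

# toy Hamiltonian H(t, x, p, v); the extra parameter v is unused here
H(t, x, p, v) = p[1] * x[1]^2 / 2

function toy_rhs!(dz, z, v, t)
    n  = length(z) ÷ 2
    dh = ForwardDiff.gradient(z -> H(t, z[1:n], z[n+1:2n], v), z)
    dz[1:n]    .=  dh[n+1:2n]   # ẋ =  ∂H/∂p
    dz[n+1:2n] .= -dh[1:n]      # ṗ = -∂H/∂x
    return dz
end

dz = zeros(2)
toy_rhs!(dz, [1.0, 2.0], nothing, 0.0)   # -> [0.5, -2.0]
```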
ADNLPModels.jl:

```julia
# call NLP problem constructor
docp.nlp = ADNLPModel!(x -> DOCP_objective(x, docp),
                       x0,
                       docp.var_l, docp.var_u,
                       (c, x) -> DOCP_constraints!(c, x, docp),
                       docp.con_l, docp.con_u,
                       backend = :optimized)
```
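For readers unfamiliar with that constructor, here is a hedged, self-contained toy with a generic objective and constraints (not the actual DOCP functions; `backend = :optimized` selects ADNLPModels' predefined optimized AD backends):

```julia
using ADNLPModels

# min (x₁ - 1)² + (x₂ - 2)²  s.t.  0 ≤ x₁ + x₂ ≤ 1, with bounds on x
f(x) = (x[1] - 1)^2 + (x[2] - 2)^2
c!(cx, x) = (cx[1] = x[1] + x[2]; cx)        # in-place constraints

nlp = ADNLPModel!(f, zeros(2),               # objective, initial guess
                  [-1.0, -1.0], [2.0, 3.0],  # variable bounds
                  c!, [0.0], [1.0];          # constraints and their bounds
                  backend = :optimized)
```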
Thanks for the links, I'll take a look but I already have a few questions.
Why do you call the derivative the gradient? What is this ctNumber that you use?
Are the derivative and Jacobian the only operators you need? What are the typical input and output dimensionalities for the Jacobian? Depending on the answer, you may want to parametrize with different AD backends for the derivative (forward mode always) and the Jacobian (forward mode for large input and small output, reverse mode for small input and large output, otherwise unclear).
Do you take derivatives or Jacobians of the same function several times, but with different input vectors? If so, you will hugely benefit from a preparation mechanism like the one that is implemented in DifferentiationInterface.
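For reference, a minimal sketch of that preparation mechanism, assuming a recent DifferentiationInterface API (`AutoForwardDiff` comes from ADTypes and is re-exported by DifferentiationInterface):

```julia
using DifferentiationInterface
import ForwardDiff

f(x) = sum(abs2, x)
backend = AutoForwardDiff()

x = randn(10)
prep = prepare_gradient(f, backend, zero(x))     # pay the setup cost once
for _ in 1:100
    x -= 0.1 * gradient(f, prep, backend, x)     # reuse prep on new inputs
end
```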
As for ADNLPModels, they are also considering a switch to DifferentiationInterface, but it might be slightly slower.
@gdalle Thanks for the PR and comments
> Why do you call the derivative the gradient? What is this ctNumber that you use?
`ctNumber = Real`. We want to deal more or less uniformly with reals and one-dimensional vectors; that is why the special case where the variable is a single real is dealt with explicitly.
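As an illustration, a hedged sketch of that uniform handling via dispatch (the actual CTBase implementation may differ):

```julia
import ForwardDiff

# scalar input → scalar derivative
ctgradient(f::Function, x::Real) = ForwardDiff.derivative(f, x)

# vector input → gradient vector, so call sites need not distinguish the two
ctgradient(f::Function, x::AbstractVector) = ForwardDiff.gradient(f, x)

ctgradient(x -> x^2, 3.0)                  # 6.0
ctgradient(x -> sum(abs2, x), [1.0, 2.0])  # [2.0, 4.0]
```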
> Are the derivative and Jacobian the only operators you need? What are the typical input and output dimensionalities for the Jacobian? Depending on the answer, you may want to parametrize with different AD backends for the derivative (forward mode always) and the Jacobian (forward mode for large input and small output, reverse mode for small input and large output, otherwise unclear).
Dimensions < 1e2, e.g. to build the right-hand side of a Hamiltonian system.
> Do you take derivatives or Jacobians of the same function several times, but with different input vectors? If so, you will hugely benefit from a preparation mechanism like the one that is implemented in DifferentiationInterface.
✅ to be tested elsewhere (see also this comment)
I think that a first step has been taken for CTBase.jl. Do we close this issue? We will see next how to handle this in CTFlows.jl.
To me this is not yet done: #141 added a backend kwarg to ctgradient and the like, but this kwarg is not passed down from higher up the chain. As a result, users cannot change the AD backend, even though package developers can through the __auto() function.
To clarify, even with this PR, you're currently doing something like

```julia
function solve_control_problem(f)
    # ...
    for i in 1:n
        x -= gradient(f, x)
    end
    # ...
end

function gradient(f, x, backend=default_backend())
    # ...
end
```
And for users who only care about high-level interfaces, and who never call gradient directly, the following seems better to me:

```julia
function solve_control_problem(f, backend)
    # ...
    for i in 1:n
        x -= gradient(f, x, backend)
    end
    # ...
end

function gradient(f, x, backend=default_backend())
    # ...
end
```
But you know best if that's relevant in your case or not.
We totally agree with you: the second choice is better. But the function that solves optimal control problems is actually not in the CTBase.jl package.
Besides, our function

```julia
function gradient(f, x, backend=default_backend())
    # ...
end
```

is not used in the function that solves optimal control problems. It is used, for instance, in the CTFlows.jl package, here. I agree that there I will have to add a kwarg for the AD backend.
For the resolution of optimal control problems, we go through ADNLPModels.jl, and again we want the user to be able to choose the AD backend.
@gdalle Agreed, thanks for the feedback. Actually, there is now a setter that allows users and devs to change the backend (globally and dynamically); it is also easy to add an optional kwarg to allow this anywhere it makes sense (solvers, etc.). We leave this issue open for further testing, e.g. for cases requiring a change of backend between first-order and second-order derivative computations.
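For illustration, such a global, dynamically switchable backend could look like the following sketch (names other than `__auto()`, which is mentioned above, are illustrative, not the actual CTBase API):

```julia
# global Ref holding the current AD backend choice
const AD_BACKEND = Ref{Symbol}(:ForwardDiff)

# setter: switch the AD backend globally at runtime (illustrative name)
set_AD_backend!(b::Symbol) = (AD_BACKEND[] = b)

# default consulted by ctgradient & friends when no kwarg is passed
__auto() = AD_BACKEND[]
```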
On a side note: check this upcoming talk at JuliaCon 2024 (we'll also be around)
Thanks for pointing out ADOLC.jl, we're already on the ball ;) See https://github.com/TimSiebert1/ADOLC.jl/issues/7 to track progress.