Closed · matthieugomez closed this issue 4 years ago
You need to be careful when differentiating iterative processes because the relationship between converging the residual and the derivative is not necessarily clear. For Newton methods I believe it should be ok though.
Could we make a list of methods that should be ok and try to make them work? @jrevels @lbenet
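For the Newton case, the claim is easy to sanity-check by pushing a dual number through a hand-written iteration; a minimal sketch (`newton_sqrt` is a made-up helper, not Optim code):

```julia
using ForwardDiff

# Solve x^2 - a = 0 by Newton's method. The iteration map is smooth in `a`,
# so propagating dual numbers through the loop recovers d(sqrt(a))/da.
function newton_sqrt(a; iters = 20)
    x = one(a)
    for _ in 1:iters
        x -= (x^2 - a) / (2x)   # Newton step for f(x) = x^2 - a
    end
    return x
end

ForwardDiff.derivative(newton_sqrt, 2.0)  # ≈ 1 / (2 * sqrt(2)) ≈ 0.3536
```

Because the Newton map is contractive near the root, the dual part converges to the exact sensitivity of the fixed point, which is what makes Newton-type methods the "should be ok" case.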
It would be nice if this worked, but note there are better ways to differentiate through optimization: http://link.springer.com/chapter/10.1007%2F978-3-540-68942-3_7
I've never had to do something like this, but I get why it would be neat. Can you post an example you would wish worked?
The only problem I can see is that we use stuff like explicit `axpy!` and other BLAS calls, often with a substantial performance gain. Some of this can be dispatched away, I guess, but this is not at the top of my personal todo list. A PR is of course more than welcome.
@pkofod Two typical cases:
solving for a value function V such that `f(V, V_x, V_xx, Optim.minimizer(k -> g(k, V_x))) = 0.0`. I'd like to use NLsolve for the outer problem and `optimize` for the inner optimization problem. But, looking at the univariate code for, say, GoldenSection, I don't actually see how it would work with automatic differentiation. How can one differentiate something like a binary search through automatic differentiation? The derivative of the minimizer is going to be 0.0, right?
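The worry about comparison-based searches can be made concrete with a toy ternary search (a stand-in for GoldenSection, not Optim's actual code): the bracket updates only ever *compare* dual numbers, so the perturbation never reaches the returned minimizer.

```julia
using ForwardDiff: Dual, value, partials

# Toy ternary search for the minimizer of f on [lo, hi];
# only comparisons of f-values drive the bracket updates.
function trisect_min(f, lo, hi; iters = 60)
    for _ in 1:iters
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2)
            hi = m2
        else
            lo = m1
        end
    end
    return (lo + hi) / 2
end

a = Dual(1.0, 1.0)   # perturbed parameter; the true argmin of (x - a)^2 is a
xmin = trisect_min(x -> (x - a)^2, Dual(0.0, 0.0), Dual(2.0, 0.0))
value(xmin)     # ≈ 1.0, as expected
partials(xmin)  # all zero, even though d(argmin)/da = 1 here
```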
@matthieugomez for the HJB equations, have you considered using the methods described in https://cs.uwaterloo.ca/~paforsyt/hjb.pdf and https://arxiv.org/abs/1503.05864 ? For example, the Piecewise Constant Policy Timestepping reduces the HJB equations to a system of linear PDEs that can be solved in parallel.
@anriseth thanks that's pretty cool! Did you have any experience with this? I have a public repository to work on solving HJBs in Julia (https://github.com/matthieugomez/HJBFiniteDifference.jl). I'm interested in pull requests in case you have some code.
(Sorry for going off-topic, we can continue this elsewhere) @matthieugomez It worked very well for the one optimal control problem I've tried to solve. It might be troublesome for high-dimensional controls, as the number of linear solves explodes. I was planning to set up a HJB-solver myself, but haven't got any code yet. Looking at HJBFiniteDifference.jl, it seems to be a bit more specific to certain model equations? I was thinking of writing something more general.
I am moving towards removing unnecessary type assertions/annotations but this is sort of a "won't fix, unless it starts working on its own". Please do open an issue or better yet a PR if you're interested in helping out. There are many things that would be nice to add, but it's unlikely that this will happen on purpose unless someone who needs it does the work.
DifferentialEquations.jl has made this thing work for their ODE solvers (AD wrt parameters of the ode). If anyone is interested in sorting this out for Optim, I believe @ChrisRackauckas can provide some pointers to get them started.
On Sat, 21 Jan 2017, 10:31 Patrick Kofod Mogensen wrote:
Closed #248 https://github.com/JuliaOpt/Optim.jl/issues/248.
Yes, I got it working by opening up the type annotations as wide as possible (usually just letting them roam as `Any`) on functions, parameterizing each type with free parameters, carefully tuning which parameters can match, and avoiding any libraries which call out to C/Fortran. While this seemed to come naturally to the *DiffEq solvers, that's because of a huge amount of work making everything type-generic enough to allow Unitful.jl numbers, so don't think it was quick or easy to get working.
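The annotation issue looks roughly like this (toy names, not actual Optim code): a `Vector{Float64}` signature rejects dual inputs outright, while an `AbstractVector{<:Real}` one lets them flow through, since `ForwardDiff.Dual <: Real`.

```julia
using ForwardDiff: Dual, value

# Tight annotation: only plain Float64 arrays are accepted.
rosen_strict(x::Vector{Float64}) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Generic annotation: any real element type, including Dual, works.
rosen(x::AbstractVector{<:Real}) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

b = Dual(1.0, 1.0)
rosen([b, b])            # fine: returns a Dual
# rosen_strict([b, b])   # MethodError: no matching method for a Dual vector
```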
And the end result only works for non-stiff solvers because of that last point: "avoiding any libraries which call out to C/Fortran". It needs to avoid linear algebra which uses BLAS and instead make sure it knows to use a fallback, and I don't think it does that naturally. So if you try it with an implicit method like `Trapezoid()` it will likely fail inside the call to NLsolve, and if you try it with the `Rosenbrock23()` method it will likely fail in `\`. This can probably be remedied using GenericSVD.jl or LinearAlgebra.jl and forcing it to call the fallbacks. But the first step here is clearly to make sure there is a linear algebra library which is compatible with dual numbers, which is a project in itself.
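For reference, the dense `\` in the LinearAlgebra stdlib does hit a generic LU fallback whenever the element type is not a BLAS float, so a small Dual-valued solve like this sketch should work on a reasonably recent Julia/ForwardDiff (the dual parts are just an implicit-differentiation check of `A x = b`):

```julia
using LinearAlgebra
using ForwardDiff: Dual, value, partials

# A depends on a parameter through its (1,1) entry; b is constant.
A = [Dual(2.0, 1.0) Dual(1.0, 0.0); Dual(1.0, 0.0) Dual(3.0, 0.0)]
b = [Dual(1.0, 0.0), Dual(2.0, 0.0)]

# eltype is Dual, not Float64, so `\` dispatches to the generic LU
# fallback instead of LAPACK.
x = A \ b

value.(x)  # [0.2, 0.6], the solution of [2 1; 1 3] x = [1, 2]
# The dual parts match implicit differentiation: dx = -A^{-1} (dA) x
```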
After that, I am not sure whether the fact that the Jacobian is generated using autodifferentiation as well will end up being a problem, or if that could cause perturbation confusion. The DiffEq ecosystem can mostly shield itself from that problem because it can use symbolic differentiation in many cases (and it's more performant there than autodifferentiation, since the function changes every iteration). However, autodifferentiation is clearly advantageous in optimization, so I'm not sure how this is handled, and I would definitely test the results when you get it working to make sure this type of perturbation confusion isn't happening.
Alright, alright... :) Reopening, and I will try to see what is needed to accommodate this.
@pkofod I made the following simple example work:

```julia
using Optim
import ForwardDiff: Dual
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
b = Dual(1., 1.)
optimize(f, [b, b], LBFGS())
```

by modifying the `OptimizationState` type to

```julia
immutable OptimizationState{T <: Optimizer,S,V}
    iteration::S
    value::V
    g_norm::V
    metadata::Dict
end
@compat OptimizationTrace{T,S,V} = Vector{OptimizationState{T,S,V}}
```

adding this:

```julia
Base.nextfloat{N,T}(d::Dual{N,T}) = Dual(nextfloat(d.value), nextfloat.(d.partials)...)
```

and changing the `update!` function in src/utilities to incorporate these.
This was the result I got:

```
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [Dual(1.0,1.0),Dual(1.0,1.0)]
 * Minimizer: [Dual(1.0,0.9999992336039727),Dual(1.0,1.000000756934091), ...]
 * Minimum: Dual(0.0,-0.0)
 * Iterations: 1
 * Convergence: true
   * |x - x'| < 1.0e-32: true
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: true
   * |g(x)| < 1.0e-08: false
   * f(x) > f(x'): false
   * Reached Maximum Number of Iterations: false
 * Objective Function Calls: 76
 * Gradient Calls: 76
```
I wanted to clarify this: Does this code mean I'm calculating the derivative of all these quantities with respect to (in this case) the starting point?
Anything else needed here?
In the cases where this might work (it generally doesn't), we'd need to verify that the result is correct. I think the `nextfloat` part came from the convergence check, where it's no longer used, so it may work now. It would be cool to have some examples we could work with.
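One candidate for such an example: a nested problem with a closed-form answer. Here the inner solver is a hand-rolled gradient descent standing in for Optim's methods (`inner_argmin` is hypothetical); the inner argmin of `x -> (x - a)^2` is `a`, so the propagated derivative should be 1.

```julia
using ForwardDiff

# Inner solver: minimize (x - a)^2 by plain gradient descent.
# The whole loop is smooth in `a`, so duals can flow through it.
function inner_argmin(a; steps = 200, lr = 0.2)
    x = zero(a)
    for _ in 1:steps
        x -= lr * 2 * (x - a)   # gradient step on (x - a)^2
    end
    return x
end

inner_argmin(3.0)                          # ≈ 3.0
ForwardDiff.derivative(inner_argmin, 3.0)  # ≈ 1.0, the known d(argmin)/da
```

Swapping `inner_argmin` for a call to `optimize` with a dual-valued starting point would turn this into a regression test for the dual-number path.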
I'd like `optimize` to work with dual numbers. This would be handy when optimizing a function that itself optimizes something. Would it be difficult to write all the algorithms in a way that works with dual numbers? At least for the case of derivative-free univariate optimization, it seems like the only reason it does not work now is that there are more type assertions than necessary.