JuliaNLSolvers / Optim.jl

Optimization functions for Julia

Allow Dual Numbers #248

Closed matthieugomez closed 4 years ago

matthieugomez commented 8 years ago

I'd like optimize to work with dual numbers. This would be handy when optimizing a function that itself optimizes something. Would it be difficult to write all the algorithms so that they work with dual numbers? At least for derivative-free univariate optimization, the only reason it does not work now seems to be overly strict type assertions.

KristofferC commented 8 years ago

You need to be careful when differentiating iterative processes because the relationship between converging the residual and the derivative is not necessarily clear. For Newton methods I believe it should be ok though.
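A minimal sketch of why Newton-style iterations are the friendly case. This uses a hand-rolled stand-in for ForwardDiff.Dual on a made-up problem (computing sqrt(θ) as a Newton root-find): the derivative recursion contracts along with the iterates, so the converged partial is the true sensitivity d√θ/dθ = 1/(2√θ):

```julia
# Minimal forward-mode dual number (hypothetical stand-in for ForwardDiff.Dual).
struct D
    v::Float64   # value
    p::Float64   # partial derivative
end
Base.:+(a::D, b::D) = D(a.v + b.v, a.p + b.p)
Base.:-(a::D, b::D) = D(a.v - b.v, a.p - b.p)
Base.:*(a::D, b::D) = D(a.v * b.v, a.v * b.p + a.p * b.v)
Base.:/(a::D, b::D) = D(a.v / b.v, (a.p * b.v - a.v * b.p) / b.v^2)
D(x::Real) = D(Float64(x), 0.0)  # promote constants to duals with zero partial

# Newton iteration for the root of g(x) = x^2 - θ, i.e. x* = sqrt(θ).
function newton_sqrt(θ)
    x = θ                  # initial guess; carries θ's perturbation
    for _ in 1:50
        g  = x * x - θ     # residual
        dg = D(2.0) * x    # residual derivative w.r.t. x
        x  = x - g / dg
    end
    return x
end

x = newton_sqrt(D(2.0, 1.0))   # seed dθ/dθ = 1
# x.v converges to sqrt(2); x.p converges to 1/(2*sqrt(2)),
# because the derivative of the Newton map w.r.t. x vanishes at the fixed point.
```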

matthieugomez commented 8 years ago

Could we make a list of methods that should be ok and try to make them work? @jrevels @lbenet

mlubin commented 8 years ago

It would be nice if this worked, but note there are better ways to differentiate through optimization: http://link.springer.com/chapter/10.1007%2F978-3-540-68942-3_7
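For reference, the approach there (differentiate the optimality conditions rather than the solver iterations) can be sketched in a few lines. The objective below is a made-up example: for x*(θ) = argmin_x f(x, θ), the first-order condition ∂f/∂x = 0 gives dx*/dθ = -(∂²f/∂x²)⁻¹ · ∂²f/∂x∂θ by the implicit function theorem:

```julia
# Hypothetical objective and its hand-coded partial derivatives.
f(x, θ)   = (x - θ)^2 + x^4
fx(x, θ)  = 2 * (x - θ) + 4 * x^3   # ∂f/∂x
fxx(x, θ) = 2 + 12 * x^2            # ∂²f/∂x²
fxt(x, θ) = -2.0                    # ∂²f/∂x∂θ

# Solve fx(x, θ) = 0 by Newton's method (f is strictly convex in x).
function argmin_f(θ)
    x = θ
    for _ in 1:50
        x -= fx(x, θ) / fxx(x, θ)
    end
    return x
end

θ = 1.0
xstar = argmin_f(θ)
# Implicit differentiation at the solution: no need to differentiate the solver.
dxdθ = -fxt(xstar, θ) / fxx(xstar, θ)
# Sanity check against a central finite difference of the whole solve.
fd = (argmin_f(θ + 1e-6) - argmin_f(θ - 1e-6)) / 2e-6
```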

pkofod commented 8 years ago

Could we make a list of methods that should be ok and try to make them work? @jrevels @lbenet

I've never had to do something like this, but I get why it would be neat. Can you post an example you would wish worked?

The only problem I can see is that we use things like explicit axpy! and other BLAS calls, often with a substantial performance gain. Some of this can probably be dispatched away, I guess, but this is not at the top of my personal todo-list. A PR is of course more than welcome.
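For what it's worth, LinearAlgebra.axpy! does have a generic fallback for non-BLAS element types, so in principle those calls can be dispatched away without touching the call sites. A quick sanity check with Rational standing in for any non-BLAS number type:

```julia
using LinearAlgebra

# axpy! computes y .= a .* x .+ y. For Float64 arrays it hits BLAS;
# for Rational it dispatches to the generic pure-Julia fallback.
x = [1//2, 1//3]
y = [1//1, 1//1]
axpy!(2//1, x, y)
# y is now [2//1, 5//3], computed exactly
```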

matthieugomez commented 8 years ago

@pkofod Two typical cases:

But, looking at the univariate code for, say, GoldenSection, I don't actually see how it would work with automatic differentiation. How can one differentiate something like a binary search through automatic differentiation? The derivative of the minimizer is going to be 0.0, right?
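A toy demonstration of that point (a hand-rolled dual and a naive ternary search, not Optim's actual GoldenSection code): the dual perturbation on the parameter only ever influences comparisons, while the bracket endpoints are plain floats built from the initial bracket, so the returned minimizer carries no derivative information at all even though the true sensitivity dx*/dθ is 1:

```julia
# Minimal dual that compares by value, as ForwardDiff does.
struct D
    v::Float64  # value
    p::Float64  # partial
end
Base.:-(a::D, b::D) = D(a.v - b.v, a.p - b.p)
Base.:*(a::D, b::D) = D(a.v * b.v, a.v * b.p + a.p * b.v)
Base.isless(a::D, b::D) = isless(a.v, b.v)

# Minimize f(x) = (x - θ)^2; the true minimizer is x* = θ.
f(x, θ) = (x - θ) * (x - θ)

function gridmin(θ; lo = 0.0, hi = 2.0, n = 60)
    for _ in 1:n
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        # Only the comparison sees θ's perturbation; lo/hi stay Float64.
        if f(D(m1, 0.0), θ) < f(D(m2, 0.0), θ)
            hi = m2
        else
            lo = m1
        end
    end
    return (lo + hi) / 2
end

x = gridmin(D(1.0, 1.0))
# x ≈ 1.0, but it is a plain Float64: the derivative is effectively zero.
```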

anriseth commented 8 years ago

@matthieugomez for the HJB equations, have you considered using the methods described in https://cs.uwaterloo.ca/~paforsyt/hjb.pdf and https://arxiv.org/abs/1503.05864 ? For example, the Piecewise Constant Policy Timestepping reduces the HJB equations to a system of linear PDEs that can be solved in parallel.

matthieugomez commented 8 years ago

@anriseth thanks, that's pretty cool! Do you have any experience with this? I have a public repository to work on solving HJBs in Julia (https://github.com/matthieugomez/HJBFiniteDifference.jl). I'm interested in pull requests in case you have some code.

anriseth commented 8 years ago

(Sorry for going off-topic, we can continue this elsewhere) @matthieugomez It worked very well for the one optimal control problem I've tried to solve. It might be troublesome for high-dimensional controls, as the number of linear solves explodes. I was planning to set up a HJB-solver myself, but haven't got any code yet. Looking at HJBFiniteDifference.jl, it seems to be a bit more specific to certain model equations? I was thinking of writing something more general.

pkofod commented 7 years ago

I am moving towards removing unnecessary type assertions/annotations but this is sort of a "won't fix, unless it starts working on its own". Please do open an issue or better yet a PR if you're interested in helping out. There are many things that would be nice to add, but it's unlikely that this will happen on purpose unless someone who needs it does the work.

anriseth commented 7 years ago

DifferentialEquations.jl has made this work for their ODE solvers (AD with respect to the parameters of the ODE). If anyone is interested in sorting this out for Optim, I believe @ChrisRackauckas can provide some pointers to get them started.


ChrisRackauckas commented 7 years ago

Yes, I got it working by opening up the type annotations as wide as possible (usually just letting them roam as Any) on functions, parameterizing each type with free parameters, carefully tuning which parameters can match, and avoiding any libraries that call out to C/Fortran. While this seemed to come naturally to the *DiffEq solvers, that's because of a huge load of work making everything type-generic enough to allow Unitful.jl numbers, so don't think it was quick or easy to get working.

And the end result only works for non-stiff solvers because of that last point:

avoiding any libraries which call out to C/FORTRAN

It needs to avoid linear algebra that uses BLAS and instead make sure it knows to use a generic fallback. I don't think it does that naturally. So if you try it with an implicit method like Trapezoid() it will likely fail inside the call to NLsolve, and if you try it with the Rosenbrock23() method it will likely fail in \. This can probably be remedied using GenericSVD.jl or LinearAlgebra.jl and forcing it to call the fallbacks. But the first step here is clearly to make sure there is a linear algebra library which is compatible with dual numbers, which is a project in itself.
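As a small illustration of the generic fallback path (using Rational here rather than duals, since any non-BLAS element type takes the same route): \ on a matrix whose element type BLAS cannot handle falls back to a pure-Julia LU factorization, so the building block does exist:

```julia
using LinearAlgebra

# BLAS only handles Float32/Float64/complex; for Rational, the backslash
# operator dispatches to the generic LU factorization written in Julia.
A = [1//1 2//1; 3//1 4//1]
b = [1//1, 2//1]
x = A \ b
# exact rational solution: x == [0//1, 1//2]
```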

After that, I am not sure whether the fact that the Jacobian is generated using autodifferentiation as well will end up being a problem, or if that could cause perturbation confusion. The DiffEq ecosystem can mostly shield itself from that problem because it can use symbolic differentiation in many cases (and it's more performant there than autodifferentiation, since the function changes every iteration). However, autodifferentiation is clearly advantageous in optimization, so I'm not sure how this is handled, and I would definitely test the results when you get it working to make sure this kind of perturbation confusion isn't happening.

pkofod commented 7 years ago

Alright, alright... :) Reopening, and I will try to see what is needed to accommodate this.

ranjanan commented 7 years ago

@pkofod I made the following simple example work:

 using Optim
 import ForwardDiff: Dual

 f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

 b = Dual(1.,1.)

 optimize(f, [b,b], LBFGS())

by modifying the OptimizationState type to

 immutable OptimizationState{T <: Optimizer,S,V}
     iteration::S
     value::V
     g_norm::V
     metadata::Dict
 end
 @compat OptimizationTrace{T,S,V} = Vector{OptimizationState{T,S,V}}

adding this:

 Base.nextfloat{N,T}(d::Dual{N,T}) = Dual(nextfloat(d.value), nextfloat.(d.partials)...)

and changing the update! function in src/utilities to incorporate these.

This was the result I got:

Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [Dual(1.0,1.0),Dual(1.0,1.0)]
 * Minimizer: [Dual(1.0,0.9999992336039727),Dual(1.0,1.000000756934091), ...]
 * Minimum: Dual(0.0,-0.0)
 * Iterations: 1
 * Convergence: true
   * |x - x'| < 1.0e-32: true
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: true
   * |g(x)| < 1.0e-08: false
   * f(x) > f(x'): false
   * Reached Maximum Number of Iterations: false
 * Objective Function Calls: 76
 * Gradient Calls: 76

I wanted to clarify this: Does this code mean I'm calculating the derivative of all these quantities with respect to (in this case) the starting point?

Nosferican commented 5 years ago

Anything else needed here?

pkofod commented 5 years ago

In the cases where this might work (it generally doesn't), we'd need to verify that the result is correct. I think the nextfloat part came from the convergence check, where it's no longer used, so it may work now. It would be cool to have some examples we could work with.