JuliaNLSolvers / Optim.jl

Optimization functions for Julia

How to make Optim work with Measurements #823

Closed hzgzh closed 3 years ago

hzgzh commented 4 years ago

Hi: I want to use some Measurements variables in an Optim problem, but I get an error. How do I handle it? I would appreciate a reply.

using Measurements, Optim

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x = zeros(Measurement{Float64}, 2)
optimize(f, x)

pkofod commented 4 years ago

I'm not sure what it would take. I'm sure a lot of type annotations and similar calls might be incorrect when units come into play. I'll keep it open; maybe we can do something.

giordano commented 4 years ago

At least part of the issue is that some type annotations are wrong. In https://github.com/JuliaNLSolvers/Optim.jl/blob/0794f443954d1a843fc786c1cea561f474a650e7/src/types.jl#L166-L184 the parameter T is the type of x_abstol, x_reltol, f_abstol, f_reltol, and g_abstol, but in https://github.com/JuliaNLSolvers/Optim.jl/blob/0794f443954d1a843fc786c1cea561f474a650e7/src/multivariate/optimize/optimize.jl#L98-L115 those fields all have type Tf, while the explicit parameter passed to the constructor (why are the parameters passed at all?) is T.

giordano commented 4 years ago

Replacing the type of all those fields of MultivariateOptimizationResults with Tf (and dropping the parameter T? It seems to be otherwise unused) actually makes the solver work, but then there is a StackOverflowError in the show method of the final objective value:

julia> optimize(f,x)
 * Status: success

 * Candidate solution
    Final objective value:     Error showing value of type Optim.MultivariateOptimizationResults{NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters},Measurement{Float64},Array{Measurement{Float64},1},Measurement{Float64},Measurement{Float64},Array{OptimizationState{Measurement{Float64},NelderMead{Optim.AffineSimplexer,Optim.AdaptiveParameters}},1},Bool}:
ERROR: StackOverflowError:
Stacktrace:
 [1] ini_dec(::Measurement{Float64}, ::Int64, ::Array{UInt8,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Printf/src/Printf.jl:1011 (repeats 79984 times)

I don't have much more time to track this issue further down, but I hope I gave you useful hints to fix it.
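For concreteness, a rough sketch of the kind of field-type change described above (a simplified, hypothetical struct for illustration, not the actual MultivariateOptimizationResults, which has many more fields):

# Before (simplified): tolerance fields typed with a separate parameter T
struct ResultsBefore{T,Tf}
    minimum::Tf
    x_abstol::T
    x_reltol::T
    f_abstol::T
    f_reltol::T
    g_abstol::T
end

# After (simplified): tolerance fields share the objective value type Tf, and T is dropped
struct ResultsAfter{Tf}
    minimum::Tf
    x_abstol::Tf
    x_reltol::Tf
    f_abstol::Tf
    f_reltol::Tf
    g_abstol::Tf
end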

pkofod commented 4 years ago

Thanks @giordano, might have a go at this.

Boxylmer commented 3 years ago

Is giving +1's a thing we do on GitHub? Having compatibility between Measurements.jl and Optim.jl would be incredible for having analytical derivatives in our research. Right now we're limited to numerical derivatives in our error propagation (autodiff causes the issues Mosè cited above). Few optimization packages allow for this.

antoine-levitt commented 3 years ago

What do you want to do exactly? Differentiating through the solver is not a good idea if what you want is the uncertainty in the output with respect to the data (i.e. the objective function, not the initial guess). You can get that easily using the Hessian if you have it available. If you are interested in the uncertainty in the minimum objective value, it's even simpler.
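A minimal sketch of what that Hessian-based propagation could look like, assuming ForwardDiff, a hypothetical objective f(x, p) parameterized by a data vector p, and a minimizer x_star already computed at the nominal p (this is just the linearization spelled out, not Optim functionality):

using ForwardDiff, LinearAlgebra

# Hypothetical objective depending on data p
f(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

p      = [1.0, 100.0]                    # nominal data
Σp     = Diagonal([0.01, 0.5] .^ 2)      # assumed covariance of the data
x_star = [1.0, 1.0]                      # minimizer at the nominal p

# Implicit differentiation of the optimality condition ∇ₓf(x*(p), p) = 0:
#   dx*/dp = -H⁻¹ ∂²f/∂x∂p
H    = ForwardDiff.hessian(x -> f(x, p), x_star)
Jxp  = ForwardDiff.jacobian(q -> ForwardDiff.gradient(x -> f(x, q), x_star), p)
dxdp = -(H \ Jxp)

Σx = dxdp * Σp * dxdp'                   # linearized covariance of the minimizer

# Uncertainty of the minimum value itself: df*/dp = ∂f/∂p at x* (envelope theorem)
g        = ForwardDiff.gradient(q -> f(x_star, q), p)
var_fmin = g' * Σp * g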

Boxylmer commented 3 years ago

Maybe I'm not quite understanding the easier solution. Using three variables P, V, and T*, I have a direct solution for P using V and T, but I'm interested in the case of solving for V using P and T, so I have to guess V, and I'm interested in knowing both its value and its uncertainty using an optimizer with the objective = (P_calculated(V) - P_known)^2.

Currently, I accomplished this in two optimizations rather than one:

  1. I stripped all the uncertainties from the inputs and solved the value normally (no measurement types), then
  2. I use a second objective function that matches the uncertainty of P instead, and this new objective function does not use automatic differentiation. That is, we use the solved V value to find the V uncertainty that, when propagated through the direct P(V, T) expression, makes the calculated uncertainty in P correct (roughly as in the sketch below).
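A minimal sketch of that two-step procedure (the function names and the P_direct stand-in are made up for illustration; this is not the real EOS code):

using Optim, Measurements

# Hypothetical stand-in for the direct model P(V, T); the real EOS would go here.
P_direct(V, T, a, b) = a / V^2 + b * T / V

# Step 1: strip all uncertainties and solve for the volume value only.
function solve_volume(P_known, T, a, b)
    obj(V) = (P_direct(V[1], T, a, b) - P_known)^2
    Optim.minimizer(optimize(obj, [1.0]))[1]
end

# Step 2: find the volume uncertainty σ_V such that propagating V ± σ_V through the
# direct expression reproduces the known uncertainty of P (no AD involved here).
function solve_volume_uncertainty(V_val, P_known::Measurement, T, a, b)
    function obj(σ)
        P_prop = P_direct(V_val ± abs(σ[1]), T, a, b)
        (Measurements.uncertainty(P_prop) - Measurements.uncertainty(P_known))^2
    end
    abs(Optim.minimizer(optimize(obj, [0.1]))[1])
end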

I should clarify I'm not a programmer by profession and am quite new to Julia at that, so I may not be using the right terminology here.

*There are many more inputs than P, V, and T, but these are the three we're most interested in and best used for an example.

longemen3000 commented 3 years ago

OK, I made Optim work with Measurements, it seems. The error faced by @giordano is due to a problem with passing a Measurement to the sprintf macros used in the display code for MultivariateOptimizationResults. On normal numbers it works without any problem, so it seems OK to just PR the fix?

antoine-levitt commented 3 years ago

I don't understand why you even do that with Optim. You have an EOS f(P,V,T)=0 and want to find P as a function of V and T? Why don't you just use a rootfinding algorithm? (eg Roots or NLSolve) And then if you want uncertainties you just differentiate: df/dP dP + df/dV dV + df/dT dT = 0, solve for dP.
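A sketch of that approach, assuming a scalar residual and using Roots.jl for the solve and ForwardDiff for the partial derivatives (the ideal-gas residual and the numbers are placeholders):

using Roots, ForwardDiff

# Hypothetical residual: f(P, V, T) = 0 defines P implicitly.
f(P, V, T) = P * V - 8.314 * T

V, σV = 0.024, 1e-4
T, σT = 300.0, 0.5

# Root-find P at the nominal V and T.
P = find_zero(p -> f(p, V, T), 1e5)

# Implicit differentiation: df/dP dP + df/dV dV + df/dT dT = 0, solved for dP.
dfdP = ForwardDiff.derivative(p -> f(p, V, T), P)
dfdV = ForwardDiff.derivative(v -> f(P, v, T), V)
dfdT = ForwardDiff.derivative(t -> f(P, V, t), T)

dPdV = -dfdV / dfdP
dPdT = -dfdT / dfdP

# Linearized propagation, assuming V and T are independent:
σP = sqrt((dPdV * σV)^2 + (dPdT * σT)^2)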

Boxylmer commented 3 years ago

@antoine-levitt I should clarify a bit more: in the way I was using Optim, it never saw Measurement types or did any differentiation for the purpose of finding uncertainty directly (from Optim's perspective it was just optimizing a black box). We did a single solve to find the value of a parameter, then another optimization to find the value of the variance that reproduces the known variance of the target (the target function in that second case is the black box, and AD is not used).

I definitely am not aware enough of the differences between Optim and NLsolve/Roots to fully understand the advantages of one over another, but I took the naïve approach of using Optim as I believed it was the more general option, and I plan on applying the same technique to more complex equations of state than the one I started with (Peng-Robinson). Further, I don't have uncertainty in just P, V, and T. Parameters that are specific to the EOS have uncertainty as well, such as, in my case: critical parameters (and acentric factors), fitted interaction parameters (which form a matrix), mole fractions in multicomponent systems, etc. The sum of partials you referenced above might have more terms and not necessarily be the same for each EOS used.

I'm always up for learning about why it might be a better option though. I spoke with @longemen3000 a bit about the theory of using the sum of partials to get the uncertainty and he was incredibly helpful in understanding it. If you have any links/recommendations for learning about optimization further I'd love to check them out!

I'm interested in understanding your previous suggestion, though. What I still didn't follow completely is: in the case that P, V, and T are the only variables with uncertainty, and we have f(P, V, T) = 0, then...

To me, it isn't immediately obvious whether this is trivial or not. The uncertainty can't be zero, since all the terms in the variance formula are squared and would then also have to be zero. How is it found?

I also understand we've deviated a bit from the topic at hand, so feel no obligation! I'm also always on the Humans of Julia Discord --> #Chemistry channel.

antoine-levitt commented 3 years ago

I think the concept you're missing is that of the implicit definition of a function, aka https://en.wikipedia.org/wiki/Implicit_function_theorem. Essentially, f(P, V, T) = 0 defines (locally) a function P = g(V, T). Then you can find the partial derivatives of g with respect to V and T, e.g. dg/dV = -(df/dV) / (df/dP). Then, if the random variable V has a small variance, the variance of g(V) can be approximated by linearization as (dg/dV)^2 σ(V)^2, where σ(V) is the standard deviation of V.
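As a concrete check of those formulas on a toy residual where g is known in closed form (hypothetical numbers, nothing EOS-specific):

# Toy residual: f(P, V, T) = P*V - T = 0, so g(V, T) = T / V in closed form.
f(P, V, T) = P * V - T

V, T = 2.0, 10.0
P = T / V                          # g(V, T) = 5.0

# Implicit-function-theorem derivative:
dfdV = P                           # ∂f/∂V = P
dfdP = V                           # ∂f/∂P = V
dgdV_implicit = -dfdV / dfdP       # = -P/V = -2.5

# Closed-form derivative for comparison:
dgdV_explicit = -T / V^2           # also -2.5

# Linearized variance of g(V) for a small standard deviation σ_V:
σV = 0.01
var_g = dgdV_implicit^2 * σV^2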

There are two main types of problems you can solve numerically: either optimization, min_x f(x) with x in R^n and f : R^n -> R, or root-finding, f(x) = 0 where x is in R^n and f : R^n -> R^n. The problem you have looks to me like it belongs to the second category rather than the first.

Boxylmer commented 3 years ago

Definitely am lacking on the former, I'm giving it some reading!

For the latter, I see now what you mean. I tried NLsolve as a drop-in just now, and in order to get the same accuracy as before (after a little bit of fine tuning), it takes about an order of magnitude more time than Optim. Neglecting the root-finding vs. optimization issue, is there any reason I shouldn't be using an optimization library rather than root finding? I'm getting much better out-of-the-box performance with Optim without any performance tweaking.

For example, in my unit tests, here are some performance comparisons, with the target being the previous "max value" using Optim:

Time (μs) to calculate multi-component PengRobinson volume: 2007.0 (target: 300) # <-- this is the only thing that would change based on solver choice
Time (μs) to calculate single-component PengRobinson pressure: 90.99999999999999 (target: 100.0)
Time (μs) to calculate multi-component PengRobinson pressure: 92.0 (target: 100.0)

antoine-levitt commented 3 years ago

I would imagine that either your code is suboptimal, or that the overhead of the solver for such a small problem is significant. If you just have one equation and one unknown, use Roots.jl.

Every root finding problem f(x)=0 can be set in the form minimize |f(x)|^2. The problem is that you might find spurious local minima of |f(x)| that are not roots, and also that the methods are suboptimal.
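For instance (a hypothetical scalar example comparing Roots.find_zero with a Nelder-Mead minimization of f(x)^2 in Optim):

using Roots, Optim

f(x) = x^3 - 2x - 5                 # hypothetical scalar residual

# Root-finding directly:
root = find_zero(f, 2.0)

# The same problem recast as minimizing |f(x)|^2; a spurious local minimum of f(x)^2
# (where f'(x) = 0 but f(x) ≠ 0) could also be returned, and the generic optimizer
# is typically slower than a dedicated root-finder.
res  = optimize(x -> f(x[1])^2, [2.0])
xmin = Optim.minimizer(res)[1]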

Boxylmer commented 3 years ago

@antoine-levitt I'll give Roots a shot! I figured that knowing you're looking for a root rather than a local minimum would allow some optimizations, but I wasn't sure specifically what they would be, and cursory googling didn't reveal a whole lot.

I'm willing to bet that there's some suboptimal code in my project, haha! Thanks so much for your help in this, by the way.