SciML / DiffEqParamEstim.jl

Easy scientific machine learning (SciML) parameter estimation with pre-built loss functions
https://docs.sciml.ai/DiffEqParamEstim/stable/

Parameter estimation: the build_loss_objective #5

Closed finmod closed 7 years ago

finmod commented 7 years ago

The build_loss_objective is not focused enough from both the math and the user point of view. First, "build" should refer to the general class of problems being solved here: the nonlinear objective function for inference, calibration, or curve fitting. My suggestion is that it should cover likelihood optimization (NLLik) and least-squares optimization (NLLS). This is the argmin f(x) or argmax f(x) term. Beyond likelihood and least squares, this can also be extended to Bayesian estimation as a third method.

The "loss" function is too generic and should be replaced by a family of data-misfit functions/measures. These can be of the form outlined in http://ceres-solver.org/nnls_solving.html#covariance-estimation. When the covariance is ignored, the same data-misfit measure falls back to a least-squares problem with no modification to the overall function. Here I have some difficulty understanding LossFunctions.jl, how it differs from Distances.jl, and which is best for the problem at hand. This is an important methodological point because the data-misfit term should eventually scale up to include regularization considerations, as in Mads.jl, or likelihood-informed parameter and state reduction.

"Objective" could be replaced by the solver being used: Optim, NLopt, Ipopt. One present issue is that many solvers use the "optimize!" syntax, so you cannot run several solvers in the same notebook without restarting the kernel. Instead, after defining the objective function, the call to "optimize!" on the next line should know which solver is to be used at that specific point.

Finally, there is an inconsistency problem with the initial values or boundary conditions and where they are defined. From the maths point of view, this set of conditions belongs to the model description created by the parameterized function. The parameterized function should check that there are no redundancies or overdeterminations. Then the optimization, i.e. the argmin part, would know exactly what to solve.

ChrisRackauckas commented 7 years ago

First, "build" should refer to the general class of problems being solved here: the nonlinear objective function for inference, calibration, or curve fitting. My suggestion is that it should cover likelihood optimization (NLLik) and least-squares optimization (NLLS). This is the argmin f(x) or argmax f(x) term.

Instead, I should just generalize it to allow you to optimize any function of the solution, with the L2 loss against some data as the default. I need some better way of doing this.
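A minimal sketch of that generalization (hypothetical names throughout; a toy fixed-step Euler stepper stands in for the real solver, and `build_objective` is not the package API):

```julia
# Toy fixed-step solver standing in for the real ODE solver
function euler_solve(f, u0, p, ts)
    u, sol = u0, [u0]
    for i in 2:length(ts)
        u = u + (ts[i] - ts[i-1]) * f(u, p)
        push!(sol, u)
    end
    sol
end

# Hypothetical builder: returns p -> cost(solution at p).
# Default cost: L2 loss against `data`; any function of the solution works.
function build_objective(f, u0, ts; cost = nothing, data = nothing)
    c = cost === nothing ?
        (sol -> sum(sum(abs2, s .- d) for (s, d) in zip(sol, data))) : cost
    p -> c(euler_solve(f, u0, p, ts))
end

decay(u, p) = -p[1] .* u
ts = 0.0:0.1:1.0
data = euler_solve(decay, [1.0], [2.0], ts)   # synthetic data at the true p
obj = build_objective(decay, [1.0], ts; data = data)
obj([2.0])   # → 0.0 at the true parameter
```

Any optimizer that accepts `p -> obj(p)` can then consume this, which is the point of emitting an objective rather than wrapping a particular solver.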

The "loss" function is too generic and should be replaced by a family of data-misfit functions/measures. These can be of the form outlined in http://ceres-solver.org/nnls_solving.html#covariance-estimation. When the covariance is ignored, the same data-misfit measure falls back to a least-squares problem with no modification to the overall function. Here I have some difficulty understanding LossFunctions.jl, how it differs from Distances.jl, and which is best for the problem at hand. This is an important methodological point because the data-misfit term should eventually scale up to include regularization considerations, as in Mads.jl, or likelihood-informed parameter and state reduction.

I'm not sure what you mean here.

"Objective" could be replaced by the solver being used: Optim, NLopt, Ipopt. One present issue is that many solvers use the "optimize!" syntax, so you cannot run several solvers in the same notebook without restarting the kernel. Instead, after defining the objective function, the call to "optimize!" on the next line should know which solver is to be used at that specific point.

This is for the optimization packages to coordinate. I believe they already do so with the MathProgBase interface; it's just that Optim isn't compliant with it.

Finally, there is an inconsistency problem with the initial values or boundary conditions and where they are defined. From the maths point of view, this set of conditions belongs to the model description created by the parameterized function. The parameterized function should check that there are no redundancies or overdeterminations. Then the optimization, i.e. the argmin part, would know exactly what to solve.

You mean have an option to also solve for the initial conditions? That shouldn't be hard to implement.

finmod commented 7 years ago

Returning to the "build" issue, my suggestion is to label this as PE, ParEst, or PmEst: a function to find the unknown parameters of a proposed model so as to minimize a given measure of the distance between the model predictions and the experimental data.

A clear vision of this is provided here: https://sites.google.com/site/amigo2toolbox/home , JuliaDiffEq should be the Julia mirror of this Matlab/Sundials project.

Returning to the "loss" issue, the distance measure between the experimental data and the model predictions should cover the two broad families of generalized least squares and maximum log-likelihood functions. Correspondingly, the distance measure should be able to handle weights taken as the inverse of the covariance of the experimental data. At present, it is unclear what L2Dist of LossFunctions.jl can handle and how it differs from Distances.jl.
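For concreteness, a hedged sketch of that weighted (generalized) least-squares misfit in plain Julia; `gls_misfit` is an illustrative name, not anything in LossFunctions.jl or Distances.jl:

```julia
using LinearAlgebra

# Weighted residual norm rᵀ W r, with W e.g. the inverse covariance of the data.
# With W = I this falls back to ordinary least squares, unchanged.
gls_misfit(r::AbstractVector, W::AbstractMatrix) = dot(r, W * r)

r = [0.5, -1.0, 0.25]                         # residuals: model minus data
gls_misfit(r, I(3))                           # → 1.3125, same as sum(abs2, r)
gls_misfit(r, Diagonal([4.0, 1.0, 16.0]))     # → 3.0, weights = 1/σ² per point
```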

Finally, the regularization issue may either be included in the ParEst function or built separately as ParEstReg with various options: Tikhonov for least square (as in Mads) or likelihood-based. The AMIGO2 theoretical doc covers this aspect well. AMIGO2R2016b is also available on github: https://github.com/csynbiosys/AMIGO2R2016b

ChrisRackauckas commented 7 years ago

Returning to the "build" issue, my suggestion is to label this as PE, ParEst, or PmEst: a function to find the unknown parameters of a proposed model so as to minimize a given measure of the distance between the model predictions and the experimental data.

I'm still not sure what you're getting at here.

A clear vision of this is provided here: https://sites.google.com/site/amigo2toolbox/home , JuliaDiffEq should be the Julia mirror of this Matlab/Sundials project.

Thanks for the link. All of that functionality is stuff I would like to have as part of JuliaDiffEq.

Returning to the "loss" issue, the distance measure between the experimental data and the model predictions should cover the two broad families of generalized least squares and maximum log-likelihood functions. Correspondingly, the distance measure should be able to handle weights taken as the inverse of the covariance of the experimental data. At present, it is unclear what L2Dist of LossFunctions.jl can handle and how it differs from Distances.jl.

Finally, the regularization issue may either be included in the ParEst function or built separately as ParEstReg with various options: Tikhonov for least square (as in Mads) or likelihood-based. The AMIGO2 theoretical doc covers this aspect well. AMIGO2R2016b is also available on github: https://github.com/csynbiosys/AMIGO2R2016b

I think the solution here is to get rid of all hard-coding and make the least-squares path just a default. Instead of just being able to choose a loss function, take from the user a cost calculation c(sol,t,data) which is "how you would calculate the cost, knowing the diffeq solution". The default would be for c(sol,t,data) to calculate the least-squares loss against the data at each timepoint t. But then a user could use this to define whatever else they want: add regularization, use a likelihood loss, etc. build_loss_objective would then just require the user to specify what they want the cost function to be, and spit out the objective function which the user could use with whatever optimization package.
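A sketch of how a user-supplied cost could layer regularization on top of the default (names and the λ penalty are illustrative, not package API):

```julia
# Default-style cost: least squares of model output vs data at the timepoints
lsq_cost(sol, data) = sum(abs2, sol .- data)

# User-supplied variant: same data term plus a Tikhonov penalty λ‖p‖² on the
# parameters, the kind of regularization discussed in the thread
tikhonov_cost(sol, data, p; λ = 0.1) = lsq_cost(sol, data) + λ * sum(abs2, p)

sol  = [1.0, 0.5, 0.25]          # model output at the data timepoints
data = [1.1, 0.4, 0.30]
p    = [2.0, 3.0]
lsq_cost(sol, data)              # ≈ 0.0225
tikhonov_cost(sol, data, p)      # ≈ 0.0225 + 0.1 * 13 = 1.3225
```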

Once that's in place, then we'd just need to make it easier to specify likelihood-based losses. That would just be pre-defining some functions which the user could pass as the argument for c. That should be fully general but still easy to use, right?

finmod commented 7 years ago

Let's restart this issue because I am losing the thread. I would also like to report that parameter estimation for the Lorenz example is going absolutely nowhere beyond trivial examples with the set of functions we have at present. This is due to the problem and the cost function being ill-posed.

The two functions required for parameter estimation are a cost function and a parameter estimation function, describing equations 5 and 6 of this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4625902/.

The PE function (6) without regularization becomes PE function (7) with regularization. This presentation, also known as calibration, is standard and identical to the one in Mads. The PE function performs parameter estimation of an unknown model and deals with the various combinations of solvers. One example of a PE function is here: https://github.com/csynbiosys/AMIGO2R2016b/blob/master/AMIGO_PE.m (and AMIGO_REG_PE.m in the same folder), unfortunately in Matlab. The menu of regularization terms can include Tikhonov, Tikhonov-flux, or other user-defined functions such as likelihood-inspired regularization.

The PEcost function computes the cost function to be minimized for parameter estimation, i.e. it provides a measure of the distance between experimental data and model predictions. Several cases of equation 5 emerge: LSQ, LLK, and DE. Flexibility in the computation of the error matrix for the different cases of the residual vector R() is required. This is where I have an issue with what LossFunctions.jl does. There are many outcomes for Q_LS = Rᵀ R. For LSQ, the weight matrix could take the form O-I (the simplest), Q-mat, Q-matobs, Q-espmax, or Q-exmean. For LLK, the weights are homoskedastic and heteroskedastic variances/covariances. An example of what the PEcost function should accommodate is https://github.com/csynbiosys/AMIGO2R2016b/blob/master/Kernel/AMIGO_PEcost.m.

Finally, the parameter bounds and initial conditions should move up to be in the general definition of the model being estimated and not in the PE function.

This general setup also accommodates extensions to the Bayesian approach or testing other solvers. It goes from the general to the particular, where you can see that LSQ is just a simple case of parameter estimation, and not a very interesting one at that, because it quickly runs into difficulties as the complexity of the estimated model increases.

ChrisRackauckas commented 7 years ago

The PE function (6) without regularization becomes PE function (7) with regularization. This presentation, also known as calibration, is standard and identical to the one in Mads. The PE function performs parameter estimation of an unknown model and deals with the various combinations of solvers. One example of a PE function is here: https://github.com/csynbiosys/AMIGO2R2016b/blob/master/AMIGO_PE.m (and AMIGO_REG_PE.m in the same folder), unfortunately in Matlab. The menu of regularization terms can include Tikhonov, Tikhonov-flux, or other user-defined functions such as likelihood-inspired regularization.

This is the function I don't want to write, because then I would have to wrap outside routines and it would only decrease the generality of the function. If I instead just generate an objective function, then the user can take that to any optimization package, and all of those approaches are then possible.

The PEcost function computes the cost function to be minimized for parameter estimation, i.e. it provides a measure of the distance between experimental data and model predictions. Several cases of equation 5 emerge: LSQ, LLK, and DE. Flexibility in the computation of the error matrix for the different cases of the residual vector R() is required. This is where I have an issue with what LossFunctions.jl does. There are many outcomes for Q_LS = Rᵀ R. For LSQ, the weight matrix could take the form O-I (the simplest), Q-mat, Q-matobs, Q-espmax, or Q-exmean. For LLK, the weights are homoskedastic and heteroskedastic variances/covariances. An example of what the PEcost function should accommodate is https://github.com/csynbiosys/AMIGO2R2016b/blob/master/Kernel/AMIGO_PEcost.m.

Yes, this is all stuff you'll be able to do when generalizing the cost function as above.

This general setup also accommodates extensions to the Bayesian approach or testing other solvers. It goes from the general to the particular, where you can see that LSQ is just a simple case of parameter estimation, and not a very interesting one at that, because it quickly runs into difficulties as the complexity of the estimated model increases.

I don't want to mix the Bayesian approaches into the same function, because then I would be stuck making some large overarching parameter estimation function, which gets rid of the power you get from just generating an objective function in the other cases. I think a separate function for Bayesian estimation is fine, and it could cover the rest. However, it's hard to know without some implementation.

ChrisRackauckas commented 7 years ago

I would also like to report that parameter estimation for the Lorenz example is going absolutely nowhere beyond trivial examples with the set of functions we have at present.

Yes, I would think that the set of methods we get for free from the objective-function approach is very broad and will do well on many problems (using things like the global optimizers from NLopt), but they will not handle chaotic problems gracefully. To handle chaotic problems, one could change the cost function to use a two-stage weighting scheme, which means we just need to generalize the cost function as I discussed above; then this could be implemented by the user to get better results (according to that one video you sent me a while ago). The other way to handle that is Bayesian schemes, but I am punting that to "this sounds like a good project for someone else", and it is a great Google Summer of Code project.

The last thing is that Mads has a lot of these things, so I think developing a linker package, so that it's super easy to use the functionality from Mads on a DifferentialEquations diffeq with whatever solver algorithm you want, is ultimately what will give the best methods. That will give a set of methods which naturally improves itself over time: as Mads gets more functionality and DifferentialEquations gets more functionality, the whole linkage would naturally gain functionality with the developers doing little to no extra work.

finmod commented 7 years ago

All of this is entirely correct, as shown in the paper: Lin, Y., O'Malley, D., Vesselinov, V.V., "A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses", Water Resources Research, doi: 10.1002/2016WR019028, 2016. It is just that Mads has been driving me mad with setup and testing issues on Windows, WSL, and Ubuntu 16.04. Do you have a foolproof setup that passes the standard Pkg.test("Mads")? And immediately after that, do you know of an ODE example of the Lorenz, FH, or Lotka-Volterra type that could provide a better test of Mads than the one-equation example given in the examples/ode folder? Thank you.

ChrisRackauckas commented 7 years ago

It is just that Mads has been driving me mad with setup and testing issues on Windows, WSL, and Ubuntu 16.04. Do you have a foolproof setup that passes the standard Pkg.test("Mads")? And immediately after that, do you know of an ODE example of the Lorenz, FH, or Lotka-Volterra type that could provide a better test of Mads than the one-equation example given in the examples/ode folder? Thank you.

No. I do not find Mads easy to navigate. Part of what I will be doing when I make the linker is make this easier, and write some tutorials about how to use all of it.

Ayush-iitkgp commented 7 years ago

@ChrisRackauckas and @finmod thanks for such an insightful discussion. @finmod I would like to invite you to use the Two-Stage method; I believe it will solve the problem you have with the Lorenz example (do report the results, I would be excited to see them). From the AMIGO example directory, it looks like they have implemented different methods to estimate the parameters based on the research papers for different sets of ODEs. I am not sure that would be a great way. Yes, adding a regularization term to the cost function is a great idea and something I will be working towards. As for the research paper you mentioned, it seems to provide a general approach to solving inverse problems and justifies the use of regularization and global optimization techniques. I believe this can all be implemented in DiffEqParamEstim by modifying our cost function to be in the acceptable format for the different global optimization packages present in Julia. Most of the paper is targeted at an audience who want to use parameter estimation packages and decide upon the regularization techniques to use depending on the prior information they have about the model. Let me know if I am going wrong somewhere :)

ChrisRackauckas commented 7 years ago

For loss functions, we may want to look into using this:

https://github.com/JuliaML/PenaltyFunctions.jl

But I think generalizing the cost function even more would be a great idea. Right now they are forced to use some loss function. We can go one step up to incorporate regularization (or LossFunctions.jl might become compatible with PenaltyFunctions.jl, so there may be no need on our end... that's probably the better design anyways, https://github.com/JuliaML/LossFunctions.jl/issues/81).

But, there's an entirely different route as well. Fitting to data is just one type of parameter estimation problem. Another type is parameter estimation on more phenomenological grounds. Best to give an example. Say you are using your models to find out how to best eliminate a cancerous tumor. You can use DifferentialEquations with event handling to solve your model and have it stop solving the moment that the number of tumor cells is zero (using terminate!(integrator)). In this case, what you want to minimize is sol.t[end]: how long it took. You want to find parameters that minimize this value of the solution, not necessarily some cost vs data.
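A hedged sketch of that tumor example in current DifferentialEquations.jl callback syntax (the model itself is a toy stand-in; `ContinuousCallback` and `terminate!` are the real API):

```julia
using DifferentialEquations

# Toy treatment model: tumor burden decays at rate p[1] under the drug
tumor!(du, u, p, t) = (du[1] = -p[1] * u[1])

# Stop the integration the moment the burden crosses a clearance threshold
condition(u, t, integrator) = u[1] - 0.05
affect!(integrator) = terminate!(integrator)
cb = ContinuousCallback(condition, affect!)

# The objective is sol.t[end]: how long elimination took for parameters p
function time_to_clearance(p)
    prob = ODEProblem(tumor!, [1.0], (0.0, 100.0), p)
    sol = solve(prob, Tsit5(), callback = cb)
    sol.t[end]
end

time_to_clearance([0.5])    # larger p[1] (stronger treatment) ⇒ smaller objective
```

Minimizing `time_to_clearance` over p is then just another objective for any optimizer, with no data in sight.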

There are many scenarios like this. More generally, you can write it as minimizing f(sol) w.r.t some parameter set. So instead of confining our methods to some loss, we should probably make it possible to make the loss any function of the solution. But there should be some easy API for defining the loss against data, since that's a common problem as well. A good API solution to this would be extremely useful for many different parameter estimation methods because then they can all be re-written to allow for this kind of generality. (Note: the Two-Stage method cannot though: it's a method only for finding parameters to fit data).

Ayush-iitkgp commented 7 years ago

@ChrisRackauckas would you mind citing a good source for the second type of parameter estimation problem you discussed above? I never thought in that direction, nor did I find any literature regarding it. Thanks!

Ayush-iitkgp commented 7 years ago

PenaltyFunctions are not yet registered in METADATA. Would it be a good idea to use an unregistered package as the dependency?

ChrisRackauckas commented 7 years ago

PenaltyFunctions are not yet registered in METADATA. Would it be a good idea to use an unregistered package as the dependency?

It's probably a good idea to avoid this for now. It's an idea for the future though, or a user can do this themselves if they can use an f(sol). In fact, the data one can just be built from the f(sol) one, defining a specific f.

@ChrisRackauckas would you mind citing a good source for the second type of parameter estimation problem you discussed above? I never thought in that direction, nor did I find any literature regarding it. Thanks!

What I mentioned above is common for estimating parameters which solve a mean first-passage problem. Replace the ODE with some kind of stochastic equation: you want to find parameters which minimize the first passage time on average.

This actually brings up another point. We should make separate dispatches for different types of diffeqs. DAEs and DDEs should be able to use all the same methods as ODEs pretty easily. Stochastic equations (SDEs and jump equations) won't be able to use the two-stage method, but could use a build_loss_objective where, instead of solving once, it uses the Monte Carlo tools (http://docs.juliadiffeq.org/latest/features/monte_carlo.html) to optimize some function of an array of solutions (for example, the mean's distance from the data). None of these are far off, and they would be the first of their kind, I think!
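A hedged sketch of that Monte Carlo objective using the current ensemble interface (`EnsembleProblem` is today's name for the Monte Carlo machinery linked above; the model and data are toys):

```julia
using DifferentialEquations, Statistics

f(u, p, t) = p[1] * u              # drift
g(u, p, t) = 0.1 * u               # noise
ts = 0.0:0.1:1.0
data = exp.(-0.5 .* ts)            # synthetic target for the ensemble mean

# Objective: distance of the ensemble mean from the data, per candidate p
function mean_loss(p; trajectories = 100)
    prob = SDEProblem(f, g, 1.0, (0.0, 1.0), p)
    sols = solve(EnsembleProblem(prob), SRIW1();
                 trajectories = trajectories, saveat = ts)
    m = mean(sols[i].u for i in 1:trajectories)   # pointwise mean trajectory
    sum(abs2, m .- data)
end

mean_loss([-0.5])   # small near the true drift; noisy, so use a robust optimizer
```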

But there are more examples out there. Here's a paper which recently made waves:

http://journals.aps.org/prx/abstract/10.1103/PhysRevX.5.031036

This paper actually gets rid of the SDE in the estimation using Freidlin-Wentzell to get a Markov Chain approximation which is easier to estimate, but it's the same idea. They are looking to find parameters which lead to the least transitions to cancerous cells, and the parameters which are low in this case are thus the drug targets. Their cost function can be written in diffeq land as a f(sol) to minimize.

So since the data minimization problem can also be written as an f(sol) to minimize, it's a strict generalization where we can keep the default behavior the same but open it up to allow people to do this kind of cool stuff. I don't think this area is fully explored (that paper is a big breakthrough and it's from Sept 2015!) and I definitely have my own applications / research projects in mind which make use of it.

finmod commented 7 years ago

@ChrisRackauckas and @Ayush-iitkgp . This quote: "I believe this all can be implemented in DiffEqParamEstim by modifying our cost function to be in the acceptable format for different Global Optimization packages that are present in Julia." was the actual starting point of this issue and has been my quest all along.

The definition of the NLP problem is further confirmed in this paper: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1452-4, by the same group as the AMIGO2 package. It is clear to me that build_loss_objective should have a generalized least-squares and a likelihood function of a discrepancy, which I think is this L2Dist function. This discrepancy should be a general concept, as I do not know exactly what is behind L2Dist, capable of describing regression and classification problems according to the metric/norm chosen. @ChrisRackauckas's biological example above is a classification problem usually solved with a likelihood function on a Huber distance, whereas a regression problem is solved on the basis of a quadratic distance (Euclidean, Hellinger, Bhattacharyya, and more). In case my terminology is confusing, you may want to look here for an idea of the flexibility one should find in parameter estimation: https://github.com/robertfeldt/BlackBoxOptim.jl/blob/master/examples/regression_via_optimization.jl. Unfortunately, Robert promised some time ago to address the issues you can read about in BlackBoxOptim, but to no avail so far.

Concerning the regularization term of the objective function, the earlier-mentioned paper "Robust and efficient parameter estimation..." is fine. It advocates the Generalized Cross Validation method as a penalty function within a menu of five methods.

Just a clarification about the comments on the Two-Stage method: I shy away from data-smoothing techniques because they do not add any statistical information to the parameter estimation problem. Instead, the central question is what model fits the data. On this score, the likelihood function (minimum variance) is much more powerful than LS (minimum distance).
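The likelihood point can be made concrete with a small plain-Julia sketch (illustrative, not package code): the Gaussian negative log-likelihood weights each residual by its inverse variance and collapses to least squares (up to a constant) only when the variances are equal.

```julia
# Gaussian negative log-likelihood: Σ [ rᵢ²/(2σᵢ²) + ½·log(2πσᵢ²) ]
nll(resid, σ²) = 0.5 * sum(resid .^ 2 ./ σ² .+ log.(2π .* σ²))

r = [0.1, -0.2, 0.05]
nll(r, fill(0.01, 3))         # homoskedastic: least squares plus a constant
nll(r, [0.01, 0.04, 0.01])    # heteroskedastic: noisy points count for less
```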

To return to the Lorenz estimation problem, let me report that it is a well-known NP-hard problem. So far, any solution proposed in JuliaDiffEq fails because it is a local optimization and a least-squares method. Concerning the comment about initial starting values, any system can always be solved at t=0, or one can use the values of the data at t=0. The only success in estimating the parameters of the Lorenz system appears to be with global optimization algorithms like DE, but I would not despair of trying likelihood. Again, BlackBoxOptim would allow us to test this in Julia, but unfortunately Pkg.test("BlackBoxOptim") and the examples fail on Julia 0.5.

When will we be able to demo the parameter estimation capabilities of JuliaDiffEq on the Lorenz system?

ChrisRackauckas commented 7 years ago

When will we be able to demo the parameter estimation capabilities of JuliaDiffEq on the Lorenz system?

I'm not sure, and it is not one of my focuses. I am more interested in systems biology and biomedical applications, and chaotic systems are rare in those. The best way to ensure that we address the directions you are interested in is to start contributing, and I would like to help you get started. But generally, if some of the methods are robust enough, there should be some that work.

Does using the global optimization methods not work with Lorenz? I haven't tried it, but I would give it a try at least.

finmod commented 7 years ago

Does using the global optimization methods not work with Lorenz? I haven't tried it, but I would give it a try at least.

Yes, it does, but it is not available in Julia. BlackBoxOptim is the most likely candidate I have found so far and would fit like a glove, but you know the issues there. Otherwise, a family of DE methods is available with eSS (scatter search): http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1452-4. This is in a C library here: https://bitbucket.org/DavidPenas/sacess-library/ and in AMIGO2. More headaches with joint interoperability between MATLAB, C, and Julia.

Ayush-iitkgp commented 7 years ago

I have included adding support for global optimization algorithms as part of my GSoC proposal. If selected, it will be just a matter of time before we have that functionality implemented. I invite you to review my proposal here.

ChrisRackauckas commented 7 years ago

It is available and has been in the documentation for a while. See using NLopt. Also, Ipopt should work as well through the MathProgBase interface on the cost function.

Ayush-iitkgp commented 7 years ago

@finmod for your reference http://docs.juliadiffeq.org/latest/analysis/parameter_estimation.html#More-Algorithms-(Global-Optimization)-via-MathProgBase-Solvers-1 Meanwhile, we are also working to support BlackBoxOptim and heuristic algorithms and would let you know once it is ready.

finmod commented 7 years ago

After all these exchanges, this is exactly the starting point of this thread. You can find all of my testing with NLopt on the Lotka-Volterra model here: https://github.com/finmod/DiffEqDocs.jl/blob/master/docs/src/analysis/Parameter%20estimation%20Lotka%20Volterra.ipynb. This notebook can be used to improve and expand the parameter estimation page of the DiffEq docs. As you can see, I have done extensive testing of the 43 algorithms in NLopt, although four or five of these have not been tested because of inaccurate syntax. The approach to parameter estimation has to be from the general to the particular and from the unknown to the known. It is only through the use of a synthetic model that we can verify that an algorithm can "rediscover" a parameter within a narrow bound. This is basically trivial and uninteresting. Instead, we should be able to start from many unknown parameters with just broad constraints, such as parameters being real or positive.

With this Lotka-Volterra demo, the menu of useful and accurate estimation methods is just a few entries long. In addition, NLopt does not cover Differential Evolution (DE) in the global class. An extension of the global class is provided by BlackBoxOptim, but I am unable to test that for now because BlackBoxOptim needs to be upgraded to Julia 0.5, among other items.

My original question is still there: how do we know that we have the correct parameter estimation problem set up in build_loss_objective? I am just not convinced that what is there is the maximum likelihood and cost function of the @Ayush-iitkgp proposal. What is the content of obj.cost_function2 in build_loss_objective? It is strange to me that the four parameters cannot be estimated immediately from a likelihood function and CRS2_LM or ESCH.

I am working on a similar notebook for the Lorenz system. There, the outcomes are discouraging so far even though they should not be because many authors are reporting good results with global algorithms. This is what motivated my comments in the first instance about how the estimation problem is set up.

ChrisRackauckas commented 7 years ago

The setup is very standard, and there really isn't much to it.

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/blob/master/src/build_loss_objective.jl#L45

That's the MPB function, which calls a gradient computed by autodifferentiation and then the cost function. One thing we might want to check is that the autodifferentiation actually works correctly: it's threaded through the whole ODE solver, which might be slightly wonky. But you can change this to calculate gradients using finite differences via one of the options. Maybe check this?
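One way to run that check (a sketch; ForwardDiff is assumed for the autodiff side, and a toy objective stands in for the one that internally solves the ODE):

```julia
using ForwardDiff

obj(p) = sum(abs2, p .- [1.0, 2.0])       # stand-in for the ODE-based objective

# Central finite differences, the fallback mode mentioned above
function fd_grad(f, p; h = 1e-6)
    g = similar(p)
    for i in eachindex(p)
        e = zeros(length(p)); e[i] = h
        g[i] = (f(p .+ e) - f(p .- e)) / 2h
    end
    g
end

p = [0.5, 0.5]
ad = ForwardDiff.gradient(obj, p)
maximum(abs.(ad .- fd_grad(obj, p)))      # tiny ⇒ the AD gradient checks out
```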

The actual cost function is really simple though.

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/blob/master/src/build_loss_objective.jl#L15

You create a new problem, which is the old problem with new parameters thrown in:

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/issues/5#issuecomment-292719410

then you solve it:

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/issues/5#issuecomment-292719410

and calculate the loss:

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/blob/master/src/build_loss_objective.jl#L36

Two details. The loss for CostVData is this function:

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/blob/master/src/cost_functions.jl#L7

Essentially, it just uses the loss function you supply, which defaults to L2 loss. BUT THERE IS ONE DETAIL HERE. If the ODE diverges, in order to actually be able to calculate a loss, I fill the rest of the ODE solution with NaN values. If there is a heuristic that is going wrong, this would be it. Question, how do other suites deal with this problem? Lotka-Volterra is actually a good example of a difficult problem here because for most of the parameters, it doesn't have a steady solution and will blow up to infinity. This makes the solver not error when this happens, but makes that parameter region have a NaN cost. What about infinity?
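The divergence detail can be sketched in isolation (illustrative names, not the package internals): pad the truncated solution out to the data length, then compare how Inf and NaN fills behave in the loss.

```julia
# Pad a truncated (diverged) solution to the data length, then take the L2 loss
function padded_loss(solvals, data; pad = Inf)
    padded = vcat(solvals, fill(pad, length(data) - length(solvals)))
    sum(abs2, padded .- data)
end

padded_loss([1.0, 0.9], [1.0, 0.9, 0.8])             # → Inf: ranks strictly worst
padded_loss([1.0, 0.9], [1.0, 0.9, 0.8]; pad = NaN)  # → NaN: incomparable
# NaN is neither greater nor less than any finite cost, which can confuse
# optimizers that rank candidates; Inf at least orders divergent regions last.
```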

For completeness, here's how the temporary problem is created. @Ayush-iitkgp should take a look at this too.

https://github.com/JuliaDiffEq/DiffEqBase.jl/blob/master/src/parameters_interface.jl#L12

It just puts a closure on the problem's function for new parameters, and updates the eltypes (in case the eltypes should be dual numbers, which happens during autodifferentiation) and returns a new problem. For other problem types, it splits up p to put the right amount of parameters in each spot, and returns the right type of problem.
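In miniature, the closure trick looks like this (a toy `ToyProblem` stands in for `ODEProblem`, and `with_parameters` is illustrative, not the real `problem_new_parameters`):

```julia
struct ToyProblem{F,U}     # stand-in for an ODEProblem: RHS plus initial state
    f::F
    u0::U
end

f_base(u, p) = -p[1] .* u  # the parameterized right-hand side

function with_parameters(prob::ToyProblem, p)
    newf = u -> f_base(u, p)                # closure over the new parameters
    ToyProblem(newf, eltype(p).(prob.u0))   # eltype update, e.g. for dual numbers
end

prob  = ToyProblem(u -> f_base(u, [1.0]), [1.0, 2.0])
prob2 = with_parameters(prob, [3.0])
prob2.f([1.0, 2.0])   # → [-3.0, -6.0]: same problem, new parameters baked in
```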

So that's all it does. Super simple: it creates a new problem with the new parameters, solves it, and calls the user's loss. If I had to name two points of possible contention, they would be autodifferentiation and CostVData's handling of divergent solutions. I invite you to test a few things out here, since this function is simple enough that it should be easy to dive into for a first-time contributor. Maybe switching from filling with NaN to filling with Inf is all that's needed?

finmod commented 7 years ago

@ChrisRackauckas Still struggling to make sense of the usefulness of build_loss_objective. I have restarted the problem from scratch here: https://github.com/finmod/DiffEqDocs.jl/blob/master/docs/src/analysis/ode%20lorentz%20attractor.ipynb and would like you to go through it to see how JuliaDiffEq can simplify this demo.

The computational efficiency of the differential evolution (DE) option in BlackBoxOptim is impressive for this Lorenz example, which could not be solved accurately in NLopt.

ChrisRackauckas commented 7 years ago

Your example using build_loss_objective:

using DifferentialEquations, RecursiveArrayTools
using BlackBoxOptim
Xiang2015Bounds = Tuple{Float64, Float64}[(9, 11), (20, 30), (2, 3)]

g1 = @ode_def_nohes LorenzExample begin
  dx = σ*(y-x)
  dy = x*(ρ-z) - y
  dz = x*y - β*z
end σ=>10.0 ρ=>28.0 β=>2.6666

r0 = [0.1;0.0;0.0]
tspan = (0.0,4.0)
prob = ODEProblem(g1,r0,tspan)
tspan2 = (0.0,3.0)
prob_short = ODEProblem(g1,r0,tspan2)

dt = 0.001
tf = 4.0
tinterval = 0:dt:tf
t  = collect(tinterval)

h = 0.01
M = 300
tstart = 0.0
tstop = tstart + M * h
tinterval_short = 0:h:tstop
t_short = collect(tinterval_short)

# Generate Data
solve(prob_short,Euler(),tstops=t_short)
data_short = vecarr_to_arr(solve(prob_short,Euler(),tstops=t_short))

solve(prob,Euler(),tstops=t)
data = vecarr_to_arr(solve(prob,Euler(),tstops=t))

# Build a cost function: sum of squared errors between the solution states
# and the corresponding data columns (sumabs2 is the Julia v0.5 spelling
# of sum(abs2, ...))
function ess2(actual, estimated)
  sumsq = 0.0
  for i in 1:size(actual, 2)
      @inbounds sumsq += sumabs2(actual[i] .- estimated[:,i])
  end
  sumsq
end
my_cost_short(sol) = ess2(sol,data_short)
my_cost(sol) = ess2(sol,data)

# Use BlackBoxOptim
obj_short = build_loss_objective(prob_short,Euler(),my_cost_short,tstops=t_short,dense=false)
res1 = bboptimize(obj_short;SearchRange = Xiang2015Bounds, MaxSteps = 11e3)

obj = build_loss_objective(prob,Euler(),my_cost,tstops=t,dense=false)
res1 = bboptimize(obj;SearchRange = Xiang2015Bounds, MaxSteps = 8e3)

It's essentially exactly your same example, with the same results.

A few things to note. First of all, CostVData seems to have broken in the RecursiveArrayTools.jl change. That needs to get fixed immediately, so I just used your cost function.

This is just using the Euler method, so it's a good estimate of the overhead. For short, the overhead is ~2x (1.65 seconds vs 3.00 seconds). For the longer problem, the overhead is smaller (16.5 seconds vs 19.4 seconds). From a quick profiling, this seems to be because the full problem/solution type is recreated each time. This is actually unnecessary with the diffeq inplace interfaces, but re-using the types in a way that works with autodifferentiation is what is difficult (but still possible). The Euler method in OrdinaryDiffEq actually has a tiny bit of extra overhead due to interpolations (one extra array allocation). This overhead still needs work.
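The allocation point can be illustrated outside of diffeq (a hand-rolled loop, not the package's code): an in-place RHS that writes into a preallocated `du` removes the per-step array allocation that dominates trivial integrations.

```julia
# Hand-rolled fixed-step Euler with an in-place RHS: no allocation per step.
function euler!(u, du, f!, dt, n)
    for _ in 1:n
        f!(du, u)          # write the derivative into the preallocated buffer
        @. u += dt * du    # fused in-place update, no temporary array
    end
    u
end

decay!(du, u) = (du .= .-u)   # u' = -u

u = [1.0]
euler!(u, similar(u), decay!, 0.5, 4)   # u is now (1 - 0.5)^4 = 0.0625
```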

More Important

That said, this is a toy example because it removes the true complexity of the problem. This doesn't capture the fact that the true result is not the output of the same algorithm at the same tolerance; rather, the "true result" is the solution of the differential equation. So we should generate the "true result" data with really high fidelity:

sol = solve(prob_short,Vern7(),saveat=t_short,abstol=1e-12,reltol=1e-12)
data_short = vecarr_to_arr(sol)

sol = solve(prob,Vern7(),saveat=t,abstol=1e-12,reltol=1e-12)
data = vecarr_to_arr(sol)

When you do this, you see that your use of the Euler method actually doesn't work. The same commands give:

Starting optimization with optimizer BlackBoxOptim.DiffEvoOpt{BlackBoxOptim.FitPopulation{Float64},BlackBoxOptim.RadiusLimitedSelector,BlackBoxOptim.AdaptiveDiffEvoRandBin{3},BlackBoxOptim.RandomBound{BlackBoxOptim.RangePerDimSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.50 secs, 2145 evals, 2039 steps, improv/step: 0.393 (last = 0.3933), fitness=20352.424974152
1.00 secs, 4227 evals, 4121 steps, improv/step: 0.388 (last = 0.3818), fitness=20352.014664791
1.50 secs, 6342 evals, 6237 steps, improv/step: 0.387 (last = 0.3861), fitness=20352.014620040
2.01 secs, 8506 evals, 8427 steps, improv/step: 0.377 (last = 0.3470), fitness=20352.014620026
2.51 secs, 10205 evals, 10202 steps, improv/step: 0.326 (last = 0.0834), fitness=20352.014620026

Optimization stopped after 11001 steps and 2.683000087738037 seconds
Termination reason: Max number of steps (11000) reached
Steps per second = 4100.260767890857
Function evals per second = 4086.0975182608363
Improvements/step = 0.30436363636363634
Total function evaluations = 10963

Best candidate found: [11.0,24.8298,3.0]

Fitness: 20352.014620026

The reason why it worked in your example is that it was asking the wrong question. Because you created the data with the same algorithm and the same steps you solved with, it was asking "which parameters are needed for the Euler method to produce the same output as before?" and not "which parameters are needed to get that result from the differential equation?".
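A one-dimensional toy makes the "wrong question" concrete: data generated by Euler at a coarse step is reproduced exactly by the same Euler run, yet both sit far from the true solution u(t) = exp(-t).

```julia
# Euler-vs-Euler agrees perfectly even when both are far from the truth.
euler_scalar(f, u0, dt, n) = foldl((u, _) -> u + dt*f(u), 1:n; init=u0)

u_data = euler_scalar(u -> -u, 1.0, 0.5, 4)  # "data" from coarse Euler
u_fit  = euler_scalar(u -> -u, 1.0, 0.5, 4)  # "fit" with identical method/steps
u_true = exp(-2.0)                           # exact solution at t = 2
```

Here `u_data == u_fit` exactly (zero "cost"), while both miss `u_true` by a wide margin, which is exactly what the Euler-generated Lorenz data did above.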

This is where the fact that build_loss_objective is able to use all of diffeq really matters. For example, Tsit5() without any changes does pretty well:

obj_short = build_loss_objective(prob_short,Tsit5(),my_cost_short,saveat=t_short,dense=false)
res1 = bboptimize(obj_short;SearchRange = Xiang2015Bounds, MaxSteps = 11e3)
Starting optimization with optimizer BlackBoxOptim.DiffEvoOpt{BlackBoxOptim.FitPopulation{Float64},BlackBoxOptim.RadiusLimitedSelector,BlackBoxOptim.AdaptiveDiffEvoRandBin{3},BlackBoxOptim.RandomBound{BlackBoxOptim.RangePerDimSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.50 secs, 1274 evals, 1164 steps, improv/step: 0.322 (last = 0.3222), fitness=1.246404783
1.00 secs, 2495 evals, 2385 steps, improv/step: 0.288 (last = 0.2563), fitness=0.069462567
1.50 secs, 4050 evals, 3941 steps, improv/step: 0.283 (last = 0.2757), fitness=0.050588358
2.00 secs, 5581 evals, 5474 steps, improv/step: 0.274 (last = 0.2492), fitness=0.050584206
2.51 secs, 7135 evals, 7029 steps, improv/step: 0.231 (last = 0.0797), fitness=0.050584187
3.01 secs, 8682 evals, 8576 steps, improv/step: 0.200 (last = 0.0569), fitness=0.050583564
3.51 secs, 10234 evals, 10128 steps, improv/step: 0.176 (last = 0.0464), fitness=0.050583564

Optimization stopped after 11001 steps and 3.7899999618530273 seconds
Termination reason: Max number of steps (11000) reached
Steps per second = 2902.6385516429746
Function evals per second = 2930.343037409951
Improvements/step = 0.16490909090909092
Total function evaluations = 11106

Best candidate found: [10.0082,27.9947,2.66844]

Fitness: 0.050583563

And we can improve this by improving the tolerance of the diffeq solver:

obj_short = build_loss_objective(prob_short,Tsit5(),my_cost_short,saveat=t_short,dense=false,abstol=1e-9,reltol=1e-9)
res1 = bboptimize(obj_short;SearchRange = Xiang2015Bounds, MaxSteps = 11e3)
Starting optimization with optimizer BlackBoxOptim.DiffEvoOpt{BlackBoxOptim.FitPopulation{Float64},BlackBoxOptim.RadiusLimitedSelector,BlackBoxOptim.AdaptiveDiffEvoRandBin{3},BlackBoxOptim.RandomBound{BlackBoxOptim.RangePerDimSearchSpace}}
0.00 secs, 0 evals, 0 steps
0.50 secs, 778 evals, 656 steps, improv/step: 0.372 (last = 0.3720), fitness=2.995026233
1.00 secs, 1573 evals, 1440 steps, improv/step: 0.328 (last = 0.2908), fitness=0.016312339
1.50 secs, 2386 evals, 2253 steps, improv/step: 0.305 (last = 0.2645), fitness=0.005001359
2.00 secs, 3207 evals, 3074 steps, improv/step: 0.300 (last = 0.2862), fitness=0.000062629
2.50 secs, 4021 evals, 3888 steps, improv/step: 0.293 (last = 0.2654), fitness=0.000000212
3.01 secs, 4850 evals, 4718 steps, improv/step: 0.297 (last = 0.3145), fitness=0.000000059
3.51 secs, 5659 evals, 5527 steps, improv/step: 0.292 (last = 0.2658), fitness=0.000000000
4.01 secs, 6485 evals, 6353 steps, improv/step: 0.286 (last = 0.2446), fitness=0.000000000
4.51 secs, 7284 evals, 7153 steps, improv/step: 0.287 (last = 0.2938), fitness=0.000000000
5.01 secs, 8074 evals, 7943 steps, improv/step: 0.285 (last = 0.2671), fitness=0.000000000
5.51 secs, 8884 evals, 8753 steps, improv/step: 0.282 (last = 0.2531), fitness=0.000000000
6.01 secs, 9693 evals, 9562 steps, improv/step: 0.281 (last = 0.2682), fitness=0.000000000
6.51 secs, 10524 evals, 10393 steps, improv/step: 0.280 (last = 0.2696), fitness=0.000000000

Optimization stopped after 11001 steps and 6.883999824523926 seconds
Termination reason: Max number of steps (11000) reached
Steps per second = 1598.0534980273321
Function evals per second = 1616.9378680612883
Improvements/step = 0.2812727272727273
Total function evaluations = 11131

Best candidate found: [10.0,28.0,2.6666]

Fitness: 0.000000000

And we get an essentially perfect fit, even though we generated the data differently than we estimated the result.

Other Important Factors

Other than that, build_loss_objective is much more general. A diffeq prob can have discontinuous events. It can be a delay differential equation, or a Monte Carlo simulation of many stochastic differential equations with the loss on the mean of the trajectories. This will actually work in all of these cases, where all you need to know is (A) how to use build_loss_objective in the basic case (here) and (B) how to set up a diffeq problem for whatever eventful stochastic thing you want to solve. Chain those two together by passing in the prob and you're good to go.

Conclusions

The conclusions to draw are:

1) Your example is exactly what build_loss_objective with Euler() is doing. The code gets simplified and generalized.
2) The standard cost function, CostVData, has a bug somewhere. New issue for tracking that: https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/issues/14
3) The overhead is largest when the problem is simplest, with an upper bound of ~2x, and drops fast. But we should work to bring down that upper bound.
4) BlackBoxOptim.jl works pretty well, and an example for that should be added to the docs.
5) Be careful that your test problems are measuring the correct thing. This is still a good test problem for measuring the overhead vs a direct implementation (a direct implementation of Euler will always be the fastest Euler implementation, so this gives a good numbers check), but it is not a good test problem for seeing how well algorithms converge to the "correct result", because here the "correct result" was tied to a high-error approximation of the diffeq.
6) We should probably re-analyze Optim and NLopt efficiencies using this objective function to see if the problems were due to this bug.

finmod commented 7 years ago

Great. Many clarifications and bug fixes. Trying to run it in my notebook, I get one warning and one error so far:

using DifferentialEquations, RecursiveArrayTools
using BlackBoxOptim

WARNING: using BlackBoxOptim.Parameters in module Main conflicts with an existing identifier.

I should mention that BlackBoxOptim should be on master.

Then:

solve(prob_short,Euler(),tstops=t_short)
data_short = vecarr_to_arr(solve(prob,Euler(),tstops=t_short))

MethodError: no method matching vecarr_to_arr(::DiffEqBase.ODESolution{Array{Array{Float64,1},1},Void,Void,Array{Float64,1},Array{Array{Array{Float64,1},1},1},DiffEqBase.ODEProblem{Array{Float64,1},Float64,true,LorenzExample,DiffEqBase.CallbackSet{Tuple{},Tuple{}}},OrdinaryDiffEq.Euler,OrdinaryDiffEq.InterpolationData{LorenzExample,Array{Array{Float64,1},1},Array{Float64,1},Array{Array{Array{Float64,1},1},1},OrdinaryDiffEq.EulerCache{Array{Float64,1},Array{Float64,1}}}})
Closest candidates are:
  vecarr_to_arr(::RecursiveArrayTools.AbstractVectorOfArray{T,N}) at C:\Users\Denis.julia\v0.5\RecursiveArrayTools\src\vector_of_array.jl:44

Finally, how and where do I add +randn?

ChrisRackauckas commented 7 years ago

@finmod what's your Pkg.status()? Have you Pkg.update()ed lately? vecarr_to_arr is pretty new and your DiffEqBase.jl might be old.

Just add randn to the training data to simulate some randomness:

data_short = vecarr_to_arr(solve(prob_short,Euler(),tstops=t_short))
data_short .+= 0.1randn(size(data_short))

In reality, the data has inherent randomness, so this isn't needed. But it is good for testing, to avoid some of the problems with generating the data the same way you are estimating it.

finmod commented 7 years ago

There is a bug somewhere with DiffEqBase. When I do Pkg.update(), it shows DiffEqBase downgrading from v1.1.0 to v0.15.0. When I remove DifferentialEquations, it does the opposite. After re-installing, Pkg.test("DifferentialEquations") gives this:

julia> Pkg.test("DifferentialEquations")
INFO: Computing test dependencies for DifferentialEquations...
ERROR: unsatisfiable package requirements detected: no feasible version could be found for package: Compat
(you may try increasing the value of the JULIA_PKGRESOLVE_ACCURACY environment variable)
 in resolve(::Dict{String,Base.Pkg.Types.VersionSet}, ::Dict{String,Dict{VersionNumber,Base.Pkg.Types.Available}}) at .\pkg\resolve.jl:37
 in resolve(::Dict{String,Base.Pkg.Types.VersionSet}, ::Dict{String,Dict{VersionNumber,Base.Pkg.Types.Available}}, ::Dict{String,Tuple{VersionNumber,Bool}}, ::Dict{String,Base.Pkg.Types.Fixed}, ::Dict{String,VersionNumber}, ::Set{String}) at .\pkg\entry.jl:495
 in resolve(::Dict{String,Base.Pkg.Types.VersionSet}) at .\pkg\entry.jl:476
 in #test!#58(::Bool, ::Function, ::String, ::Array{AbstractString,1}, ::Array{AbstractString,1}, ::Array{AbstractString,1}) at .\pkg\entry.jl:702
 in (::Base.Pkg.Entry.#kw##test!)(::Array{Any,1}, ::Base.Pkg.Entry.#test!, ::String, ::Array{AbstractString,1}, ::Array{AbstractString,1}, ::Array{AbstractString,1}) at .\:0
 in #test#61(::Bool, ::Function, ::Array{AbstractString,1}) at .\pkg\entry.jl:734
 in (::Base.Pkg.Entry.#kw##test)(::Array{Any,1}, ::Base.Pkg.Entry.#test, ::Array{AbstractString,1}) at .\:0
 in (::Base.Pkg.Dir.##2#3{Array{Any,1},Base.Pkg.Entry.#test,Tuple{Array{AbstractString,1}}})() at .\pkg\dir.jl:31
 in cd(::Base.Pkg.Dir.##2#3{Array{Any,1},Base.Pkg.Entry.#test,Tuple{Array{AbstractString,1}}}, ::String) at .\file.jl:48
 in #cd#1(::Array{Any,1}, ::Function, ::Function, ::Array{AbstractString,1}, ::Vararg{Array{AbstractString,1},N}) at .\pkg\dir.jl:31
 in (::Base.Pkg.Dir.#kw##cd)(::Array{Any,1}, ::Base.Pkg.Dir.#cd, ::Function, ::Array{AbstractString,1}, ::Vararg{Array{AbstractString,1},N}) at .\:0
 in #test#3(::Bool, ::Function, ::String, ::Vararg{String,N}) at .\pkg\pkg.jl:258
 in test(::String, ::Vararg{String,N}) at .\pkg\pkg.jl:258

julia>

Here is Pkg.status() (Julia Version 0.5.1 (2017-03-05 13:25 UTC), official x86_64-w64-mingw32 release):

julia> Pkg.status()
25 required packages:

julia>

ChrisRackauckas commented 7 years ago

There is a bug somewhere with DiffEqBase. When I do Pkg.update(), it shows DiffEqBase downgrading from v1.1.0 to v0.15.0. When I remove DifferentialEquations, it does the opposite.

No, there's a bug in Julia Base, specifically with the Pkg resolver, which is causing havoc right now:

https://discourse.julialang.org/t/differentialequations-pkg-wont-be-added/3194

That thread gives some ways to get around it for now, but Julia v0.5.2/v0.6 will fix this. Something should be released very soon for that, but there's really nothing I can personally do about it.

finmod commented 7 years ago

I set the issue aside yesterday, waiting for positive signs of Julia v0.5.2/v0.6.

This morning Pkg.add("DifferentialEquations") gives a slightly more optimistic status for DiffEqBase: it only downgrades from v1.1.0 to v0.7.0. That is still too old for vecarr_to_arr to be available. Just hope this gets resolved soon.

finmod commented 7 years ago

I may have spoken too soon. Pkg.test("DiffEqBase") produces all this rumble:

julia> Pkg.test("DiffEqBase") INFO: Computing test dependencies for DiffEqBase... INFO: Updating cache of DiffEqJump... INFO: Installing DelayDiffEq v0.3.0 INFO: Upgrading DiffEqBase: v0.7.0 => v0.15.0 INFO: Installing DiffEqBiological v0.0.1 INFO: Installing DiffEqCallbacks v0.0.2 INFO: Upgrading DiffEqDevTools: v0.6.0 => v0.7.1 INFO: Installing DiffEqFinancial v0.1.0 INFO: Installing DiffEqJump v0.3.1 INFO: Installing DiffEqMonteCarlo v0.2.0 INFO: Upgrading DiffEqPDEBase: v0.1.0 => v0.2.0 INFO: Upgrading DiffEqParamEstim: v0.1.0 => v0.3.0 INFO: Installing DiffEqProblemLibrary v0.5.0 INFO: Upgrading DiffEqSensitivity: v0.0.4 => v0.2.2 INFO: Upgrading DifferentialEquations: v1.1.0 => v1.10.1 INFO: Upgrading FiniteElementDiffEq: v0.2.1 => v0.3.0 INFO: Installing MultiScaleArrays v0.1.0 INFO: Upgrading OrdinaryDiffEq: v1.1.0 => v1.8.0 INFO: Upgrading ParameterizedFunctions: v1.2.0 => v1.3.0 INFO: Upgrading PyDSTool: v0.0.2 => v0.1.2 INFO: Upgrading StochasticDiffEq: v0.5.0 => v1.4.0 INFO: Upgrading Sundials: v0.3.0 => v0.10.0 INFO: Building Rmath INFO: Building Conda INFO: Building SymEngine INFO: Building WinRPM INFO: Downloading https://cache.julialang.org/http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_42.2/repodata/repomd.xml INFO: Downloading https://cache.julialang.org/http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_42.2/repodata/3689b77bcb17876593c6bc76a0da98448e7c44f37cbdeba0d4a9bfc15449ce8a-primary.xml.gz INFO: Downloading https://cache.julialang.org/http://download.opensuse.org/repositories/windows:/mingw:/win64/openSUSE_42.2/repodata/repomd.xml INFO: Downloading https://cache.julialang.org/http://download.opensuse.org/repositories/windows:/mingw:/win64/openSUSE_42.2/repodata/72b2ba63ab2642aafa3426bfcb5538a7eedf82eb7acbf8d7fa0ec8762d14e772-primary.xml.gz INFO: Building Blosc INFO: Building HDF5 INFO: Updating WinRPM package list INFO: Downloading 
https://cache.julialang.org/http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_42.2/repodata/repomd.xml INFO: Downloading https://cache.julialang.org/http://download.opensuse.org/repositories/windows:/mingw:/win64/openSUSE_42.2/repodata/repomd.xml INFO: Building Sundials INFO: Building PyCall Fetching package metadata ........... Solving package specifications: ..........

All requested packages already installed.

packages in environment at C:\Users\Denis.julia\v0.5\Conda\deps\usr:

# numpy 1.12.1 py27_0 INFO: PyCall is using C:\Users\Denis.julia\v0.5\Conda\deps\usr\python.exe (Python 2.7.13) at C:\Users\Denis.julia\v0.5\Conda\deps\usr\python.exe, libpython = C:\Users\Denis.julia\v0.5\Conda\deps\usr\python27 INFO: C:\Users\Denis.julia\v0.5\PyCall\deps\deps.jl has not changed INFO: C:\Users\Denis.julia\v0.5\PyCall\deps\PYTHON has not changed INFO: Building PyDSTool Fetching package metadata ........... Solving package specifications: ..........

All requested packages already installed.

packages in environment at C:\Users\Denis.julia\v0.5\Conda\deps\usr:

# scipy 0.19.0 np112py27_0 Warning: 'conda-forge' already in 'channels' list, moving to the top Fetching package metadata ........... Solving package specifications: ..........

All requested packages already installed.

packages in environment at C:\Users\Denis.julia\v0.5\Conda\deps\usr:

# pydstool 0.90.2 py27_0 conda-forge INFO: Testing DiffEqBase Test Summary: | Pass Total Number of Parameters Calculation | 1 1 0.228539 seconds (141.62 k allocations: 6.113 MB) Test getindex Test Summary: | Pass Total Solution Interface | 1 1 47.138279 seconds (39.77 M allocations: 1.683 GB, 2.08% gc time) Test Summary: | Pass Total Extended Functions | 17 17 1.156987 seconds (595.41 k allocations: 26.020 MB, 0.87% gc time) Test Summary: | Pass Total Callbacks | 10 10 0.118721 seconds (80.75 k allocations: 3.477 MB) Test Summary: | Pass Total Constructed Parameterized Functions | 4 4 0.087829 seconds (51.84 k allocations: 2.245 MB) Test Summary: | Pass Total Plot Variables | 16 16 4.436058 seconds (3.02 M allocations: 121.436 MB, 1.26% gc time) INFO: DiffEqBase tests passed INFO: Downgrading DiffEqBase: v0.15.0 => v0.7.0 INFO: Downgrading DiffEqDevTools: v0.7.1 => v0.6.0 INFO: Downgrading DiffEqPDEBase: v0.2.0 => v0.1.0 INFO: Downgrading DiffEqParamEstim: v0.3.0 => v0.1.0 INFO: Downgrading DiffEqSensitivity: v0.2.2 => v0.0.4 INFO: Downgrading DifferentialEquations: v1.10.1 => v1.1.0 INFO: Downgrading FiniteElementDiffEq: v0.3.0 => v0.2.1 INFO: Downgrading OrdinaryDiffEq: v1.8.0 => v1.1.0 INFO: Downgrading ParameterizedFunctions: v1.3.0 => v1.2.0 INFO: Downgrading PyDSTool: v0.1.2 => v0.0.2 INFO: Downgrading StochasticDiffEq: v1.4.0 => v0.5.0 INFO: Downgrading Sundials: v0.10.0 => v0.3.0 INFO: Removing DelayDiffEq v0.3.0 INFO: Removing DiffEqBiological v0.0.1 INFO: Removing DiffEqCallbacks v0.0.2 INFO: Removing DiffEqFinancial v0.1.0 INFO: Removing DiffEqJump v0.3.1 INFO: Removing DiffEqMonteCarlo v0.2.0 INFO: Removing DiffEqProblemLibrary v0.5.0 INFO: Removing MultiScaleArrays v0.1.0 INFO: Building Rmath INFO: Building Sundials WARNING: Base.WORD_SIZE is deprecated. likely near C:\Users\Denis.julia\v0.5\Sundials\deps\build.jl:21 INFO: Building Conda INFO: Building SymEngine INFO: Building PyCall Fetching package metadata ........... 
Solving package specifications: ..........

All requested packages already installed.

packages in environment at C:\Users\Denis.julia\v0.5\Conda\deps\usr:

# numpy 1.12.1 py27_0 INFO: PyCall is using C:\Users\Denis.julia\v0.5\Conda\deps\usr\python.exe (Python 2.7.13) at C:\Users\Denis.julia\v0.5\Conda\deps\usr\python.exe, libpython = C:\Users\Denis.julia\v0.5\Conda\deps\usr\python27 INFO: C:\Users\Denis.julia\v0.5\PyCall\deps\deps.jl has not changed INFO: C:\Users\Denis.julia\v0.5\PyCall\deps\PYTHON has not changed INFO: Building PyDSTool Fetching package metadata ........... Solving package specifications: ..........

All requested packages already installed.

packages in environment at C:\Users\Denis.julia\v0.5\Conda\deps\usr:

# pip 9.0.1 py27_0 conda-forge Fetching package metadata ........... Solving package specifications: ..........

All requested packages already installed.

packages in environment at C:\Users\Denis.julia\v0.5\Conda\deps\usr:

# scipy 0.19.0 np112py27_0 Requirement already satisfied: pydstool in c:\users\denis.julia\v0.5\conda\deps\usr\lib\site-packages Requirement already satisfied: six in c:\users\denis.julia\v0.5\conda\deps\usr\lib\site-packages (from pydstool) Requirement already satisfied: scipy>=0.9 in c:\users\denis.julia\v0.5\conda\deps\usr\lib\site-packages (from pydstool) Requirement already satisfied: numpy>=1.6 in c:\users\denis.julia\v0.5\conda\deps\usr\lib\site-packages (from pydstool)

julia>

ChrisRackauckas commented 7 years ago

Yeah, you're hit with the Pkg resolver bug. Best to just wait until that's fixed I guess, unless you're willing to pin versions.

finmod commented 7 years ago

One interesting observation: by deleting the DiffEqBase folder and other troublesome folders in .julia/v0.5/, Pkg.update() and Pkg.status() become stable. Then I can Pkg.add("DifferentialEquations") and it does not downgrade any other package during the install. But then Pkg.test("DifferentialEquations") fails with StokesDiffEq... for other reasons. This may indicate problems with DiffEq* packages that were unregistered (master-only) in the past and have since become registered. Perhaps a good clean-up of these troublesome folders first would do the trick.

ChrisRackauckas commented 7 years ago

It should all be handled by Pkg, but Pkg is not handling it well. This is getting fixed up, though it may require v0.6, since the true problem is that Pkg itself is bugged.

ChrisRackauckas commented 7 years ago

@finmod the newest diffeq version is now released, and it seems like Pkg is having fewer problems with it. Try updating.

finmod commented 7 years ago

@ChrisRackauckas Thank you for flagging this. I have been working on Linux and carefully avoiding Pkg.update() there, since I already had DiffEqBase at v1.1.0.

After Pkg.update() on Windows, Pkg.test("DifferentialEquations") is OK. The problem appears to have been created by PyCall and PyDSTool and the corresponding update of Conda. They had to be built separately with ENV["PYTHON"]=""; Pkg.build("PyCall"); Pkg.build("PyDSTool").

Back to parameter estimation with BlackBoxOptim: my current version of the ipynb is here, with all your work on it included: https://github.com/finmod/DiffEqDocs.jl/blob/master/docs/src/analysis/Parameter%20estimation%20Lorenz%20bboptim.ipynb

Next steps:

ChrisRackauckas commented 7 years ago

Is the MathProgBase alignment still necessary?

Yes. Other solvers use MPB. With this, all of the registered Julia packages can use that objective function, making it easy to chain different methods. Some of the optimizers are actually calling different dispatches on the objective function type, and this hides that fact from the user to make all packages work just as easily.

the current example shows some lesser efficiency with overhead cost and less clear cut convergence. Can this be improved?

It's somewhat less efficient with Euler() vs a straight Euler loop. What cost are you getting there? That can be improved some, but it will never hit the straight Euler loop because, well, that's literally the most efficient way you can loop that. I measured it at ~2x, and would like to get that below 1.5x if possible, or at least identify what's causing the difference (my guess is that Euler() allocates one extra array for interpolations, and the integration is trivial and therefore dominated by that one extra allocation? Not sure). In actual usage, not hitting the straight Euler loop isn't much of a big deal because the inefficiencies are the losses due to running in an engine built for adaptive timestepping (push!ing into a vector, etc.). In any real example you'd want adaptive timestepping anyway, so that's moot.

But the convergence should be exactly the same. Is that not the case for you?

It would be nice to investigate various estimators with CostVData when the bug is solved. My question here is: what is the gain of using NLS or LLK for a given data misfit function? Surely, non-stationary and heteroskedastic data must be approached through the properties of an estimator.

NLS is direct. The cost takes fewer calculations, but it's equivalent to LLK with normally distributed errors of constant mean and variance. Thus NLS cannot handle the assumption that the variance of the error grows over time (heteroskedastic, and likely to be the case with any diffeq!). If you want to provide that kind of assumption, or use a different distribution for the errors, you have to use LLK.
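That equivalence can be written down directly (a sketch, not a package API): the Gaussian negative log-likelihood with a per-point variance reduces to a scaled sum of squares when the variance is constant, and becomes a weighted least squares when the variance grows with time.

```julia
# Gaussian negative log-likelihood of residuals with per-point variance σ².
gauss_nll(resid, σ2) = sum(@. resid^2 / (2σ2) + 0.5 * log(2π * σ2))

resid   = [0.1, 0.2, 0.4]
σ2_flat = fill(0.05, 3)        # homoskedastic: equivalent to L2 up to constants
σ2_grow = 0.05 .* [1, 2, 3]    # variance grows over time: later points downweighted

nll_flat = gauss_nll(resid, σ2_flat)   # equals SSE/(2σ²) plus a constant
```

With the constant variance, minimizing `gauss_nll` and minimizing the plain sum of squared errors give the same argmin; only the growing-variance case changes which parameters win.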

Can you please correct my presentation of the outputs (a graph like Lotka-Volterra, and printing out the estimation results nicely)?

I won't get to this for a bit. What exactly would you like to see?

I will look into pushing these parameter estimation examples toward bigger models (50 parameters and 10-20 equations). The inverse problem can only be investigated from big to small (dimensionality reduction, LowRankModels) and from global to local optimization. Can BBO algorithms handle these larger optimizations? Any idea?

No idea. You should also look back at NLopt with the fixed cost function. I'd assume that some of its methods would do pretty well. Or IPOPT.

ChrisRackauckas commented 7 years ago

I optimized the objective function stuff a bit. Now there's a L2Loss(t,data) that builds an optimized-L2 loss function. It doesn't generalize well the way I made it, but I'll leave that for @Ayush-iitkgp's GSoC project 👍.
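The shape of such a data-baked loss can be sketched as a callable struct (a hypothetical `MyL2Loss`, not the package's implementation):

```julia
# Hypothetical sketch of an L2Loss-style callable: the data is captured in a
# struct, and calling the struct on a solution computes the L2 cost directly,
# with no temporary arrays.
struct MyL2Loss{T<:AbstractVector,D<:AbstractMatrix}
    t::T       # time points (kept for interface parity; unused here)
    data::D    # one column of observations per time point
end

function (loss::MyL2Loss)(sol)
    s = 0.0
    @inbounds for i in 1:size(loss.data, 2), j in 1:size(loss.data, 1)
        s += abs2(sol[i][j] - loss.data[j, i])
    end
    s
end

loss = MyL2Loss([0.0, 1.0], [1.0 2.0; 3.0 4.0])  # tiny illustrative dataset
```

Because the struct is callable, an objective builder can treat it exactly like an anonymous cost function while the data stays baked in.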

@finmod take a look at the testing file I've been using based off of your problem:

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/blob/master/test/lorenz_test.jl

With the L2Loss version of the cost function, NLopt does really well. On the long version of the problem (4000 observations), BlackBoxOptim.jl takes 18.1 seconds. However, for me LN_BOBYQA takes 0.04 seconds and LN_NELDERMEAD 0.06 seconds. I think this puts this squarely in the "easy" pile now, so we should find a larger model to test on!

We should still mention BlackBoxOptim in the docs though, since a pure-Julia solution has its upsides.

I think this has shown that the objective function is at least easy and useful? At least after a few iterations of improvements it is. I think we need to start splitting this discussion into different issues which are more specific.

finmod commented 7 years ago

@ChrisRackauckas Thank you for your attention on this parameter estimation matter. For now, I have a small error precluding me from watching the fireworks:

UndefVarError: L2Loss not defined

ChrisRackauckas commented 7 years ago

Should've mentioned: you need to checkout master for that (I just implemented it last night).

finmod commented 7 years ago

After Pkg.checkout("DifferentialEquations") and Pkg.update(), I get:

using DifferentialEquations, RecursiveArrayTools
using BlackBoxOptim, NLopt

INFO: Precompiling module DifferentialEquations.
ERROR: LoadError: Declaring precompile(false) is not allowed in files that are being precompiled.
 in precompile(::Bool) at .\loading.jl:300
 in include_from_node1(::String) at .\loading.jl:488
 in macro expansion; at .\none:2 [inlined]
 in anonymous at .\:?
 in eval(::Module, ::Any) at .\boot.jl:234
 in process_options(::Base.JLOptions) at .\client.jl:242
 in _start() at .\client.jl:321
while loading C:\Users\Denis.julia\v0.5\ParameterizedFunctions\src\ParameterizedFunctions.jl, in expression starting on line 1
ERROR: LoadError: Failed to precompile ParameterizedFunctions to C:\Users\Denis.julia\lib\v0.5\ParameterizedFunctions.ji.
 in compilecache(::String) at .\loading.jl:593
 in require(::Symbol) at .\loading.jl:393
 in include_from_node1(::String) at .\loading.jl:488
 in macro expansion; at .\none:2 [inlined]
 in anonymous at .\:?
 in eval(::Module, ::Any) at .\boot.jl:234
 in process_options(::Base.JLOptions) at .\client.jl:242
 in _start() at .\client.jl:321
while loading C:\Users\Denis.julia\v0.5\DifferentialEquations\src\DifferentialEquations.jl, in expression starting on line 20
Failed to precompile DifferentialEquations to C:\Users\Denis.julia\lib\v0.5\DifferentialEquations.ji.
 in compilecache(::String) at .\loading.jl:593
 in require(::Symbol) at .\loading.jl:422

ChrisRackauckas commented 7 years ago

No, don't checkout DifferentialEquations. Pkg.free("DifferentialEquations"). Pkg.checkout("DiffEqParamEstim").

finmod commented 7 years ago

I had tried this earlier today, but the commit had not yet come through... Now let me enjoy the fireworks!

finmod commented 7 years ago

Is the MathProgBase alignment still necessary?

Yes. Other solvers use MPB. With this, all of the registered Julia packages can use that objective function, making it easy to chain different methods. Some of the optimizers are actually calling different dispatches on the objective function type, and this hides that fact from the user to make all packages work just as easily.

Robert Feldt is expected to deliver compatibility of BlackBoxOptim with the MathProgBase syntax. Meanwhile, one thread of development could be to deliver an example of chaining algorithms within the class, for instance GN_CRS2_LM + LD_LBFGS.
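The chaining idea can be demonstrated without any optimizer package (everything here is illustrative: a coarse grid stands in for the global method, plain gradient descent for the local one):

```julia
# Global-then-local chaining on a toy objective with known minimum at (10, 28).
f(p)  = (p[1] - 10.0)^2 + (p[2] - 28.0)^2
∇f(p) = [2(p[1] - 10.0), 2(p[2] - 28.0)]

# "Global" phase: coarse grid over a search box (bounds echo the Lorenz example).
grid = [[x, y] for x in 9.0:0.5:11.0, y in 20.0:3.0:30.0]
p = copy(grid[argmin(map(f, grid))])   # best coarse candidate

# "Local" phase: gradient descent seeded from the grid winner.
for _ in 1:200
    p .-= 0.1 .* ∇f(p)
end
```

The point is only the hand-off: the global phase supplies a starting point inside the right basin, and the local phase polishes it, which is exactly what a GN_CRS2_LM + LD_LBFGS chain would do through a shared objective.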

It would be nice to investigate various estimators with CostVData when the bug is solved. My question here is: what is the gain of using NLS or LLK for a given data misfit function? Surely, non-stationary and heteroskedastic data must be approached through the properties of an estimator.

NLS is direct. The cost takes fewer calculations, but it's equivalent to LLK with normally distributed errors of constant mean and variance. Thus NLS cannot handle the assumption that the variance of the error grows over time (heteroskedastic, and likely to be the case with any diffeq!). If you want to provide that kind of assumption, or use a different distribution for the errors, you have to use LLK.

Finally, you are acknowledging that LLK is the true estimator for real-world problems. No one is doing it yet in Julia, and it would be a major contribution to have this LLK(data misfit) in the menu of build_loss_objective. Is there a function in Julia somewhere for LLK? This can be a second separate thread.

Can you please correct my presentation of the outputs (a graph like the Lotka-Volterra one, and printing out the estimation results nicely)?

I won't get to this for a bit. What exactly would you like to see?

Here I mean a simple "using Plots" plot of the actual and estimated Lorenz data, something similar to the Lotka-Volterra plot. I have tried "using Plots; plot!(sol); scatter(….)" but I get a pretty bad picture. Also, for the presentation of the output, I had in mind a post-analysis function similar to what is produced in AMIGO2R2016. This could be a third thread within ParameterizedFunctions (setup, pre-analysis, estimation, post-analysis).

Now that Lorenz_test.jl is running I have the following comments:

This test file is useful for checking that build_loss_objective, L2Loss, Euler, and other functions produce the correct numbers. It should be kept this way when additional functions are introduced, like CostVData, LLK, or L2Loss variants.

As you rightly pointed out earlier, this test asks the wrong question, and we now want to answer the right one: "which parameters are needed to get that result from the differential equation model?" To do so, I attempted without success to replace Euler with Vern7 in the data construction and then Euler with Tsit5 in build_loss_objective, at which point BBO fails:

obj_short = build_loss_objective(prob_short,Tsit5(),L2Loss(t_short,data_short),tstops=t_short,dense=false)

res1 = bboptimize(obj_short;SearchRange = Xiang2015Bounds, MaxSteps = 11e3)

Starting optimization with optimizer BlackBoxOptim.DiffEvoOpt{BlackBoxOptim.FitPopulation{Float64},BlackBoxOptim.RadiusLimitedSelector,BlackBoxOptim.AdaptiveDiffEvoRandBin{3},BlackBoxOptim.RandomBound{BlackBoxOptim.RangePerDimSearchSpace}}

0.00 secs, 0 evals, 0 steps

0.50 secs, 639 evals, 535 steps, improv/step: 0.376 (last = 0.3757), fitness=47820.096291120

1.00 secs, 1597 evals, 1493 steps, improv/step: 0.328 (last = 0.3017), fitness=47711.819153956

1.50 secs, 2704 evals, 2600 steps, improv/step: 0.305 (last = 0.2746), fitness=47710.706970827

2.01 secs, 3818 evals, 3714 steps, improv/step: 0.300 (last = 0.2890), fitness=47710.667433119

2.51 secs, 4929 evals, 4825 steps, improv/step: 0.298 (last = 0.2889), fitness=47710.666909593

3.01 secs, 6046 evals, 5943 steps, improv/step: 0.294 (last = 0.2791), fitness=47710.666901606

3.51 secs, 7167 evals, 7064 steps, improv/step: 0.297 (last = 0.3087), fitness=47710.666901534

4.01 secs, 8290 evals, 8188 steps, improv/step: 0.298 (last = 0.3105), fitness=47710.666901533

4.51 secs, 9411 evals, 9323 steps, improv/step: 0.287 (last = 0.2044), fitness=47710.666901533

5.01 secs, 10536 evals, 10469 steps, improv/step: 0.264 (last = 0.0724), fitness=47710.666901533

Optimization stopped after 11001 steps and 5.246999979019165 seconds

Termination reason: Max number of steps (11000) reached

Steps per second = 2096.6266521800985

Function evals per second = 2106.1559070304766

Improvements/step = 0.2541818181818182

Total function evaluations = 11051

Best candidate found: [10.1988,26.4885,2.0]

Fitness: 47710.666901533

The data construction solver does not need to be particularly accurate. Instead, the data predicted by the model should be accurate. What is the best combination of Vern7, Euler, and Tsit5 for each task, and what is the corresponding syntax? It looks as if the syntax is not standard across solvers.

And the first NLopt optimization:

opt = Opt(:GN_ORIG_DIRECT_L, 3)

lower_bounds!(opt,[0.0,0.0,0.0])

upper_bounds!(opt,[11.0,30.0,3.0])

min_objective!(opt, obj_short.cost_function2)

xtol_rel!(opt,1e-12)

maxeval!(opt, 10000)

@time (minf,minx,ret) = NLopt.optimize(opt,ParamInit) # Accurate 1.27 sec

in callback catch

WARNING: dt <= dtmin. Aborting. If you would like to force continuation with dt=dtmin, set force_dtmin=true

Something went wrong. Integrator stepped past tstops but the algorithm was dtchangeable. Please report this error.

My aim is to classify all the algorithms from global to local optimization (i.e. real-world conditions), and from looser bounds to gradually tighter bounds, since a priori information on the parameters is usually scant. The causality of research in inverse problems has to run from unknown to strongly identified parameter estimates.


ChrisRackauckas commented 7 years ago

The data construction solver does not need to be particularly accurate.

Yes it does, or else the data is biased. That would bias the "true" parameter estimation results as well. The uncertainty quantification shows this pretty well:

http://docs.juliadiffeq.org/latest/analysis/uncertainty_quantification.html#Example-3:-Adaptive-ProbInts-on-the-Lorenz-Attractor-1

The data should be generated with extremely small tolerances so that way we know that it's actually the correct answer. Otherwise we're not actually fitting a real solution to the Lorenz attractor. The question should be "with perfect data, what needs to be true about the diffeq solver and the optimizer in order to recover the parameters"?

I'll make another test file that sets this kind of test up.

ChrisRackauckas commented 7 years ago

Ping me on like Friday to remind me about this.

finmod commented 7 years ago

Here is my pinging on this great parameter estimation and analyses chapter.

ChrisRackauckas commented 7 years ago

Hey, thanks for reminding me. I created another test file here:

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/blob/master/test/lorenz_true_test.jl

Lots of comments are in there. I encourage you to explore the results a bit. The conclusions are outlined at the bottom. This generates the data with really really high fidelity, so you can basically assume it's the "true solution" (to like 1e-14). With that, a bunch of tests are run to try and recover the parameters. The main idea is that a sufficiently good numerical solution to the ODE is necessary.

1) Euler was only able to match the Euler-generated data because the data was skewed in exactly the same way the optimizer's solutions were, so the errors canceled out. When you test it against the true data, that method doesn't actually converge.
2) Adaptive timestepping with a low enough tolerance does this well.
3) Some of the NLopt methods outperform the BBO methods. Some outperform them really well, and can optimize this in <0.05 seconds.
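Point 1 can be illustrated with a tiny self-contained sketch, a toy scalar ODE rather than the Lorenz system, with a grid search standing in for the optimizer (all names here are illustrative): a coarse Euler fit recovers the true parameter only when the data was generated by the same coarse Euler scheme, because the discretization bias cancels; against accurate data, the estimate is biased.

```julia
# Toy problem: u' = -p*u, u(0) = 1, true p = 1, deliberately coarse step h.
h, n = 0.5, 4
euler_sim(p) = [(1 - p * h)^k for k in 0:n]   # explicit Euler trajectory

euler_data = euler_sim(1.0)                    # data from the same bad solver
true_data  = [exp(-1.0 * h * k) for k in 0:n]  # essentially exact data

sse(p, data) = sum(abs2, euler_sim(p) .- data)

# Crude grid-search "optimizer" over p, good enough for the sketch
grid = 0.0:0.001:2.0
best(data) = grid[argmin([sse(p, data) for p in grid])]

println(best(euler_data))  # 1.0: the Euler bias cancels exactly
println(best(true_data))   # ≈ 0.787: biased far from the true p = 1
```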

I show in there how to plot the solutions which generate the data, along with a bunch of different setups to see how things are working. I hope this clarifies things.

finmod commented 7 years ago

Thank you. Let me put this true test through its paces.

finmod commented 7 years ago

I am still exploring and developing the comments on selecting solvers and optimizers. Thinking about the next step of scaling up the complexity of the model: can we have the data generation code inside the Lorenz problem, so as to estimate the signal-to-noise ratio (SNR) along with the other three parameters?

g1 = @ode_def_nohes LorenzExample begin
  dx = σ*(y-x)
  dy = x*(ρ-z) - y
  dz = x*y - β*z
end σ=>10.0 ρ=>28.0 β=>2.66667 # Parameters used to construct the dataset

r0 = [0.1; 0.0; 0.0] # Initial values of the system in space
tspan = (0.0, 4.0)
prob = ODEProblem(g1, r0, tspan)
tspan2 = (0.0, 3.0)
prob_short = ODEProblem(g1, r0, tspan2)

snr = [0.05]

data_sol_short = solve(prob_short,Vern7(),saveat=t_short,reltol=1e-12,abstol=1e-12)
data_short = vecarr_to_arr(data_sol_short)
data_short .+= 0.05randn(size(data_short)) # Additional noise if required

data_sol = solve(prob,Vern7(),saveat=t,reltol=1e-12,abstol=1e-12)
data = vecarr_to_arr(data_sol)
data .+= 0.05randn(size(data)) # Additional noise if required

The measurement function is data = model + q, with q => snr.
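Recovering the noise level alongside the parameters can be sketched in plain Julia (the signal and names here are illustrative stand-ins, not the Lorenz setup above): once the model parameters are fitted, the maximum-likelihood estimate of a constant Gaussian noise level σ is simply the RMS of the residuals data - model.

```julia
using Random, Statistics

Random.seed!(42)

# Stand-in "model prediction" (any clean signal will do for the sketch)
model = sin.(0.0:0.01:4.0)

# Measurement: data = model + q, with q ~ N(0, σ_true²)
σ_true = 0.05
data = model .+ σ_true .* randn(length(model))

# With the model fixed, the ML estimate of σ is the RMS residual
residuals = data .- model
σ_hat = sqrt(mean(abs2, residuals))

println(round(σ_hat, digits=3))  # close to 0.05
```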

For a slightly more complex model, I would then specify a 9-D Lorenz-Rossler system with 4 parameters to be estimated and 9 ODEs, as on page 6 of https://www.ncbi.nlm.nih.gov/pubmed/22181491. But the aim is to estimate 20 or 30 parameters.

ChrisRackauckas commented 7 years ago

Welp, we can keep adding more methods 👍 . But this thread is huge now and swaying into different topics, which makes it much harder to manage and much harder for new readers to chime in. I'm going to close this, but feel free to open a new thread on signal-to-noise ratio calculations, and another about estimating a larger model.

Also, do you plan on PRing that notebook to DiffEqTutorials.jl?