SciML / DiffEqParamEstim.jl

Easy scientific machine learning (SciML) parameter estimation with pre-built loss functions
https://docs.sciml.ai/DiffEqParamEstim/stable/

Implement Two-Stage Methods #4

Closed: ChrisRackauckas closed this issue 7 years ago

ChrisRackauckas commented 7 years ago

From the video @finmod shared: http://www.birs.ca/events/2013/5-day-workshops/13w5151/videos/watch/201311140905-Campbell.html

Two-Stage methods just need to do a local linear regression on the data from the solutions. Note that LossFunctions.jl now has weighted losses, which should make the implementation simpler. This is a good beginner issue: it only requires knowledge of using DifferentialEquations, with hints available by looking at the DiffEqParamEstim.jl source.

shivin9 commented 7 years ago

@ChrisRackauckas Can you give some references regarding this algorithm?

ChrisRackauckas commented 7 years ago

Sorry I missed this request. These are the ones from the video.

https://arxiv.org/abs/0710.4190 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631937/

ChrisRackauckas commented 7 years ago

Honestly the references aren't as clear as Campbell's video.

ChrisRackauckas commented 7 years ago

There is a PR in the works: https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/pull/6

One interesting thing about this algorithm is that it just requires smoothing the data and estimating derivatives. The fact that it uses Loess is not central to the design. It seems like any other smoothed interpolation would work. It would be interesting to have other implementations which make use of that.
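For example, a minimal sketch of that idea with Loess.jl as the smoother (the data here is purely illustrative, and the derivative is taken by finite differences on the smoothed curve since Loess.jl does not expose derivatives directly):

```julia
using Loess

# Illustrative noisy observations of a single state (t = sample times, y = data).
t = collect(range(0.0, stop = 10.0, length = 200))
y = sin.(t) .+ 0.05 .* randn(length(t))

# Stage 1a: smooth the data with local regression (Loess).
model = loess(t, y)
X = predict(model, t)                 # smoothed estimate of X(t)

# Stage 1b: estimate X'(t) by central differences on the smoothed curve.
h  = t[2] - t[1]
dX = similar(X)
dX[2:end-1] = (X[3:end] .- X[1:end-2]) ./ (2h)
dX[1]       = (X[2] - X[1]) / h
dX[end]     = (X[end] - X[end-1]) / h
```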

finmod commented 7 years ago

I suggest focusing on #5 and how the build_loss objective replicates Equations 5 and 6 in the Robust and efficient parameter estimation paper. Of the 10 examples in AMIGO2, only 2 are simple (or simplistic) LSQ estimations. All others are likelihood estimations involving a metric and a covariance. Although it is equivalent to a nonlinear least square optimization problem, it is much more sophisticated than an OLS. The main focus of measure-based parameter estimation is how sophisticated equation 5 can be: uniform, linear, quadratic, Hellinger, and many others. The NLS optimization problem 6 is always the same. All of this work is DiffEq model fitting to data, and this is the only game in town. Curve fitting is a waste of time and resources.

ChrisRackauckas commented 7 years ago

I suggest focusing on #5 and how the build_loss objective replicates Equations 5 and 6 in the Robust and efficient parameter estimation paper

What paper are you referencing here?

All others are likelihood estimations involving a metric and a covariance. Although it is equivalent to a nonlinear least square optimization problem, it is much more sophisticated than an OLS.

Yes, an interesting direction would be to do some likelihood estimation and maximization. In fact, developing a way to estimate the likelihood function is the first step toward creating the Bayesian tools, so that is close at hand. Once we have a way to easily compute the likelihood function, we can make maximizing it a separate option for the optimization function.

The other pieces come from using the ML tools. I only have LossFunctions.jl integrated, and that can do weighted loss functions, but regularization would require integrating PenaltyFunctions.jl. Changing measures also sounds reasonable if we go directly from the likelihood.

I don't know about the covariance stuff you're mentioning.

Curve fitting is a waste of time and resources.

Definitely not. The two-stage methods are not difficult to implement (they're almost done) and they give a cheap but rough estimate for parameters. They are also more robust than likelihood methods when your model is "wrong" or simplified.

finmod commented 7 years ago

The two functions required for parameter estimation are a cost function and a parameter estimation function, describing equations 5 and 6 of this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4625902/ .

The answer to the regularization problem is also there, as equations 7 and 8. These are almost identical to the cost function in Mads. Some of the demo models used there are the usual benchmarks: the Goodwin oscillator, FitzHugh-Nagumo, and Hodgkin-Huxley, with and without regularization.

ChrisRackauckas commented 7 years ago

Thanks. That's a really nice paper. So yes, that gives the form for the likelihood function, which should be created as an alternative. Adding regularizations to losses can be done via PenaltyFunctions.jl, so most regularization terms don't need to be explicitly implemented. However, ones that require using the Jacobian (I didn't see an example in there, but they were referenced?) would, I think, require more input.

The last part is where the machine learning tools from JuliaML would come in. All of this needs cross validation etc., and we can pull those tools from JuliaML for validation of results. @Ayush-iitkgp may consider doing some of this.
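As a rough illustration of a metric-plus-covariance cost of this kind (not the paper's exact equations; the function name `weighted_cost` and the assumption that the user supplies per-time-point inverse covariance matrices `invcov` are mine):

```julia
using DifferentialEquations

# Sketch of a covariance-weighted (generalized least-squares) objective:
# data[i] is the observation vector at time t[i], invcov[i] its inverse covariance.
function weighted_cost(prob, alg, t, data, invcov)
    function cost(p)
        sol = solve(remake(prob, p = p), alg, saveat = t)
        c = 0.0
        for i in 1:length(t)
            r = data[i] - sol.u[i]          # residual at time t[i]
            c += r' * invcov[i] * r         # metric weighted by the inverse covariance
        end
        return c
    end
    return cost
end
```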

finmod commented 7 years ago

A nicer version of the paper is: http://bmcsystbiol.biomedcentral.com/articles/10.1186/s12918-015-0219-2 , which includes, in the additional files available online, the extensive details you asked about. Finally, the MATLAB code of the cost function can be seen here: https://github.com/csynbiosys/AMIGO2R2016b/blob/master/Kernel/AMIGO_PEcost.m , along with the corresponding Jacobian detailed in additional file 1: https://github.com/csynbiosys/AMIGO2R2016b/blob/master/Kernel/AMIGO_PEJac.m , and all the other files.

ChrisRackauckas commented 7 years ago

From a message to @Ayush-iitkgp (slightly changed):

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631937/

that paper is the better of the two references. The video led us astray with the betas: the betas are not what we're looking for, as they aren't exactly estimates of X(t) and X'(t). I suggest a careful read of at least the start through section 2.1. We only need smoothed estimates of X(t) and X'(t). For example, spline regressions can be done via https://github.com/kbarbary/Dierckx.jl , and estimates can be grabbed from that. Or, for Loess, we just need derivative predictions somehow: if it's not easy to grab the derivative estimate from Loess.jl, you could use the estimator derived in section 2.1.

Think of it this way: what we really want is just a smoothed curve fit (i.e., one that doesn't hit every point exactly) in order to grab X(t) and X'(t) estimates for stage 1. Exactly how the curve fit is done doesn't truly matter (though the choices are statistically slightly different, so having different options would be a good idea). But if the curve-fit methods are internal to the package, there should be some way to test that that part is outputting sensible results.
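A minimal sketch of the spline route with Dierckx.jl (the data is illustrative, and the smoothing factor s would be a user option in practice):

```julia
using Dierckx

# Illustrative noisy observations of a single state variable.
t = collect(range(0.0, stop = 10.0, length = 200))
y = sin.(t) .+ 0.05 .* randn(length(t))

# Smoothing spline regression on (t, Y(t)); s > 0 keeps it from interpolating every point.
spl = Spline1D(t, y; s = 1.0)

X  = evaluate(spl, t)      # smoothed X(t) estimates
dX = derivative(spl, t)    # X'(t) estimates straight from the spline
```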

shivin9 commented 7 years ago

For example, spline regressions can be done via https://github.com/kbarbary/Dierckx.jl , and estimates can be grabbed from that.

@ChrisRackauckas This library has an option to directly give the derivatives of the estimated spline. Will this work?

ChrisRackauckas commented 7 years ago

That would work.

shivin9 commented 7 years ago

@ChrisRackauckas Correct me if I'm wrong, in runtests.jl,

sol(t[i]) is X(t) and randomized = [(sol(t[i]) + .01randn(2)) for i in 1:length(t)] is Y(t) of the paper.

or is it that t = collect(linspace(0,10,200)) is X(t) of the paper and sol(t_i) is F(X[t_i]) of the paper.

I am confused between the two right now.

ChrisRackauckas commented 7 years ago

t = collect(linspace(0,10,200)) is the set of t's in the paper, randomized are the set of Y(t)'s from the paper. Using the Dierckx.jl route, X(t) would be the predicted values from the regression on (t,Y(t)), and X'(t) would be the derivative predictions.

Ayush-iitkgp commented 7 years ago

Hello Shivin,

sol(t[i]) is X(t) and randomized = [(sol(t[i]) + .01randn(2)) for i in 1:length(t)] is Y(t) of the paper: those are the correct X(t) and Y(t).

ChrisRackauckas commented 7 years ago

sol(t[i]) is an X(t) prediction, but the two-stage method does not use a numerical solution for this part. It explicitly does not do so because sol, the numerical solution, depends on the parameters, which at the first stage are unknown. Because of this, the X(t) and X'(t) predictions are calculated solely from the data (for this method), and then the model's parameters are tweaked by trying to make X'(t) = f(t,X(t)), where f is dependent on the parameters.

Note that this is why the choice of how you regress doesn't really matter. What matters is that you fit some regression curve to estimate X(t) and X'(t). Once that is done, those curves are fixed, since they are your data input into the problem, and your new problem is to make X'(t) = f(t,X(t)) via optimization.
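A minimal sketch of that second stage (the names are placeholders; f is assumed to be the in-place ODE function f(du,u,p,t), and X, dX are the smoothed state and derivative estimates from stage one):

```julia
using Optim

# Stage 2: with X(t) and X'(t) fixed from the data, pick parameters p so that the
# model derivative f(t, X(t), p) matches the estimated derivative X'(t).
function two_stage_cost(f, t, X, dX)
    function cost(p)
        du = similar(X[1])
        c = 0.0
        for i in 1:length(t)
            f(du, X[i], p, t[i])          # model's prediction of the derivative
            c += sum(abs2, dX[i] .- du)   # squared error against the data estimate
        end
        return c
    end
    return cost
end

# Hypothetical usage, with p0 an initial parameter guess:
# result = optimize(two_stage_cost(f, t, X, dX), p0, BFGS())
```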

Ayush-iitkgp commented 7 years ago

@shivin9 if the confusion persists, one hint is to consider that data = Y(t), and that is all you know when estimating the parameters.

ChrisRackauckas commented 7 years ago

Yeah, let me go over that for a sec. I first solved the model with known parameters. This gave me sol, which I take as the "true solution" (it does have a little numerical error, ignore it).

From that, to generate a dataset, I chose t = collect(linspace(0,10,200)). I then used the perturbed solution values:

randomized = [(sol(t[i]) + .01randn(2)) for i in 1:length(t)]

This takes the true solution, bumps it off the line a bit with a random perturbation, and uses those values as Y(t). If you look at the plots here:

http://docs.juliadiffeq.org/latest/analysis/parameter_estimation.html#Local-Optimization-Examples-1

This shows that they do basically follow the solution (there isn't much randomness; it's mostly just to test that the method gets back to the right parameters. The peak that goes up too high is actually due to the plot density of the interpolation.) Thus (t,Y(t)) is a dataset with a known solution: the parameters you should get back out are a=>1.5, b=>1.0, c=>3.0, d=>1.0.

That said, the technique used in the docs, local optimization, will get right to those values (if you start close enough). The two-stage method is not guaranteed to do that; it should just get close. Indeed, it's a good method for getting a fast approximation, and a local optimization method needs a good approximation to start from in order to hone in on the true value. They should work well together. Keep this in mind for testing.
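For reference, a sketch of that data-generation step in current DifferentialEquations.jl syntax (the test file itself uses the older API; this is only meant to show the idea):

```julia
using DifferentialEquations

# Lotka-Volterra with the "true" parameters a = 1.5, b = 1.0, c = 3.0, d = 1.0.
function lotka!(du, u, p, t)
    a, b, c, d = p
    du[1] =  a*u[1] - b*u[1]*u[2]
    du[2] = -c*u[2] + d*u[1]*u[2]
end

prob    = ODEProblem(lotka!, [1.0, 1.0], (0.0, 10.0), [1.5, 1.0, 3.0, 1.0])
truesol = solve(prob, Tsit5())

# Sample the true solution on a grid and perturb it to build the dataset (t, Y(t)).
t = collect(range(0.0, stop = 10.0, length = 200))
randomized = [truesol(t[i]) + 0.01 .* randn(2) for i in 1:length(t)]
```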

shivin9 commented 7 years ago

The paper uses X'(t) = F{X(t), beta}, so do we input the `sol` values back into the problem? Thanks for the clarifications anyway!

ChrisRackauckas commented 7 years ago

You don't need numerical solutions to the DiffEq for the two-stage method. sol should not show up here, other than in the generation of the test dataset.

Ayush-iitkgp commented 7 years ago

@shivin9 try taking a clue from here https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/blob/master/src/DiffEqParamEstim.jl#L31

shivin9 commented 7 years ago

@Ayush-iitkgp yeah I will take a look. Thanks!

shivin9 commented 7 years ago

I have extracted the X(t) and X'(t) values. Which optimizer would be good for finding the values of the parameters?

Ayush-iitkgp commented 7 years ago

You don't have to use any optimizer; leave that to the user. Just return the cost function, as is the case with the build_loss_objective method.

shivin9 commented 7 years ago

So it won't be like lm_fit(), which returns the parameters directly?

ChrisRackauckas commented 7 years ago

You could do it that way by setting it up with LsqFit.jl (see the function which already does that for details), but it doesn't really have advantages. Levenberg-Marquardt isn't a very good algorithm, and if you push it to LsqFit.jl, users will be stuck with those issues. If you return the cost function, any user can take it over to Optim, NLOpt, IPOPT, etc. and use whatever optimization package they are comfortable with, or whatever has the right tools to optimize this problem well. And L-BFGS on an L2 loss function is pretty close to Levenberg-Marquardt anyway, so the default of throwing it into Optim.jl is almost the same thing.
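For example, a rough sketch of that hand-off (the cost here is a stand-in quadratic just to make the snippet runnable; in practice it would be the objective returned by the two-stage constructor):

```julia
using Optim

# Stand-in cost; in practice, this is the function of the parameters returned
# by the two-stage (or build_loss_objective) constructor.
cost(p) = sum(abs2, p .- [1.5, 1.0, 3.0, 1.0])

p0     = [1.0, 1.0, 1.0, 1.0]              # initial parameter guess
result = optimize(cost, p0, LBFGS())
p_est  = Optim.minimizer(result)
```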

Ayush-iitkgp commented 7 years ago

@finmod what if you don't know the initial solution of the ODE (according to the literature, obtaining it can sometimes be a costly process)? How would you do the parameter estimation? That's where the two-stage method comes to the rescue. Sure, it won't give the optimal value of the parameters, but it lands somewhere in the convex region close to the global minimum and is a nice starting point for least-squares methods. You can use the two-stage method to estimate an approximate value of the parameters and then use build_loss_objective to reach the global minimum.

Following are the points I could understand from the conversation above:

  1. Redefine build_loss_objective to take a "metric" as an argument and build the cost function accordingly (it would be exciting to explore packages that support different types of functions, such as log-likelihood, least squares, etc.).
  2. Also provide options for using different types of regularization.
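A rough sketch of the workflow described above (two-stage estimate first, then refine by fitting the ODE solution to the data). The two-stage step is only indicated by a hard-coded placeholder since it isn't merged yet, and refine_cost below is a hand-written stand-in for the idea behind build_loss_objective, not the package function itself:

```julia
using DifferentialEquations, Optim

# Lotka-Volterra test problem; the goal is to recover p = [1.5, 1.0, 3.0, 1.0] from data.
function lotka!(du, u, p, t)
    a, b, c, d = p
    du[1] =  a*u[1] - b*u[1]*u[2]
    du[2] = -c*u[2] + d*u[1]*u[2]
end
prob    = ODEProblem(lotka!, [1.0, 1.0], (0.0, 10.0), [1.5, 1.0, 3.0, 1.0])
truesol = solve(prob, Tsit5())
t       = collect(range(0.0, stop = 10.0, length = 200))
data    = [truesol(ti) + 0.01 .* randn(2) for ti in t]

# Hand-written refinement cost: simulate at candidate parameters and compare to the data.
function refine_cost(p)
    sol = solve(remake(prob, p = p), Tsit5(), saveat = t)
    return sum(sum(abs2, data[i] .- sol.u[i]) for i in 1:length(t))
end

# Step 1 (hypothetical): p_rough would come from the two-stage method.
p_rough = [1.4, 0.9, 2.8, 1.1]
# Step 2: refine toward the global minimum, starting from the rough estimate.
p_final = Optim.minimizer(optimize(refine_cost, p_rough, BFGS()))
```
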
ChrisRackauckas commented 7 years ago

Complete.