SciML / DiffEqParamEstim.jl

Easy scientific machine learning (SciML) parameter estimation with pre-built loss functions
https://docs.sciml.ai/DiffEqParamEstim/stable/

Implement maximum likelihood function #29

Closed Ayush-iitkgp closed 7 years ago

Ayush-iitkgp commented 7 years ago

The parameter estimation problem is basically to find the parameters that maximize the likelihood of the observed data. We assume that the samples are drawn from normally distributed populations and that each sample has its own variance. I would like to invite the concerned parties to express what their expectations are before I start writing the code. I plan to follow the paper "Robust and efficient parameter estimation in dynamic models of biological systems". Feel free to mention other sources as well. Tagging a few people I know: @finmod, @ChrisRackauckas and @gabrielgellner; others, feel free to join in :)

Ayush-iitkgp commented 7 years ago

Since each observation is drawn from an independent normal distribution with its own variance, we will need to ask the user to pass the variances of the populations as an argument to the API. Would it be right to make the default argument a vector of ones?

ChrisRackauckas commented 7 years ago

I don't see a reason to constrain it to the normal distribution only. Actually, as part of a research project I want to start testing other distributions. One that could be very interesting is the t-distribution: it would naturally be more robust to outliers. I think you can set something up where you use a distribution from Distributions.jl to do this.

But that is secondary. Just like CostVData vs L2Loss, we will want a specialized fast version for the normal distribution anyway, since that's the most common case. So yes, I would say get that one working first, but we shouldn't throw out the idea of something more general later.
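To make the Distributions.jl idea concrete, here is a minimal sketch of what such a cost function could look like; `normal_nll` and its argument names are hypothetical, not part of any existing API:

```julia
using Distributions

# Negative log-likelihood for independent normal errors: each observation
# gets its own Normal centered on the model output, with its own variance.
function normal_nll(data, fitted, variances)
    nll = 0.0
    for i in eachindex(data, fitted, variances)
        nll -= logpdf(Normal(fitted[i], sqrt(variances[i])), data[i])
    end
    return nll
end
```

Substituting another distribution from Distributions.jl (with appropriate parameters) in place of `Normal` would give the more general version.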

ChrisRackauckas commented 7 years ago

Would be a right to pass the default argument as a vector of ones?

If all of the data points are in the 1e-6 range then that default would be way off. I think a good estimate would be var(data)*ones(N). I say that without any proof or anything to back it up, other than that it seems like the best a priori estimate 👍
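In code, that default would be something like the following sketch (`data` is a placeholder for the user's dataset):

```julia
using Statistics  # for var

# One shared a priori variance estimate, repeated for all N observations.
default_variances = var(data) * ones(length(data))
```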

ChrisRackauckas commented 7 years ago

Also, note that the variances of the errors should always grow in any non-oscillatory example. In the general case they can grow exponentially (chaotic systems), so no amount of math will give a good upper bound here. My hunch is that guessing linear growth, with the slope related to the local mean/local variance of the data, would be reasonable, though a constant guess is probably better to start with.

finmod commented 7 years ago

@Ayush-iitkgp Welcome to this problem and nice to see you onboard. Yes, the problem is as stated in the "Robust and efficient parameter estimation in dynamic models of biological systems" paper and was re-stated by a different group of researchers in my exchange a couple of days ago: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005331 . This demonstrates how standard this NLP problem is.

You can find an implementation of this problem in Matlab here: https://github.com/csynbiosys/AMIGO2R2016b/blob/master/Kernel/AMIGO_PEcost.m . This deals with both the nonlinear least squares setup ('lsq') and the MLE case ('llk').

I think there is some misunderstanding about how the negative log-likelihood (-llk) is computed. The mean and variance in the formula are the variance/covariance of the dataset. What basically takes place in the L2DistLoss is that the errors are normalized. This operation should take place after the dataset is computed but before the llk is calculated. Since the normalization is that of the dataset, it can, if convenient, be taken out of the L2DistLoss measure and factored out in front of the log.
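For reference, the standard negative log-likelihood for independent normal errors, written in generic notation (not copied from the paper), is

```latex
-\mathrm{llk}(p) = \frac{N}{2}\log(2\pi)
  + \sum_{i=1}^{N} \log \sigma_i
  + \frac{1}{2} \sum_{i=1}^{N} \frac{\left(y_i - \hat{y}_i(p)\right)^2}{\sigma_i^2}
```

When the σ_i are fixed by the dataset, the first two terms are constant in the parameters p, so only the normalized squared errors (the L2DistLoss part) drive the optimization; that is the factoring-out described above.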

Anyway, I am here to help you sort it out whenever necessary. Understanding the Matlab code is probably the best explanation, and it should be very close to Julia. You will see there that three cases are envisaged: 'homo', 'homo_var' (homoscedastic) and 'hetero' (heteroscedastic), and you can read off the corresponding shape of the error covariance matrix for each. In any case, the code checks internally whether the hypothesis is tenable and modifies/rejects it according to what is in the dataset.

Let me know if you have any difficulties accessing the material. I tried to restate the problem with simpler notation here: https://github.com/finmod/DiffEqDocs.jl/blob/master/docs/src/analysis/Parameter%20estimation%20Lorenz%20bboptim.ipynb

finmod commented 7 years ago

@Ayush-iitkgp You may find more answers to your questions in this document: AMIGO2_theory.pdf https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxhbWlnbzJ0b29sYm94fGd4OjFkYzM5Mzg5NDJjZTdiZDc

Ayush-iitkgp commented 7 years ago

@finmod I see 3 different versions of the maximum-likelihood function:

  1. 'homo': homoscedastic noise with constant variance
  2. 'homo_var': homoscedastic noise with known non-constant variance
  3. 'hetero': heteroscedastic noise with variance proportional to the observations
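
As far as I can tell, each case just amounts to a different choice of variance vector fed to the same likelihood. A sketch, with made-up names and constants:

```julia
# Sketch only: `data`, `measured_variances`, and the constants are placeholders.
N = length(data)

vars_homo     = fill(0.1, N)         # 'homo': one constant variance everywhere
vars_homo_var = measured_variances   # 'homo_var': known per-point variances
vars_hetero   = 0.05 .* abs.(data)   # 'hetero': variance proportional to the observation
```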

Do you have an example each for these 3 cases which I can use to check the correctness of my implementation?

ChrisRackauckas commented 7 years ago

'homo_var': homoscedastic noise with known non-constant variance

I'm not sure of a good example for this case. It's very rare that you'd know an estimate of the noise variance ahead of time.

ChrisRackauckas commented 7 years ago

'homo': homoscedastic noise with constant variance

The only sensible way for the error not to grow is to use a periodic function. The Lotka-Volterra example, with the noise generated with a constant variance, is a very good example of this. In addition,

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/issues/31

those two are very standard examples from Computational Neuroscience that also have AMIGO2 benchmarks and fall into this category.

'hetero': heteroscedastic noise with variance proportional to the observations

Pretty much every other case would do best with this likelihood. The big performance tests for this will be:

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/issues/30

For smaller tests? I'd say a good example is to still take Lotka-Volterra and generate the noise in the data yourself using some periodic function, with a periodicity that matches that of the solution.
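Something along these lines would work as a sketch (the parameter values, amplitude, and frequency below are all made up):

```julia
using OrdinaryDiffEq

# Classic Lotka-Volterra system.
function lotka!(du, u, p, t)
    du[1] =  p[1]*u[1] - p[2]*u[1]*u[2]
    du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end

prob = ODEProblem(lotka!, [1.0, 1.0], (0.0, 10.0), [1.5, 1.0, 3.0, 1.0])
sol  = solve(prob, Tsit5(), saveat = 0.1)

# Heteroscedastic noise: the standard deviation varies periodically in time.
t = sol.t
σ = 0.1 .* (1 .+ 0.5 .* sin.(2π .* t ./ 5))
data = Array(sol) .+ σ' .* randn(2, length(t))
```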

I would stay away from chaotic examples like the Lorenz equations for now. They exhibit very different behavior and need to be handled differently. Generally they are irrelevant to biological examples, though there are some very interesting physical examples where they come up.

finmod commented 7 years ago

@finmod I see 3 different versions of the maximum-likelihood function:

  1. 'homo': homoscedastic noise with constant variance

NFKB model at https://github.com/csynbiosys/AMIGO2R2016b/blob/master/Examples/ParameterEstimation_PE/NFKB_PE.m

  2. 'homo_var': homoscedastic noise with known non-constant variance

Hodgkin-Huxley at https://github.com/csynbiosys/AMIGO2R2016b/blob/master/Examples/ParameterEstimation_PE/Hodgkin_Huxley_PE.m

  3. 'hetero': heteroscedastic noise with variance proportional to the observations

Bray1993_chemotaxis at https://github.com/csynbiosys/AMIGO2R2016b/blob/master/Examples/ParameterEstimation_PE/bray1993_chemotaxis_PE.m and

Goodwin oscillator at https://sites.google.com/site/amigo2toolbox/examples/Regularization_web_ex.zip?attredirects=0

At this last location you will also find all the other examples and references.



Ayush-iitkgp commented 7 years ago

I have a very basic doubt, which means either I don't understand the problem or a simple problem is being made overly complicated. Why should I worry about these 3 different cases when it's the user who has to pass the variance as an argument to the log-likelihood function? The implementation is the same irrespective of whether the variances are homo or hetero, isn't it?

finmod commented 7 years ago

Well spotted. It should just be a check, and it is just a check when you run the AMIGO2 examples in Matlab. If you specify the wrong type (homo, hetero, ...), the program tells you that it has been modified to the right case.

The description of what is going on is in paragraphs 1.1.2 and 1.3.1 of the theoretical background paper. In my view, the system has to be set up as in equation 1.17, and then one checks whether a, b and one or more sigmas exist in the y. Since it may involve multiple identification problems, it is best to have this set as an option in the call, followed by a check on the data.

Ayush-iitkgp commented 7 years ago

@ChrisRackauckas the default value var(data)*ones(N) for the variance might be an erroneous assumption: recall that each entry in the data comes from a different population, and we can only calculate the variance of a population when we have many samples (definitely more than one) from it. I think the right approach would be to make the variance a compulsory argument to the likelihood function. This is what is done in AMIGO as well. Any thoughts?

ChrisRackauckas commented 7 years ago

I think the right approach would be to make the variance a compulsory argument to the likelihood function. This is what is done in AMIGO as well. Any thoughts?

If AMIGO makes it compulsory, that might be a good idea. However, one thing to check is how sensitive the optimization is to this starting value. If it isn't too sensitive, then a simple heuristic might be enough, given that the user is likely to do something similar anyway.

ChrisRackauckas commented 7 years ago

Note: AMIGO2 is GPL-licensed. Do not reference or derive anything from their code. Only use their documentation and papers as sources. Please treat their code as a black box and don't even look at it.

Ayush-iitkgp commented 7 years ago

Can't we look at their examples to make design decisions? I am not looking into their codebase (I am a noob at MATLAB programming :laughing: )

ChrisRackauckas commented 7 years ago

You can look at their API as documented in their papers (well, it's hardly documented at all). Do not look at their code. Our code is MIT-licensed, so any "derived work" from their code would be breaking the law. I would prefer not to change from MIT.

Besides, Julia needs to be programmed very differently from MATLAB, so I'm not sure it would be more helpful than harmful.

ChrisRackauckas commented 7 years ago

Can't we look at their examples to make design decisions?

We can look at their examples to build a separate repository just for benchmarking the BioPreDyn problems, and that repository will have to be GPL-licensed. However, the tools in this repository should not be derived from their code at all, since this one is MIT-licensed. Any design decision must be taken from sources about AMIGO2 other than the code, since deriving it from the code could make the design a derived work. Just avoid it.

finmod commented 7 years ago

Just trying to be helpful without complicating matters further, but what about the use and license of the benchmarking examples in the Matlab Parameter EStimation TOolbox (PESTO)? See https://github.com/ICB-DCM/PESTO/tree/master/examples. It has the nice erbb_signaling example for benchmarking larger models.

ChrisRackauckas commented 7 years ago

Just trying to be helpful without complicating matters further, but what about the use and license of the benchmarking examples in the Matlab Parameter EStimation TOolbox (PESTO)? See https://github.com/ICB-DCM/PESTO/tree/master/examples. It has the nice erbb_signaling example for benchmarking larger models.

I don't see a license on those, which is even worse, because copying them is subject to standard copyright infringement laws (not code-specific ones) and it is ill-specified whether that's even legal. However, since it's unlicensed, we can contact the authors and ask them to place an MIT license on it, which would be really nice.

For reference, the BioPreDyn benchmark code can be looked at, but the design should not be influenced by that code at all, and all Julia translations would have to live in a separate GPL-licensed repository. That's fine, because large benchmarks like that should not be in this repo, which is for users, and the extra data etc. would just get in the way. We can just add a BioPreDyn.jl to JuliaDiffEq when we get there.

ChrisRackauckas commented 7 years ago

Benchmarking is another issue, and in testing with @Ayush-iitkgp we found that direct likelihood maximization is probably a bad idea: the normal likelihood itself is usually less numerically stable and harder to take derivatives of, so the log-likelihood (aka least squares) is pretty much always better in terms of efficiency and accuracy. So discussion of benchmarks can continue at:

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/issues/30

while direct likelihood fitting is something that would be interesting in the case with noise; that discussion can continue at:

https://github.com/JuliaDiffEq/DiffEqParamEstim.jl/issues/39
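
For reference, the equivalence behind "log-likelihood (aka least squares)": with a constant known variance σ², the normal negative log-likelihood differs from the sum of squared errors only by constants, so the minimizers coincide:

```latex
\arg\min_p \left[ \frac{N}{2}\log(2\pi\sigma^2)
  + \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(y_i - \hat{y}_i(p)\right)^2 \right]
  = \arg\min_p \sum_{i=1}^{N} \left(y_i - \hat{y}_i(p)\right)^2
```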