gbouvignies / ChemEx

ChemEx is an analysis program for chemical exchange detected by NMR.
https://gbouvignies.github.io/ChemEx/
GNU General Public License v3.0

Negative AIC? #45

Closed · viochemist closed this issue 4 months ago

viochemist commented 4 years ago

Hello all!

I'm finally using ChemEx again after a long hiatus, and boy, has a lot changed! I'm working through the new quirks, but one thing I wanted to ask about first is why I'm getting negative AIC values in the statistics output. I believe AIC is roughly equivalent to red. χ² + 2*(number of parameters or DoF), or something of that nature, so it should never be negative. I can provide examples if necessary.

I'm using the conda version chemex-2018.10.2-py_0/ . Let me know if this is not the appropriate place to ask.

reneeotten commented 3 years ago

hi Alex,

the AIC is calculated as described in the documentation of lmfit: N ln(χ²/N) + 2 N_varys; so if χ²/N is smaller than 1 you'll get a negative AIC value; that seems all correct to me.
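To illustrate (a quick sketch, not from the thread; the numbers are made up purely for demonstration), the lmfit-style AIC can be reproduced directly, and it goes negative exactly when χ²/N < 1:

```python
import math

# lmfit's reported AIC: N * ln(chi2 / N) + 2 * N_varys,
# where N is the number of data points and N_varys the
# number of varying parameters.
def aic_lmfit(chi2, n_data, n_varys):
    return n_data * math.log(chi2 / n_data) + 2 * n_varys

# chi2/N < 1 makes the log term negative, which can dominate:
print(aic_lmfit(chi2=50.0, n_data=100, n_varys=4))   # negative
print(aic_lmfit(chi2=300.0, n_data=100, n_varys=4))  # positive
```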

viochemist commented 3 years ago

Ok ... my experience with AIC began with the work by d'Auvergne on Model-Free analysis. There, he derived AIC and found it to be based on χ², not reduced χ² (i.e., χ²/N).

[image: equation from the paper] In that form, it would never be negative. I don't know the theoretical statistics well enough to understand where the difference arises. My intuition is that if AIC is based on reduced χ², that will lead to overfitting? Thoughts?

https://link.springer.com/article/10.1023/A:1021902006114

Alex Hansen


viochemist commented 3 years ago

FWIW ... there's a newer criterion combining AIC/BIC

https://ieeexplore.ieee.org/document/7953690

Alex Hansen


viochemist commented 3 years ago

Ok, I've spent the last day and a half going over this, and I think I finally understand all the differences. In short, lmfit uses an estimator of the (unknown) variance in its definition of AIC (and BIC, etc.) that is assumed to be uniform and normal for all data points. This is then muddied a bit by the fact that the documentation says:

"where r is the residual array returned by the objective function (likely to be (data-model)/uncertainty"

indicating that the variance probably IS known. I still have to work through the derivation under that scenario, but if the residual were simply "data - model", then everything would be perfectly correct, again, for unknown, uniform variances in the data. (I can send my Mathematica notebook if anyone wants it :-) )

Given that our data have known, independent errors (maybe uniform within a profile, but definitely different among different datasets), the estimator of the variance is unnecessary. So AIC, which by definition is:

AIC = 2k - 2 log L(θ)

where L(θ) is the likelihood function, and log L(θ) is:

log L(θ) = (1/2) Σ_{i=1}^{n} ( -log(2π) - log(σ_i²) - r_i²/σ_i² )

The last term gives the real χ². Then AIC (with all the constants) is:

AIC = 2k + χ² + n log(2π) + Σ_{i=1}^{n} log(σ_i²)

As the last two terms are constant for identical datasets, only 2k + χ² remains.

TL;DR

AIC should be 2k + χ², not 2k + n log(χ²/n), as the latter assumes unknown, uniform variance of the data. By extension, BIC would be k log(n) + χ².
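The difference between the two definitions can be sketched numerically (hypothetical synthetic data, not from the thread; only the formulas come from the discussion above):

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 100, 3                      # data points, fitted parameters
sigma = rng.uniform(0.5, 2.0, n)   # known, non-uniform per-point errors
resid = rng.normal(0.0, sigma)     # (data - model) residuals

# True chi-square with known uncertainties:
chi2 = np.sum((resid / sigma) ** 2)

# lmfit-style AIC: assumes unknown, uniform variance estimated from the fit.
aic_lmfit = n * np.log(chi2 / n) + 2 * k

# AIC for known, independent errors, with the constant terms dropped:
aic_known = chi2 + 2 * k

print(f"chi2 = {chi2:.1f}")
print(f"lmfit-style AIC = {aic_lmfit:.1f}")   # can be negative
print(f"known-sigma AIC = {aic_known:.1f}")   # always positive
```

The known-sigma form is bounded below by 2k, so it can never go negative, which matches the behavior the original question expected.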

gbouvignies commented 3 years ago

Thanks, Alex, for looking into that. I've never really used AIC and BIC, so I didn't really check them. What you say makes a lot of sense to me, so I'm inclined to follow you (and Edward) on this and change the code accordingly.

gbouvignies commented 4 months ago

Long overdue!