cokelaer / fitter

Fit data to many distributions
https://fitter.readthedocs.io/
GNU General Public License v3.0

How to order by AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion)? #49

Open emma-luk opened 2 years ago

emma-luk commented 2 years ago


  1. The tables are ordered by sum of squares, but ordering by the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) would give a different result. The KL divergence appears to be infinite (8.6 Tanker is the exception, it is not infinite there; do you know why?), which is worrying because the KL divergence should be low for a good fit. How can the results be ordered by AIC or BIC?

  2. Most of the distributions in the section appear bimodal or multimodal, but the standard distributions in SciPy are unimodal, so we are not seeing a good fit. Fitting a multimodal distribution increases the model complexity (the number of model parameters), so measures such as AIC and BIC become important: a progressively better fit is always possible by adding more components to a multimodal distribution, and these criteria penalize that extra complexity.
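
For reference, the textbook definitions of the two criteria make the complexity penalty in point 2 explicit. Here $k$ is the number of fitted parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood; these symbols are introduced here for illustration, and the formulas are the standard ones, not necessarily what fitter computes internally.

```latex
\begin{align}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L}
\end{align}
```

Lower is better for both. Each extra mixture component raises $k$, so it only pays off if the log-likelihood improves by more than the penalty; BIC penalizes harder than AIC once $\ln n > 2$, i.e. for $n \ge 8$.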

emma-luk commented 2 years ago

@cokelaer @Julien Hoachuck @epruesse @tirkarthi @Data-drone

Would you consider a PR that implements ordering by the Akaike Information Criterion or the Bayesian Information Criterion? In the meantime, is there a way to order the results by AIC or BIC?

Thank you

Emma

lahdjirayhan commented 2 years ago

To sort by AIC or BIC, sort the pandas DataFrame returned by f.summary():

f.summary().sort_values('bic')

Reference: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html
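
One caveat, to my knowledge: f.summary() returns only the top few rows already ranked by sum of squares, so sorting its output reorders that subset rather than ranking every fitted distribution. As far as I recall the full table is kept in f.df_errors, but treat that attribute name as an assumption and check your installed version. The sketch below avoids fitter entirely and shows the idea with SciPy and pandas alone: fit a few candidate distributions, compute AIC and BIC from the textbook formulas, and sort. The sample data and the list of candidates are made up for illustration, and the values will not necessarily match fitter's own aic/bic columns.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic sample, for illustration only (not the data from this issue).
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=500)

rows = []
for name in ["gamma", "lognorm", "norm", "expon"]:
    dist = getattr(stats, name)
    params = dist.fit(data)                      # maximum-likelihood fit
    loglik = np.sum(dist.logpdf(data, *params))  # log-likelihood at the fit
    k, n = len(params), len(data)                # parameter count, sample size
    rows.append({
        "distribution": name,
        "aic": 2 * k - 2 * loglik,               # textbook AIC
        "bic": k * np.log(n) - 2 * loglik,       # textbook BIC
    })

table = pd.DataFrame(rows).set_index("distribution")
by_bic = table.sort_values("bic")  # lowest (best) BIC first
by_aic = table.sort_values("aic")  # lowest (best) AIC first
print(by_bic)
```

The same sort_values call works on any DataFrame that has aic and bic columns, so it applies unchanged to a full table of fit results.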