exoplanet-dev / exoplanet

Fast & scalable MCMC for all your exoplanet needs!
https://docs.exoplanet.codes
MIT License

Model Comparison for RV fitting #79

Closed shbhuk closed 4 years ago

shbhuk commented 4 years ago

Hi, I'd like to check whether you currently define any criterion to estimate the goodness of fit at the optimizer's solution, so that it can be compared across models, i.e. with different priors and/or different parameters fixed or floating. It could be log Z, the BIC, or something along those lines; perhaps, if nothing else, even a reduced chi^2 to estimate the goodness of fit.

Describe the solution you'd like:
A method or function that can be called after the MAP parameters are obtained, using those values to estimate the goodness of fit. The user could then store or print this value and compare it as the priors change.
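For concreteness, the reduced chi^2 mentioned above is simple to compute by hand once the MAP prediction is in hand; a minimal sketch (the data arrays, model prediction, and parameter count here are hypothetical placeholders, not part of exoplanet's API):

```python
import numpy as np

def reduced_chi2(y, y_model, y_err, n_params):
    """Chi^2 per degree of freedom for a model prediction y_model."""
    chi2 = np.sum(((y - y_model) / y_err) ** 2)
    dof = y.size - n_params  # degrees of freedom
    return chi2 / dof
```

A value near 1 suggests the model matches the data at the level of the reported uncertainties.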

Thanks!

shbhuk commented 4 years ago

For example, something like this - https://radvel.readthedocs.io/en/latest/_modules/radvel/driver.html#ic_compare

dfm commented 4 years ago

I'd never recommend using information criteria like this, or doing model comparison in general. There are almost always better options, but that's beyond the scope of this discussion. If you want to evaluate the BIC or similar, you can compute the value of the model's log probability using something like:

import pymc3 as pm
import exoplanet as xo

with pm.Model() as model:
    ...
    map_soln = xo.optimize()
    # Total log probability (priors + likelihood) evaluated at the MAP point
    model_log_prob = xo.eval_in_model(model.logpt, map_soln)

or just the log likelihood of the observations:

with pm.Model() as model:
    ...
    # mean_model is the model's prediction for the data (defined in the elided code)
    obs = pm.Normal("obs", mu=mean_model, sigma=yerr, observed=y)
    map_soln = xo.optimize()
    # Log likelihood of the observed variable evaluated at the MAP point
    model_log_prob = xo.eval_in_model(obs.logpt, map_soln)

Then you can compute your favorite information criterion using this value and the "number of parameters" in model.ndim.
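Given that log-probability value and the parameter count, the common information criteria are simple closed forms; a sketch (the inputs are placeholders to be filled from the snippets above):

```python
import math

def bic(log_prob, n_params, n_data):
    # Bayesian Information Criterion: k * ln(n) - 2 * ln(L)
    return n_params * math.log(n_data) - 2.0 * log_prob

def aic(log_prob, n_params):
    # Akaike Information Criterion: 2k - 2 * ln(L)
    return 2.0 * n_params - 2.0 * log_prob
```

Lower values favor a model, but the caveat above stands: these are crude summaries compared to a full Bayesian treatment.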

This isn't really an exoplanet thing - it's just PyMC3 - so you could look for general discussions of model comparison with PyMC3.

shbhuk commented 4 years ago

I see. And is there a way I could use the parameters estimated from the MCMC sampling, including their uncertainties, to estimate the log likelihood or probability?

Also, I'd be glad to hear about the other options besides model comparison.

dfm commented 4 years ago

No. If you want that number you should use one of the snippets I gave above!

shbhuk commented 4 years ago

Got it. For reference, the model comparison feature calculates the WAIC for the trace - https://machinelearningmastery.com/probabilistic-model-selection-measures/ - and it works reasonably well. I can compare the MAP and sampled IC values for two different models. Thank you!

shbhuk commented 4 years ago

Just an FYI - this does not work when there are multiple log-likelihood functions, as with multiple instruments or a joint photometry + RV fit. In that case, one needs to follow these links:
https://discourse.pymc.io/t/calculating-waic-for-models-with-multiple-likelihood-functions/4834/7
https://github.com/OriolAbril/calaix_de_sastre/blob/master/arviz_ic_multiple_observations/exp_vs_power_pymc_discourse/pymc3_example.ipynb
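As a rough illustration of what the linked fix amounts to: the pointwise log-likelihoods from each observed variable are concatenated along the observation axis into a single array before computing WAIC. A self-contained numpy sketch, with random arrays standing in for the real per-instrument pointwise log-likelihoods:

```python
import numpy as np

def waic(log_lik):
    """WAIC from pointwise log-likelihoods, shape (n_samples, n_points)."""
    # log pointwise predictive density, computed stably via log-sum-exp
    m = log_lik.max(axis=0)
    lppd = np.sum(np.log(np.mean(np.exp(log_lik - m), axis=0)) + m)
    # effective number of parameters: per-point variance over posterior samples
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)

rng = np.random.default_rng(42)
# Hypothetical pointwise log-likelihoods for two data sets (e.g. RV + photometry)
loglike_rv = rng.normal(-1.0, 0.1, size=(500, 40))
loglike_phot = rng.normal(-0.5, 0.1, size=(500, 200))

# The key step from the linked thread: combine along the observation axis
combined = np.concatenate([loglike_rv, loglike_phot], axis=1)
total_waic = waic(combined)
```

In practice one would extract the per-variable pointwise log-likelihood arrays from the trace rather than simulating them, but the concatenation step is the same.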