JohannesBuchner / UltraNest

Fit and compare complex models reliably and rapidly. Advanced nested sampling.
https://johannesbuchner.github.io/UltraNest/
Other
155 stars 30 forks source link

Feature request: HDI intervals #89

Open JorisDeRidder opened 1 year ago

JorisDeRidder commented 1 year ago

Bayesian credible intervals can currently be derived using quantiles. Would it be possible to implement an HDI interval as well?

JohannesBuchner commented 1 year ago

That's a very good idea, and it should be possible. Maybe you could try implementing it for one parameter based on the posterior samples?

General recipe:

  1. make a histogram with a reasonable number of bins
  2. iterate from the highest density bin to the lowest density bin
  3. add to a set of bins, add up the posterior probability in the bins
  4. stop if x% of the posterior have been accumulated & report the interval

The approach can in principle give you multimodal regions, which is fine or not depending on what you want. If you do not want it multimodal, you would need to add in also the bins in between in step (3).

Step 1 may also not be trivial.

JohannesBuchner commented 1 year ago

Another approach is to use a kernel-density estimation library (fastkde, getdist, etc) to do the job. I am not sure which one, perhaps arviz also has an implementation.

I'd be interested to hear how you solve this.

JorisDeRidder commented 1 year ago

Your pointer to arviz proved to be the quickest way to obtain a HDI. An example for other users. I'm assuming the following imports

import numpy as np
import pandas as pd
import arviz as az
import xarray as xr

and that you ran UltraNest:

sampler = ultranest.ReactiveNestedSampler(param_names, my_loglikelihood, my_prior_transform, derived_param_names)
results = sampler.run()

To convert the UltraNest output to an Arviz InferenceData object I used

results_df = pd.DataFrame(data=results['samples'], columns=results['paramnames'])
results_df["chain"] = 0
results_df["draw"] = np.arange(len(results_df), dtype=int)
results_df = results_df.set_index(["chain", "draw"])
xdata = xr.Dataset.from_dataframe(results_df)
trace = az.InferenceData(posterior=xdata)

which is quite likely not the shortest way. :-) A 95% HDI interval can then be obtained using Arviz built-in hdi() function:

hdi = az.hdi(trace, hdi_prob=0.95)
for name in results['paramnames']: 
    print(name, ": ", hdi[name].values)  

leading in my example to

intercept :  [0.77835208 1.14636967]
alpha :  [1.07772564 1.18964846]
sigma :  [0.27528108 0.40888707]
slope :  [1.84139496 2.47128276]
JohannesBuchner commented 8 months ago

@JorisDeRidder btw, how do you get the highest density point though?

JorisDeRidder commented 8 months ago

You mean the MAP? Some software packages (e.g. PyMC I believe) simply numerically maximize the logPosterior = logLikelihood + logPrior function. It works well if your posterior is monomodal. If that's not the case, you might want to consider constructing a KDE of your posterior and determine the MAP from that.

JohannesBuchner commented 8 months ago

Here is an approach based on getdist, which supports bounds: https://gist.github.com/JohannesBuchner/2027d0f313521387c2cded2424cdcfeb