arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

InferenceData.from_posteriordb #1557

Open feynmanliang opened 3 years ago

feynmanliang commented 3 years ago

Tell us about it

posteriordb provides reference_draws(), which returns posterior samples that I would like to visualize using ArviZ. The data format, however, is not easily converted, and at least one other person has done the dance to convert it to an InferenceData (see https://github.com/stan-dev/posteriordb/issues/225).

In detail, reference_draws() returns a list (each element = a chain) of dicts (every key is a latent variable) whose values are 1D arrays (each entry corresponding to a single draw of that latent). Furthermore, multi-dimensional latents are broken out across multiple keys using array indexing notation (e.g. theta[0], theta[1], ...).
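To make the format concrete, here is a minimal sketch of the reshaping step, assuming the structure described above (a list of per-chain dicts with flattened keys such as theta[0]). The helper name `stack_reference_draws`, its `index_base` parameter, and the toy data are hypothetical and are not part of posteriordb or ArviZ; this is not the implementation from the linked gist.

```python
import re
from collections import defaultdict

import numpy as np

# Matches "mu" as well as flattened keys such as "theta[0]" or "Sigma[1,2]".
_KEY = re.compile(r"^(?P<name>[^\[\]]+)(?:\[(?P<idx>[\d,\s]+)\])?$")


def stack_reference_draws(chains, index_base=0):
    """Hypothetical helper: list of per-chain dicts -> {name: array(chain, draw, *shape)}.

    `index_base` is an assumption: set it to 1 if the keys use 1-based indices.
    """
    # Group flattened keys under their base variable name.
    groups = defaultdict(dict)
    for key in chains[0]:
        match = _KEY.match(key)
        if match is None:
            raise ValueError(f"cannot parse variable name: {key!r}")
        idx = match.group("idx")
        index = tuple(int(i) - index_base for i in idx.split(",")) if idx else ()
        groups[match.group("name")][index] = key

    n_chains = len(chains)
    posterior = {}
    for name, index_to_key in groups.items():
        ndim = len(next(iter(index_to_key)))
        # Infer the variable's shape from the largest index seen per dimension.
        shape = tuple(max(ix[d] for ix in index_to_key) + 1 for d in range(ndim))
        n_draws = len(chains[0][next(iter(index_to_key.values()))])
        out = np.full((n_chains, n_draws) + shape, np.nan)
        for index, key in index_to_key.items():
            for c, chain in enumerate(chains):
                # Fill every draw of chain c at this element's position.
                out[(c, slice(None)) + index] = np.asarray(chain[key], dtype=float)
        posterior[name] = out
    return posterior


# Toy data in the described layout: 2 chains, 3 draws each.
chains = [
    {"mu": [0.1, 0.2, 0.3], "theta[0]": [1.0, 1.1, 1.2], "theta[1]": [2.0, 2.1, 2.2]},
    {"mu": [0.4, 0.5, 0.6], "theta[0]": [1.3, 1.4, 1.5], "theta[1]": [2.3, 2.4, 2.5]},
]
posterior = stack_reference_draws(chains)
```

The resulting arrays follow the (chain, draw, *shape) layout, so in ArviZ versions where `from_dict` accepts a `posterior=` keyword, the dict could presumably be passed as `arviz.from_dict(posterior=posterior)`.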

Thoughts on implementation

@ahartikainen has a nice implementation of this conversion at https://gist.github.com/ahartikainen/ca4ec935c78c56e2d352b8d34a286fd0 which could be added as arviz.from_posteriordb.

ahartikainen commented 3 years ago

We could add a wrapper function that returns an InferenceData containing the correct groups (data etc).

Not sure if we would need to go through all the models/fits and manually check the correct groups -> some kind of config db would be great.

There could be a similar function to the one for arviz_data (or what was that name?)