arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

InferenceData extensions [Discussion] #1066

Open OriolAbril opened 4 years ago

OriolAbril commented 4 years ago

InferenceData objects are central to ArviZ. A common subset of tasks using InferenceData can be done directly with ArviZ plotting and stats functions, but any task that deviates from that subset quickly becomes long and convoluted.

The aim of this issue is to start a discussion about new capabilities to add to InferenceData and to generate a proposal (which will be added to xarray_examples for discussion with the xarray team).

I also think the functions fall into several groups, which may help to start brainstorming or to generate separate proposals per group. Ideas on all levels are welcome!

Straightforward extensions to xr.Dataset methods

.sel is a good example of this. I think several methods could fit in this category and very roughly follow a similar pattern:

def idata_extension(self, groups, ..., **kwargs):
    for group in groups:
        if group not in self._groups:
            raise KeyError(f"group {group!r} not found in InferenceData")
        # some kind of check to make the method as convenient as possible;
        # an example is sel using only the dimensions present in the current group to index
        dataset = getattr(self, group)
        # "method" stands for whichever xr.Dataset method is being extended
        setattr(self, group, dataset.method(**kwargs))

In addition to groups, we should think about other ArviZ-specific arguments that would be common to most functions and not passed on to xarray. Maybe inplace and/or copy?

Also, groups could accept both group names and metagroup names, so that one keyword represents several proper groups. We could go as far as adding the metagroups dict to rcParams. One metagroup example could be "posteriors" -> ("posterior", "sample_stats", "log_likelihood", "posterior_predictive")
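To make the metagroup idea concrete, here is a minimal sketch of how a groups argument could be expanded before looping over datasets. The names METAGROUPS and resolve_groups are hypothetical and not part of ArviZ; only the "posteriors" entry comes from the example above. The sketch also assumes that metagroup members missing from the object are silently skipped while explicitly named groups must exist, which is one possible choice among several.

```python
# Hypothetical metagroup table; the "posteriors" entry is the example given above.
METAGROUPS = {
    "posteriors": ("posterior", "sample_stats", "log_likelihood", "posterior_predictive"),
}


def resolve_groups(groups, available, metagroups=METAGROUPS):
    """Expand metagroup names into concrete group names.

    Order is preserved and duplicates are dropped. Members of a metagroup
    that are not in ``available`` are skipped (an assumption of this sketch);
    an explicitly named group that is not available raises KeyError.
    """
    if isinstance(groups, str):
        groups = [groups]
    resolved = []
    for name in groups:
        if name in metagroups:
            candidates = [g for g in metagroups[name] if g in available]
        elif name in available:
            candidates = [name]
        else:
            raise KeyError(f"group {name!r} not found")
        for group in candidates:
            if group not in resolved:
                resolved.append(group)
    return resolved


available_groups = ["posterior", "sample_stats", "observed_data"]
posterior_like = resolve_groups("posteriors", available_groups)
mixed = resolve_groups(["observed_data", "posteriors"], available_groups)
```

With a helper like this, each extension method would only need to call it once at the top and then iterate over concrete group names.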

Some ideas of functions that could fit in this category are:

Many dataset methods make sense to extend, so I think we should focus on the ones that solve the most issues on our side. For example, if we make an extension of apply_ufunc that is compatible with inference data, or extend the map method, then mean, median, max and so on are not really necessary, only convenient, whereas other methods may have no alternative.

Commenting on the ones you expect to use the most seems like a good way to choose where to begin.
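As a rough illustration of why a generic map-style extension would make the per-statistic methods optional, the sketch below applies an arbitrary function to each selected group and supports an inplace flag. To stay self-contained it uses a toy GroupContainer class with plain dicts standing in for xr.Dataset objects; the class, its map signature, and the group_mean helper are all assumptions made for this example, not the ArviZ API.

```python
import copy


class GroupContainer:
    """Toy stand-in for InferenceData: each group is a dict of variable -> list of draws."""

    def __init__(self, **groups):
        self._groups = list(groups)
        for name, data in groups.items():
            setattr(self, name, data)

    def map(self, fun, groups=None, inplace=False, **kwargs):
        """Apply ``fun`` to every selected group and store the result in that group."""
        groups = self._groups if groups is None else groups
        missing = [g for g in groups if g not in self._groups]
        if missing:
            raise KeyError(f"groups not found: {missing}")
        # Work on a deep copy unless the caller explicitly asks for in-place modification
        out = self if inplace else copy.deepcopy(self)
        for group in groups:
            setattr(out, group, fun(getattr(out, group), **kwargs))
        return out


def group_mean(dataset):
    """Reduce every variable in a group to its mean."""
    return {var: sum(values) / len(values) for var, values in dataset.items()}


idata = GroupContainer(posterior={"mu": [1.0, 2.0, 3.0]}, prior={"mu": [0.0, 4.0]})
means = idata.map(group_mean, groups=["posterior"])
```

Here mean is not a dedicated method at all: it is just one function passed to map, and the same would hold for median, max, or any other reduction.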

Specific inference data methods

This category requires a much more detailed and custom implementation. Some examples that would fall here are:

percygautam commented 4 years ago

Regarding the specific inference data method: InferenceData html repr

I have been working on this for some time. A possible implementation is at https://dfm.io/nbview/?url=https%3A%2F%2Fraw.githubusercontent.com%2Fpercygautam%2Farviz-examples%2Fmaster%2FHTML%2520repr.ipynb . I used xarray's implementation as a reference.

ahartikainen commented 4 years ago

Looks great.

Do you think we should add any extra information there, like the number of variables in each group? (Not sure whether this is a good idea.)

OriolAbril commented 4 years ago

This got me thinking: should InferenceData objects have a name attribute?

percygautam commented 4 years ago

> Looks great.
>
> Do you think we should add any extra information there? Like number of variables in each group? (not sure if this is a good idea or not)

Maybe we could add the dimensions of each group (when not checked). Personally, though, I like the minimalistic repr better.

OriolAbril commented 4 years ago

more ideas:

TimOliverMaier commented 1 year ago

Hello!

I was wondering whether you have thought about adding prior out-of-sample predictions to InferenceData. Maybe I am missing something, but this is currently only supported for the posterior, isn't it? Just like predictions, a prior_predictions group could be based on predictions_constant_data, but use the prior group instead of the posterior for prediction. I like to do this for model debugging in cases where prior_predictive can be misleading because the observed variable has a lot of missing values.
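A small sketch of what I mean, using plain dicts instead of real InferenceData groups. The linear model, the variable names (intercept, slope, sigma, x, y) and the priors are made up for illustration; the only point is that the prior_predictions values are computed from prior draws evaluated at the out-of-sample inputs in predictions_constant_data.

```python
import random

rng = random.Random(0)
n_draws = 200

# Stand-in for the prior group: draws from hypothetical priors of a linear model
prior = {
    "intercept": [rng.gauss(0.0, 1.0) for _ in range(n_draws)],
    "slope": [rng.gauss(0.0, 0.5) for _ in range(n_draws)],
    "sigma": [abs(rng.gauss(0.0, 1.0)) for _ in range(n_draws)],  # half-normal
}

# Stand-in for predictions_constant_data: new inputs without observed outcomes
predictions_constant_data = {"x": [-1.0, 0.0, 2.5]}

# Proposed prior_predictions group: one simulated outcome per prior draw and new input
prior_predictions = {
    "y": [
        [rng.gauss(a + b * x, s) for x in predictions_constant_data["x"]]
        for a, b, s in zip(prior["intercept"], prior["slope"], prior["sigma"])
    ]
}
```

The result has one row per prior draw and one column per new input, so it does not depend on how many observed values are missing.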