arviz-devs / arviz

Exploratory analysis of Bayesian models with Python
https://python.arviz.org
Apache License 2.0

Analyzing Weighted Samples in `InferenceData` #2035

Open ParadaCarleton opened 2 years ago

ParadaCarleton commented 2 years ago

Lots of sampling algorithms return weighted samples. For instance, importance sampling and SMC both return weighted samples*, as do elliptical slice sampling and nested sampling algorithms. Currently, though, `InferenceData` objects don't have an easy way to include weights in analysis. I think adding support for weighted samples would be a great feature!

*unless SMC is finished with a resampling step, but this causes a loss of information and is strictly worse than returning the weighted samples.

OriolAbril commented 2 years ago

That would be a great feature I think. I am not very familiar with weighted samples so am not sure what a good design choice would be.

I think it should be straightforward to extend histogram and ECDF plots to take them into account. As for how to store/pass weighted samples, ideally they should live somewhere in the InferenceData. Would a variable in sample_stats make sense for this? Is there any situation where weights might be variable-based in addition to sample-based?
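To make the histogram/ECDF point concrete, here is a minimal sketch in plain NumPy of what "taking weights into account" could look like. The helper `weighted_ecdf` and the toy data are hypothetical and are not part of ArviZ; the only library behavior relied on is that `np.histogram` accepts a `weights` argument.

```python
import numpy as np


def weighted_ecdf(samples, weights):
    """Return the sorted samples and the weighted ECDF evaluated at them.

    Hypothetical helper for illustration only, not an ArviZ function.
    """
    samples = np.asarray(samples, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    if samples.shape != weights.shape:
        raise ValueError("samples and weights must have the same number of elements")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Sort the samples and carry each weight along with its sample.
    order = np.argsort(samples, kind="stable")
    cumulative = np.cumsum(weights[order])
    # Normalize so the ECDF ends at 1 even if the weights are unnormalized.
    return samples[order], cumulative / cumulative[-1]


# Toy weighted draws (made-up numbers, purely illustrative).
samples = np.array([3.0, 1.0, 2.0, 4.0])
weights = np.array([0.1, 0.4, 0.3, 0.2])

x, cdf = weighted_ecdf(samples, weights)

# Weighted histogram: each draw contributes its weight instead of a count of 1.
density, edges = np.histogram(
    samples, bins=2, range=(1.0, 5.0), weights=weights, density=True
)
```

With uniform weights both results reduce to the usual unweighted ECDF and histogram, so a plotting function could plausibly default to uniform weights when none are stored.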

ParadaCarleton commented 2 years ago

> That would be a great feature I think. I am not very familiar with weighted samples so am not sure what a good design choice would be.
>
> I think it should be straightforward to extend histogram and ECDF plots to take them into account. As for how to store/pass weighted samples, ideally they should live somewhere in the InferenceData. Would a variable in sample_stats make sense for this? Is there any situation where weights might be variable-based in addition to sample-based?

Hi! Sorry for the late reply, I've been busy.

I think a variable in sample_stats makes perfect sense. I can't imagine any practical situations where the weights are variable-based, although I guess in theory they could be.
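As a rough illustration of what sample-based weights would mean in practice, here is a sketch in plain NumPy. It assumes a single weight array of shape `(chain, draw)` shared by every variable, mirroring how per-draw quantities are laid out in sample_stats. The function `weighted_mean` and the arrays are hypothetical and do not use any ArviZ API.

```python
import numpy as np


def weighted_mean(values, weights):
    """Weighted mean over the leading (chain, draw) dimensions.

    values:  array of shape (chain, draw, *event_shape)
    weights: array of shape (chain, draw), one weight per draw
    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.shape[:2] != weights.shape:
        raise ValueError("weights must have shape (chain, draw)")
    w = weights / weights.sum()
    # Append singleton axes so one weight per draw broadcasts over
    # any trailing dimensions of the variable.
    w = w.reshape(w.shape + (1,) * (values.ndim - 2))
    return (w * values).sum(axis=(0, 1))


# One weight per draw (chain=2, draw=2), shared by all variables.
weights = np.array([[0.1, 0.4], [0.3, 0.2]])

# A scalar variable and a vector-valued variable drawn at the same samples.
mu = np.array([[3.0, 1.0], [2.0, 4.0]])   # shape (2, 2)
theta = np.stack([mu, 10.0 * mu], axis=-1)  # shape (2, 2, 2)

mu_mean = weighted_mean(mu, weights)
theta_mean = weighted_mean(theta, weights)
```

Because the weights belong to the draw rather than to a variable, the same array serves every variable. Variable-based weights would instead need one weight array per variable, which is the case that seems unlikely to come up in practice.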