arviz-devs / InferenceObjects.jl

Storage for results of Bayesian inference
https://julia.arviz.org/InferenceObjects
MIT License

Allowing alternative InferenceData/Dataset storage #16

Open. sethaxen opened this issue 2 years ago.

sethaxen commented 2 years ago

Currently, every Dataset wraps a DimStack, which in turn wraps a set of NamedTuples. Likewise, InferenceData wraps a NamedTuple of Datasets. As noted in #15, this makes everything except the array values themselves immutable.

An alternative is to decouple the interface for Dataset and InferenceData from the storage. For example, we could define NamedTupleStore, DictStore, NCDatasetStore, etc., and implement the InferenceData/Dataset interfaces for each of these. This would let users strongly type everything with a NamedTupleStore if they want to, or use a DictStore for much more dynamic access. An NCDatasetStore would let users open a NetCDF file as an InferenceData and even write to it incrementally, with the NetCDF file updated automatically.
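To make the proposed split between interface and storage concrete, here is a minimal sketch in Python rather than Julia. It is not InferenceObjects.jl's API. `VariableStore`, `variable_names`, `get_variable`, and `set_variable` are hypothetical names chosen for illustration, and `NamedTupleStore`/`DictStore` only borrow the names from the proposal. The `Dataset` layer talks exclusively to the store API, so the same `Dataset` code is immutable over a fixed, namedtuple-backed store and mutable over a dict-backed one.

```python
from abc import ABC, abstractmethod
from collections import namedtuple
from typing import Any, Iterator


class VariableStore(ABC):
    """Hypothetical storage interface: Dataset only ever calls these methods."""

    @abstractmethod
    def variable_names(self) -> list[str]: ...

    @abstractmethod
    def get_variable(self, name: str) -> Any: ...

    def set_variable(self, name: str, value: Any) -> None:
        # Default behavior: a store is read-only unless it overrides this.
        raise TypeError(f"{type(self).__name__} is immutable")


class NamedTupleStore(VariableStore):
    """Fixed set of variables, analogous to today's NamedTuple-backed storage."""

    def __init__(self, **variables: Any) -> None:
        Vars = namedtuple("Vars", variables.keys())
        self._data = Vars(**variables)

    def variable_names(self) -> list[str]:
        return list(self._data._fields)

    def get_variable(self, name: str) -> Any:
        return getattr(self._data, name)


class DictStore(VariableStore):
    """Mutable storage: variables can be added or replaced after construction."""

    def __init__(self, **variables: Any) -> None:
        self._data = dict(variables)

    def variable_names(self) -> list[str]:
        return list(self._data)

    def get_variable(self, name: str) -> Any:
        return self._data[name]

    def set_variable(self, name: str, value: Any) -> None:
        self._data[name] = value


class Dataset:
    """Interface layer: behaves the same regardless of the backing store."""

    def __init__(self, store: VariableStore) -> None:
        self.store = store

    def __getitem__(self, name: str) -> Any:
        return self.store.get_variable(name)

    def __setitem__(self, name: str, value: Any) -> None:
        # Mutability is decided entirely by the store.
        self.store.set_variable(name, value)

    def __iter__(self) -> Iterator[str]:
        return iter(self.store.variable_names())


# Usage: same Dataset type, different storage semantics.
fixed = Dataset(NamedTupleStore(mu=[0.1, 0.2], sigma=[1.0, 1.1]))
dynamic = Dataset(DictStore(mu=[0.1, 0.2]))
dynamic["tau"] = [3.0, 3.5]  # accepted by DictStore; fixed["tau"] = ... would raise TypeError
```

The design choice here is that mutability becomes a property of the store, not of `Dataset`. An NCDatasetStore would, under the same assumption, be one more subclass whose `set_variable` writes through to the file.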

With a well-designed API for stores, this would actually look a lot like the InferenceData API proposed in https://github.com/arviz-devs/ArviZ.jl/issues/154. Namely, the store API could be implemented for a type like MCMCChains.Chains, and then one could call InferenceData(chains) to (inefficiently) view it as an InferenceData. When efficiency is needed, one just converts to one of the native stores.
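The "inefficient view, then convert" idea can be sketched as an adapter. Again this is a hypothetical Python illustration, not the API of InferenceObjects.jl or MCMCChains.jl. `ChainsLike` is a stand-in for a foreign type such as `MCMCChains.Chains`, and its `samples[draw][chain][param]` layout is arbitrary and chosen only for the example. `ChainsStoreView` implements the store API directly on the foreign object, re-slicing on every access, while the hypothetical `to_native` copies any store into a native `DictStore` once when repeated access needs to be fast.

```python
from abc import ABC, abstractmethod
from typing import Any


class VariableStore(ABC):
    """Hypothetical minimal store API shared by native stores and adapters."""

    @abstractmethod
    def variable_names(self) -> list[str]: ...

    @abstractmethod
    def get_variable(self, name: str) -> Any: ...


class DictStore(VariableStore):
    """A native store: variables materialized once and held in a dict."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = dict(data)

    def variable_names(self) -> list[str]:
        return list(self._data)

    def get_variable(self, name: str) -> Any:
        return self._data[name]


class ChainsLike:
    """Stand-in for a foreign sample container (layout assumed for illustration).

    samples[draw][chain][i] holds the value of parameter names[i].
    """

    def __init__(self, samples: list[list[list[float]]], names: list[str]) -> None:
        self.samples = samples
        self.names = names


class ChainsStoreView(VariableStore):
    """Implements the store API on the foreign type without copying it."""

    def __init__(self, chains: ChainsLike) -> None:
        self.chains = chains

    def variable_names(self) -> list[str]:
        return list(self.chains.names)

    def get_variable(self, name: str) -> list[list[float]]:
        i = self.chains.names.index(name)
        samples = self.chains.samples
        n_chains = len(samples[0]) if samples else 0
        # Rebuilds a (chain, draw) layout on every call: correct but slow.
        return [[draw[c][i] for draw in samples] for c in range(n_chains)]


def to_native(store: VariableStore) -> DictStore:
    """Copy any store into a native DictStore for efficient repeated access."""
    return DictStore({name: store.get_variable(name) for name in store.variable_names()})
```

Because both the view and the native store satisfy the same interface, code written against that interface works on either; converting is purely a performance decision.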

I started work on a small prototype of this. It requires a fair amount of surgery on the existing code and increases its complexity. Although it will be more work to do this later than now, I think it's better to hold off on it for the time being.

sethaxen commented 1 year ago

This is closely related to https://github.com/rafaqz/DimensionalData.jl/issues/473 and probably mostly depends on something like that being implemented in DimensionalData first.