sethaxen opened this issue 2 years ago.
@femtomc, I wonder if you have input on this as well, since I think Gen traces can also be nested and contain arbitrary Julia types.
Something that may be helpful in this context: @cscherrer and I had discussed building flatten/unflatten transformations on top of the new transport API in MeasureBase.jl. This would allow automatically generating transforms to and from flat vectors, as long as a prior measure is available (the prior would provide the required structural information).
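To make the flatten/unflatten idea concrete, here is a minimal Python sketch. It is written in Python only for brevity and does not use MeasureBase.jl's transport API. It assumes the prior can supply a *template* value with the same nesting as a draw. `flatten` and `unflatten` are hypothetical helpers that walk the template to fix the order of the leaves.

```python
from __future__ import annotations

from typing import Any

_MISSING = object()  # sentinel used to detect too-short or too-long flat vectors


def flatten(template: Any, value: Any) -> list[float]:
    """Flatten `value` into a flat list, visiting leaves in the order of `template`.

    Hypothetical helper: the template (e.g. a draw from the prior) only supplies structure.
    """
    if isinstance(template, dict):
        return [x for key in template for x in flatten(template[key], value[key])]
    if isinstance(template, list):
        if len(template) != len(value):
            raise ValueError("value does not match the template's length")
        return [x for t, v in zip(template, value) for x in flatten(t, v)]
    return [float(value)]  # scalar leaf


def unflatten(template: Any, flat: list[float]) -> Any:
    """Inverse of `flatten`: rebuild a value shaped like `template` from `flat`."""
    it = iter(flat)

    def build(t: Any) -> Any:
        if isinstance(t, dict):
            return {key: build(sub) for key, sub in t.items()}
        if isinstance(t, list):
            return [build(sub) for sub in t]
        leaf = next(it, _MISSING)  # scalar leaf: consume the next entry
        if leaf is _MISSING:
            raise ValueError("flat vector is shorter than the template")
        return leaf

    result = build(template)
    if next(it, _MISSING) is not _MISSING:
        raise ValueError("flat vector is longer than the template")
    return result
```

Example: with the template `{"a": [0.0, 0.0], "b": 0.0}`, calling `flatten` on `{"a": [1.2, 2.3], "b": 4.2}` gives `[1.2, 2.3, 4.2]`, and `unflatten` restores the nested value. Only the structure comes from the prior, so the same transform works for every draw.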
Another thing that may be interesting in this context: in BAT.jl we've recently added the ability to marginalize/flatten structures to "flat" NamedTuples, using Unicode characters in the field names. This is currently limited to non-nested input, but the result looks like this: a value `(a = [1.2, 2.3], b = 4.2)` can be turned into `(a⌞1⌟ = 1.2, a⌞2⌟ = 2.3, b = 4.2)`. We use a few other Unicode characters too, so we can preserve range selection during marginalization and still have valid field names like `(d⌞1ː2⌟ = ...)`. We introduced it to support value selection for plotting, but we're planning to extend it and make it more directly accessible. Maybe such a "flatten nested names and ranges to Unicode" scheme could be useful for ArviZ as well?
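Here is a rough Python sketch of that naming scheme, reproducing only the example above. It is not BAT.jl's implementation, and the function names are hypothetical. The bracket and range characters are copied from the example, and the `ː` is not an ASCII colon. Indices are 1-based to match Julia.

```python
from __future__ import annotations

from typing import Any


def unicode_flat_names(nt: dict[str, Any]) -> dict[str, Any]:
    """Flatten a non-nested mapping of scalars/vectors using ⌞i⌟ index suffixes.

    Nested mappings are not handled, mirroring the current limitation described above.
    """
    out: dict[str, Any] = {}
    for name, value in nt.items():
        if isinstance(value, (list, tuple)):
            # One entry per element, with a 1-based index in the name.
            for i, x in enumerate(value, start=1):
                out[f"{name}⌞{i}⌟"] = x
        else:
            out[name] = value
    return out


def unicode_range_name(name: str, first: int, last: int) -> str:
    """Name for a selected index range, e.g. d⌞1ː2⌟."""
    return f"{name}⌞{first}ː{last}⌟"
```

The resulting names map each scalar marginal to exactly one field. A plotting or diagnostics layer could then select values by name without knowing the original structure.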
All downstream diagnostics, statistics, plots, and serialization to NetCDF/Zarr will require access to marginals, so we need flat multidimensional arrays, often with numeric types.
@sethaxen I think of TupleVectors as making it easy to get to marginals, so maybe I don't understand what you mean by "marginals". Can you give more detail on this?
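One way to make "marginals" concrete: a diagnostic such as R-hat is computed separately for each scalar variable, from a numeric `values[chain][draw]` array. The minimal Python sketch below uses the classic, non-split Gelman-Rubin formula, not the rank-normalized split R-hat that ArviZ computes by default. `rhat` is a hypothetical name. The point is that the input must be a plain numeric array per marginal, not a collection of structured draws.

```python
from __future__ import annotations

import math


def rhat(values: list[list[float]]) -> float:
    """Classic (non-split) Gelman-Rubin R-hat for one scalar marginal.

    `values[chain][draw]` must be numeric, with >= 2 chains of equal length >= 2.
    """
    m = len(values)
    n = len(values[0]) if m else 0
    if m < 2 or n < 2 or any(len(chain) != n for chain in values):
        raise ValueError("need >= 2 chains of equal length >= 2")
    chain_means = [sum(chain) / n for chain in values]  # fails for non-numeric draws
    grand_mean = sum(chain_means) / m
    # Between-chain variance B.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    # Mean of the within-chain sample variances W.
    w = sum(
        sum((x - mu) ** 2 for x in chain) / (n - 1)
        for chain, mu in zip(values, chain_means)
    ) / m
    if w == 0:
        raise ValueError("zero within-chain variance")
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)
```

A marginal like `a.z` with shape chains × draws can be passed directly. A field holding strings or whole structs cannot, which is why the structured draws first have to be split into per-variable arrays.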
This issue continues the discussion started at https://github.com/arviz-devs/InferenceObjects.jl/issues/8#issuecomment-1223007423.
Some Julia PPLs can return draws as arbitrary Julia types. Here's an example with Soss:
Currently such types can be stored in `InferenceData`:

So `InferenceData` can be used for this storage, but it's not very useful, for several reasons:

Here's an example of what the Tables interface would produce:
So plotting packages that use the Tables interface, like AlgebraOfGraphics and StatsPlots, are not terribly useful here without lots of additional code.
There are several ways we might approach this:
1. `InferenceData`, and they are expected to turn their types into whatever marginals they care about when they want to use the downstream functions we discussed above. This is the current state.
2. `Dataset`. E.g., the above example might be converted to a `Dataset` with variable names `a.t.x`, `a.t.y`, `a.t.tag`, and `a.z`. If we go this route, `InferenceData` would be a secondary data type used only for some analyses but not a possible default for such PPLs, since it loses some of the structure in the initial draws.
3. `InferenceData`. This would be called by the user to convert a non-flattened `InferenceData` to a flattened one, allowing named dimensions to be provided. E.g., such a function would map the above `posterior` to something like:

The easiest way I can think of to provide such a default is to recurse through all Julia types and allocate new arrays as done above, but there may be other options using custom Julia arrays. @oschulz, @cscherrer, would the types you have been suggesting allow for this?
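Here is a minimal Python sketch of that recursive default. Nested dicts stand in for Julia structs/NamedTuples, and `flatten_draw` and `flatten_posterior` are hypothetical names, not part of InferenceObjects.jl. Each draw is walked recursively and gets dotted variable names like `a.t.x`. The per-draw leaves are then regrouped into one `values[chain][draw]` array per variable, allocating new arrays as described above.

```python
from __future__ import annotations

from typing import Any


def flatten_draw(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Recursively flatten one draw (nested dicts) into {dotted_name: leaf}."""
    if isinstance(obj, dict):
        out: dict[str, Any] = {}
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            out.update(flatten_draw(value, name))
        return out
    # Anything that is not a dict (numbers, strings, lists) is treated as a leaf.
    return {prefix: obj}


def flatten_posterior(draws: list[list[dict]]) -> dict[str, list[list[Any]]]:
    """Turn draws[chain][draw] of nested dicts into {name: values[chain][draw]}."""
    flat = [[flatten_draw(d) for d in chain] for chain in draws]
    names = list(flat[0][0])
    for chain in flat:
        for d in chain:
            if list(d) != names:
                raise ValueError("all draws must have the same structure")
    # One freshly allocated chain x draw array per variable.
    return {name: [[d[name] for d in chain] for chain in flat] for name in names}
```

In this sketch, list-valued leaves stay as per-draw lists. A real implementation would turn them into extra trailing dimensions, which is where user-provided dimension names would come in. Non-numeric leaves such as `a.t.tag` survive flattening but remain unusable for numeric diagnostics.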
Off the top of my head, a few additional criteria for the solution:
- The `InferenceData` type and its basic functionality must be kept in a lightweight package and be as generic as possible. It's not even ideal that we depend on `DimensionalData`, but we do. If we require a complicated solution with lots of dependencies, it should live in its own package, which PPLs or packages with PPL-specific converters can then depend on.

Since the others tagged here have thought about this a lot more than I have, I'd appreciate any input or suggestions. cc also @ParadaCarleton