In CausalityTools, I'm working with multidimensional probability mass functions (obtained from multidimensional contingency tables, which are essentially just multidimensional histograms). I can marginalize a joint pmf along one or more dimensions to get 1D, 2D, 3D, or higher-dimensional marginal distributions. When marginalizing out all but one dimension, I'm left with what is essentially a vector that I can wrap in Probabilities. But I also need two-dimensional and three-dimensional marginals.
It would be nice if these higher-dimensional marginal distributions could also be represented by Probabilities.
I can manage fine without this at the moment, because it's only used internally, but it would be nice to document that we're using the same machinery across packages.
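For concreteness, here is a minimal sketch of the kind of marginalization I mean, using plain arrays only. The table `counts` and the variable names are made up for illustration; this is not the CausalityTools implementation.

```julia
# Hypothetical 2x3x4 contingency table (stand-in data, not from any real analysis).
counts = reshape(collect(1.0:24.0), 2, 3, 4)

# Normalize the counts into a joint pmf over three variables.
joint = counts ./ sum(counts)

# Marginalize out dimension 3: a two-dimensional marginal of size (2, 3).
pxy = dropdims(sum(joint, dims=3), dims=3)

# Marginalize out dimensions 2 and 3: a one-dimensional marginal of length 2.
px = dropdims(sum(joint, dims=(2, 3)), dims=(2, 3))
```

Each marginal still sums to 1. Only `px` is a vector; `pxy` is a matrix, which is the case that a vector-only Probabilities cannot hold without flattening.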
Implementation strategy
I think it should just be a matter of defining
struct Probabilities{T, N} <: AbstractArray{T, N}
instead of having Probabilities subtype a vector. The higher-dimensional marginals would still always sum to 1, so nothing changes, except that they can be indexed as p[i, j, k, ...] instead of just p[i].
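A rough sketch of what such a type could look like follows. The name `NDProbabilities`, the field name `p`, and the normalizing constructor are assumptions made for illustration; they are not the existing Probabilities implementation.

```julia
# Hypothetical N-dimensional probabilities container (illustrative name and layout).
struct NDProbabilities{T<:Real, N} <: AbstractArray{T, N}
    p::Array{T, N}
    function NDProbabilities(x::AbstractArray{T, N}) where {T<:Real, N}
        all(>=(0), x) || throw(ArgumentError("entries must be non-negative"))
        s = sum(x)
        s > 0 || throw(ArgumentError("entries must have a positive sum"))
        q = Array(x ./ s)  # normalize so that all entries sum to 1
        return new{eltype(q), N}(q)
    end
end

# Minimal AbstractArray interface. With IndexLinear, Base derives
# Cartesian indexing such as P[i, j, k] from the linear getindex.
Base.size(P::NDProbabilities) = size(P.p)
Base.IndexStyle(::Type{<:NDProbabilities}) = IndexLinear()
Base.getindex(P::NDProbabilities, i::Int) = P.p[i]

# Usage with a stand-in 2x3x4 table of counts.
P = NDProbabilities(reshape(1:24, 2, 3, 4))
first_cell = P[1, 1, 1]  # Cartesian indexing
last_cell = P[24]        # linear indexing still works
```

Under this sketch a one-dimensional input still yields an AbstractVector, since AbstractVector{T} is an alias for AbstractArray{T, 1}.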
Note that one could always flatten a multidimensional array to a vector and wrap it with Probabilities after the fact, but then keeping track of indices gets much messier when doing triple or quadruple loops with some elaborate indexing on the probabilities.
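To illustrate the bookkeeping involved, here is a small sketch of the flattening workaround with plain arrays (the data is made up). Every access to the flattened vector has to go through an index map such as Base's `LinearIndices`:

```julia
# Stand-in joint pmf over three variables (illustrative data).
joint = reshape(collect(1.0:24.0), 2, 3, 4)
joint ./= sum(joint)

# Flattening workaround: a vector plus a separate Cartesian-to-linear index map.
flat = vec(joint)
lin = LinearIndices(joint)

# Accumulate the marginal over the first variable from the flattened form.
px_flat = zeros(size(joint, 1))
for k in axes(joint, 3), j in axes(joint, 2), i in axes(joint, 1)
    px_flat[i] += flat[lin[i, j, k]]  # the N-dimensional form is just joint[i, j, k]
end
```

With an N-dimensional container the index map disappears and the loop body reads `joint[i, j, k]` directly.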
Would this be a breaking change?