Closed samthiriot closed 7 years ago
This is just a quick answer! If you want to encode limited knowledge about probabilities, `GosplConditionalDistribution` is definitely the right choice. Basically, it is a set of `GosplJointDistribution`, each of which is a full-knowledge matrix of probabilities across n dimensions; i.e. any combination of dimension values, or coordinate (a vector that contains exactly one value for each dimension), has a probability attached. In a `GosplConditionalDistribution`, on the contrary, only a limited set of cross probabilities is known (that is why it is labeled conditional), and there is no direct method to retrieve a "referent" dimension. However, the set of coordinates can inform about the known conditional probabilities: for example, the more often a dimension is present (i.e. has a value) in the set of coordinates, the more information the matrix has about its relationship with the other dimensions. Hence, we can infer a hierarchy of knowledge among dimensions through a `GosplConditionalDistribution`. Hope it helps!
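To make the "hierarchy of knowledge" idea concrete, here is a minimal sketch (this is NOT the Gospl API; the class and method names are made up for illustration) that counts how many partial tables each dimension appears in, which is the heuristic described above:

```java
import java.util.*;

// Hedged sketch, not the Gospl API: infer a rough "hierarchy of knowledge"
// among dimensions by counting how many partial probability tables each
// dimension appears in. A dimension present in more tables is one the
// conditional distribution knows more about.
public class DimensionHierarchySketch {

    // Each table is represented only by the set of dimension names it covers.
    static Map<String, Integer> knowledgeRank(List<Set<String>> tables) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> dims : tables)
            for (String d : dims)
                counts.merge(d, 1, Integer::sum); // increment per appearance
        return counts;
    }

    public static void main(String[] args) {
        // Three partial tables: P(A,B), P(A,C,D), P(E)
        List<Set<String>> tables = List.of(
            Set.of("A", "B"), Set.of("A", "C", "D"), Set.of("E"));
        // A appears in two tables, so we know most about its relations
        System.out.println(knowledgeRank(tables));
    }
}
```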
Still an open question. I will give it more attention soon. And come back with better ideas.
Thanks! Yep, I wonder if we could add the referent dimension into `GosplConditionalDistribution`. The rationale might be: 1/ if you detect it once (as you did in one of the inference algorithms), then it's a bit sad to lose it and recompute it later; 2/ if the user knows he's imputing data with a given reference dimension, then we should keep track of it (especially in the case of Bayesian networks). Well, I'll wait a bit to see if you have new ideas on that, and I'll propose something if I find a good idea on how to do it. Thanks!
At one point in the dev process, I decided to set some dimensions apart from the others in a so-called referentDimensionSet, but things got very complicated very quickly from there, so I rolled back to a simpler formulation.
This is NOT an answer on how to compute conditional probabilities. BUT, while thinking about how to compute a probability using partial information about the underlying full distribution, I updated how `ASegmentedNDimensionalMatrix` works. The `getValue(Collection<APopulationValue> aspects)` method is now based on a more suitable mechanism: say we want to know P(A,B,C,D,E), and matrix M1 contains the information P(A,B), M2 the probability P(A,C,D), and M3 knows P(E).

Process => The collection of inner full matrices is sorted according to the number of requested aspects they contain information about, here M2, M1, M3. Then we start with the probability given by the most informative one, here M2, which contains P(A,C,D). Next we follow the path of information down: we multiply this probability by the conditional probability of each matrix's dimensions given the ones covered by the previous matrices, here the probability known in M1 crossed with what the previous matrices already cover, that is P(A,C,D) x P(B|A); or by the probability itself when there is no overlap, e.g. with M3 we simply multiply the previously known probability P(A,C,D) x P(B|A) by P(E). So in this example, the result of the method call on this segmented matrix will be P(A,C,D) x P(B|A) x P(E).
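The process above can be sketched in a few lines of standalone Java. This is a hedged illustration only, not the actual `ASegmentedNDimensionalMatrix` implementation: the `PartialMatrix` class, the scalar probabilities, and the `marginals` callback are all invented here to show the sorting and the chain of conditional multiplications for one fixed coordinate.

```java
import java.util.*;

// Illustrative sketch (not the Gospl API): each partial matrix is reduced to
// the set of dimensions it covers and the joint probability of the requested
// coordinate restricted to those dimensions.
public class SegmentedProbabilitySketch {

    static class PartialMatrix {
        final Set<String> dimensions;
        final double jointProbability;
        PartialMatrix(Set<String> dimensions, double jointProbability) {
            this.dimensions = dimensions;
            this.jointProbability = jointProbability;
        }
    }

    // `marginals` stands in for the marginal probabilities a real matrix could
    // compute itself: P(new dims | overlap) = P(all dims) / P(overlap).
    static double getValue(List<PartialMatrix> matrices,
                           Map<Set<String>, Double> marginals) {
        // sort by decreasing number of covered dimensions: most informative first
        matrices.sort((a, b) -> b.dimensions.size() - a.dimensions.size());
        double probability = 1.0;
        Set<String> covered = new HashSet<>();
        for (PartialMatrix m : matrices) {
            Set<String> overlap = new HashSet<>(m.dimensions);
            overlap.retainAll(covered);
            if (overlap.isEmpty()) {
                // no shared dimension: multiply the joint probability directly
                probability *= m.jointProbability;
            } else {
                // shared dimensions: multiply by the conditional probability
                probability *= m.jointProbability / marginals.get(overlap);
            }
            covered.addAll(m.dimensions);
        }
        return probability;
    }

    public static void main(String[] args) {
        // M1 knows P(A,B)=0.06, M2 knows P(A,C,D)=0.10, M3 knows P(E)=0.50,
        // with marginal P(A)=0.30 (all numbers made up for the example)
        List<PartialMatrix> matrices = new ArrayList<>(List.of(
            new PartialMatrix(Set.of("A", "B"), 0.06),
            new PartialMatrix(Set.of("A", "C", "D"), 0.10),
            new PartialMatrix(Set.of("E"), 0.50)));
        Map<Set<String>, Double> marginals = Map.of(Set.of("A"), 0.30);
        // P(A,C,D) x P(B|A) x P(E) = 0.10 x (0.06/0.30) x 0.50 = 0.01
        System.out.printf("%.4f%n", getValue(matrices, marginals));
    }
}
```

With these made-up numbers the call follows exactly the order described above (M2, then M1 conditioned on A, then M3) and yields 0.01.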
Hey! Working in the direction of encoding Bayesian inference based on this data structure. How should I use these matrices to encode conditional probabilities? Should I use `GosplConditionalDistribution` or `GosplJointDistribution`? Is there a method to know which column is the reference column, that is, X in P(X|Y,Z)? Thanks