ANRGenstar / genstar

Generation of Synthetic Populations Library

N-dimensional matrices: how to manipulate conditional probabilities? #9

Closed samthiriot closed 7 years ago

samthiriot commented 7 years ago

Hey! I'm working toward encoding Bayesian inference on top of this data structure. How should I use these matrices to encode conditional probabilities? Should I use GosplConditionalDistribution or GosplJointDistribution? Is there a method to know which column is the reference column, that is, X in p(X|Y,Z)? Thanks!

chapuisk commented 7 years ago

This is just a quick answer! If you want to encode limited knowledge about probabilities, GosplConditionalDistribution is definitely the right choice. Basically, it is a set of GosplJointDistribution instances. A GosplJointDistribution is a full-knowledge matrix over n dimensions: every coordinate (i.e. a vector that contains exactly one value per dimension) has a probability attached. In contrast, a GosplConditionalDistribution only knows a limited set of cross probabilities (that is why it is labeled "conditional"), and there is no direct method to retrieve a "referent" dimension. However, the set of coordinates can reveal which conditional probabilities are known: for example, the more often a dimension appears (i.e. has a value) in the set of coordinates, the more information the matrix holds about its relationships with the other dimensions. Hence, we can infer a hierarchy of knowledge among dimensions through GosplConditionalDistribution. Hope it helps!
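To make the distinction concrete, here is a minimal standalone sketch (plain `java.util` maps, not the actual Gospl API — the class and variable names are hypothetical): a full joint distribution assigns a probability to every coordinate, while partial knowledge only covers some cross tabulations, from which conditionals can be derived where the coverage allows.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "full joint" vs "partial/conditional"
// knowledge; this is NOT Gospl code.
public class JointVsConditional {
    public static void main(String[] args) {
        // Full joint P(Gender, Age) over 2 x 2 values: every
        // coordinate has a probability, and they sum to 1.
        Map<List<String>, Double> joint = new HashMap<>();
        joint.put(List.of("M", "young"), 0.2);
        joint.put(List.of("M", "old"),   0.3);
        joint.put(List.of("F", "young"), 0.1);
        joint.put(List.of("F", "old"),   0.4);

        // From a full joint we can derive any conditional, e.g.
        // P(Age = young | Gender = M) = P(M, young) / P(M).
        double pM = joint.get(List.of("M", "young")) + joint.get(List.of("M", "old"));
        double pYoungGivenM = joint.get(List.of("M", "young")) / pM;
        System.out.println(pYoungGivenM); // prints 0.4

        // Partial knowledge: only some cross tabulations are known,
        // e.g. P(Gender) is known but nothing mentions Age.
        Map<List<String>, Double> partial = new HashMap<>();
        partial.put(List.of("M"), 0.5);
        partial.put(List.of("F"), 0.5);
        // No coordinate involves Age, so this set of probabilities
        // carries no information about Age's relationship to Gender.
        System.out.println(partial.containsKey(List.of("M", "young"))); // prints false
    }
}
```

This mirrors the point above: the frequency with which a dimension appears in the known coordinates bounds what can be inferred about it.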

chapuisk commented 7 years ago

Still an open question. I will give it more attention soon. And come back with better ideas.

samthiriot commented 7 years ago

Thanks! Yep, I wonder whether we could add the referent dimension to GosplConditionalDistribution. The rationale might be: 1/ if you detect it once (as you did in one of the inference algorithms), then it's a shame to lose it and recompute it later; 2/ if the user knows they are imputing data with a given reference dimension, then we should keep track of it (especially in the case of Bayesian networks). Well, I'll wait a bit to see if you have new ideas on that, and I'll propose something if I find a good approach. Thanks!

chapuisk commented 7 years ago

At one point in the dev process, I decided to set some dimensions apart from the others in a so-called referentDimensionSet, but things got very complicated very quickly from there, so I rolled back to a simpler formulation.

chapuisk commented 7 years ago

This is NOT an answer on how to compute conditional probabilities. BUT, while thinking about how to compute a probability using partial information about the underlying full distribution, I updated how ASegmentedNDimensionalMatrix works. The getValue(Collection<APopulationValue> aspects) method now relies on a more suitable mechanism. Say we want to know P(A,B,C,D,E), and matrix M1 contains P(A,B), M2 contains P(A,C,D), and M3 knows P(E). The process is as follows: the collection of inner full matrices is sorted according to the number of requested aspects they contain information about, here M2, M1, M3. We start with the probability given by the most informative one, here M2 with P(A,C,D). Then we follow the path of information down: for each next matrix, we multiply the running probability by the conditional probability of its new dimensions given the ones already covered, here P(B|A) from M1, giving P(A,C,D) x P(B|A); or by the probability itself when the matrix shares no dimension with the previous ones, e.g. for M3 we multiply P(A,C,D) x P(B|A) by P(E). So in this example, the result of the method call on this segmented matrix is P(A,C,D) x P(B|A) x P(E).
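The chaining described above can be sketched in a few lines of standalone Java. This is a hypothetical illustration of the mechanism, not the actual ASegmentedNDimensionalMatrix code: each partial matrix is reduced to the set of dimensions it covers plus the joint probability it reports for the requested coordinate, and a separate map of marginals stands in for the lookups the real matrices would perform.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the chaining mechanism (hypothetical, not Gospl code):
// sort the inner matrices by coverage, start from the most informative
// joint, then multiply in conditionals (or plain probabilities when a
// matrix shares no dimension with those already covered).
public class SegmentedChain {
    static double chain(List<Map.Entry<Set<String>, Double>> partials,
                        Map<Set<String>, Double> marginals) {
        // Most informative matrix first.
        partials.sort((p, q) -> q.getKey().size() - p.getKey().size());
        Set<String> covered = new HashSet<>();
        double prob = 1.0;
        for (var partial : partials) {
            Set<String> overlap = new HashSet<>(partial.getKey());
            overlap.retainAll(covered);
            // P(new | overlap) = P(new, overlap) / P(overlap);
            // with no overlap the denominator is 1 (independent factor).
            double denom = overlap.isEmpty() ? 1.0 : marginals.get(overlap);
            prob *= partial.getValue() / denom;
            covered.addAll(partial.getKey());
        }
        return prob;
    }

    public static void main(String[] args) {
        // M1 knows P(a,b) = 0.06 with marginal P(a) = 0.2 (so P(b|a) = 0.3),
        // M2 knows P(a,c,d) = 0.12, M3 knows P(e) = 0.5.
        var partials = new ArrayList<Map.Entry<Set<String>, Double>>();
        partials.add(Map.entry(Set.of("A", "B"), 0.06));      // M1
        partials.add(Map.entry(Set.of("A", "C", "D"), 0.12)); // M2
        partials.add(Map.entry(Set.of("E"), 0.5));            // M3
        Map<Set<String>, Double> marginals = Map.of(Set.of("A"), 0.2);

        // P(a,c,d) x P(b|a) x P(e) = 0.12 x 0.3 x 0.5, i.e. about 0.018
        System.out.println(chain(partials, marginals));
    }
}
```

Note the hidden requirement this sketch makes explicit: turning M1's joint P(A,B) into the conditional P(B|A) requires the marginal P(A), which M1 can supply by summing over B.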