dit / dit

Python package for information theory.
http://docs.dit.io
BSD 3-Clause "New" or "Revised" License
518 stars 87 forks source link

DAG based distributions #54

Open chebee7i opened 10 years ago

chebee7i commented 10 years ago

The current method of storing all nonzero outcomes simply won't scale. We'll need to implement a distribution based on Bayesian networks, queries via belief propagation.

Autoplectic commented 10 years ago

Can you explain this a bit more?

chebee7i commented 10 years ago

Suppose we had a distribution which factored as:

p(x,y,z,w) = p(w) p(x|w) p(y|w) p(z|w)

Then assuming k-ary alphabets, there are around k^4 possible outcomes. Explicitly storing each of these is not needed in many situations. Instead, you can compute queries on the fly from those 4 distributions (which only require that we store k + 3k^2 possible outcomes). It's also much more convenient to specify the distribution via its factors. The XOR would be something like:

FairCoin1 -> XOR      
FairCoin2 -> XOR

And then 3 distributions, one associated with each node---one of which is a deterministic function. So we get much better scaling at the expense of on-the-fly calculations.

It's also much easier to change this distribution. We can swap out the fair coin for a biased coin. We can add a new random variable just by adding an edge to the graph. Etc.

Autoplectic commented 8 years ago

Just adding a note here that we can possibly interface to https://github.com/pgmpy/pgmpy for this support