aesara-devs / aemcmc

AeMCMC is a Python library that automates the construction of samplers for Aesara graphs representing statistical models.
https://aemcmc.readthedocs.io/en/latest/
MIT License

Use rules notation to describe the beta-binomial relation #38

Closed rlouf closed 2 years ago

rlouf commented 2 years ago

The rule is currently described as follows in the docstring:

$$ \begin{align} p &\sim \operatorname{Beta}\left(\alpha, \beta\right) \\ Y &\sim \operatorname{Binom}\left(N, p\right) \end{align} $$

If we observe $Y=y$, then $p$ follows a beta distribution:

$$ p \sim \operatorname{Beta}\left(\alpha + y, \beta + N - y\right) $$

Proposed change

$$ \frac{ Y \sim \operatorname{Binom}\left(N, p\right), \quad p \sim \operatorname{Beta}\left(\alpha, \beta\right) }{ \left(p|Y=y\right) \sim \operatorname{Beta}\left(\alpha+y, \beta+N-y\right) } $$
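As a sanity check on the rule's conclusion, the conjugate update can be verified numerically by comparing the analytic posterior $\operatorname{Beta}(\alpha + y, \beta + N - y)$ against a grid-based posterior computed directly from the prior and likelihood (a standalone sketch using NumPy/SciPy, independent of AeMCMC; the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

alpha, beta_, N, y = 2.0, 3.0, 10, 4

# Analytic conjugate posterior from the rule: Beta(alpha + y, beta + N - y)
posterior = stats.beta(alpha + y, beta_ + N - y)

# Grid-based posterior: prior(p) * likelihood(y | p), normalized numerically
p = np.linspace(1e-6, 1 - 1e-6, 10_000)
unnorm = stats.beta(alpha, beta_).pdf(p) * stats.binom(N, p).pmf(y)
grid_post = unnorm / (unnorm.sum() * (p[1] - p[0]))

# The two posteriors agree up to grid-discretization error
assert np.allclose(grid_post, posterior.pdf(p), atol=1e-2)
```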

zoj613 commented 2 years ago

Is this kind of notation prevalent? If not, then I think users will have a hard time parsing the proposed notation compared to the current one.

rlouf commented 2 years ago

It is; cf. Brandon's paper.

brandonwillard commented 2 years ago

We're technically working in areas of CS where this notation is fairly standard. We haven't taken many/any steps in our documentation (or implementations) to highlight this fact, but changes like this are a start.

brandonwillard commented 2 years ago

For reference, this issue was prompted by this comment: https://github.com/aesara-devs/aemcmc/pull/31#discussion_r903222640

brandonwillard commented 2 years ago

For a little more detail as to why adopting conventions like this could help: when a rule like the beta-binomial update above is provided, a person with some type theory experience—for example—might start asking useful questions about the meaning of the rule and the system in which it's being employed.

For instance, one might start asking what all the symbols in the term language are (e.g. $\operatorname{Beta}$, $Y$, etc.), the signature(s) of the algebra(s) we support, etc. One could also ask where some of the terms "come from" and how they're tracked, like $y$ (e.g. perhaps we're missing the statement $y \sim Y$), and that might lead to us developing a formal notion of environments as used in type theory (e.g. the $\Gamma$ in judgements like $\Gamma \vdash x : X$ that carry type information about terms).

These questions all have well-developed "answers" in the relevant literature; we just haven't started adopting any of them. If we start by formalizing whatever we reasonably can, though, we can hopefully begin to gradually identify existing systems that work for us, or construct our own if need be.

Rules in this form also more immediately convey that a rule is "fundamental"/axiomatic in the system we've implemented, and not derived from other, more fundamental rules. Gathered all together, that information is great for understanding the general properties of the system we're incrementally constructing. In other words, we're not constructing a system from the top down, which would generally involve a complete elaboration of the rules and of which (sub)sets of rules are sufficient for whichever desirable outcomes one wants. Instead, we're approaching this in a mostly piecemeal fashion, and, if we ever want to reason about the system we're constructing, we need to start putting all the relevant information in one easily "parsable" format.

Regarding the notation alone, it's also more compact (albeit barely so), and that compactness should help when we start programmatically encoding our rewrites (e.g. we could model the rules themselves and use them to index/key their broad application per #3).
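For illustration, one way such an encoding could look (a purely hypothetical sketch; none of these names exist in the AeMCMC codebase): represent each rule as plain data and index the rules by the outermost operators in their premises, so a rewriter can quickly look up which rules might apply to a given term.

```python
from dataclasses import dataclass

# Hypothetical representation of an inference rule as data; the term
# patterns here are just strings for illustration, not Aesara graphs.
@dataclass(frozen=True)
class Rule:
    name: str
    premises: tuple[str, ...]  # term patterns above the line
    conclusion: str            # term pattern below the line

    @property
    def head_ops(self) -> frozenset[str]:
        # Key a rule by the outermost operator of each premise,
        # e.g. "Binom(N, p)" -> "Binom".
        return frozenset(p.split("(", 1)[0] for p in self.premises)

beta_binomial = Rule(
    name="beta-binomial",
    premises=("Binom(N, p)", "Beta(alpha, beta)"),
    conclusion="Beta(alpha + y, beta + N - y)",
)

# Index rules by operator so candidates can be looked up per term.
index: dict[str, list[Rule]] = {}
for rule in (beta_binomial,):
    for op in rule.head_ops:
        index.setdefault(op, []).append(rule)

assert beta_binomial in index["Binom"]
```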