charlesfrye / psych101d-demo

Subset of demonstration materials for PSYCH101-D, "Data Science for Research Psychology"

Suggestion: avoid confusion about pm.Categorical in the hw #8

Closed ANaka closed 5 years ago

ANaka commented 5 years ago

Maybe not a big deal, but you might run into some trouble with pm.Categorical(): the way it is set up is a bit confusing, and IMO it is not strict enough about enforcing what its args are supposed to be.

One way to solve the hw problem is this:

with adding_model:
    # p=[0., 1.] puts all probability mass on index 1, so X and Y are always 1
    X = pm.Categorical("X", p=[0., 1.])
    Y = pm.Categorical("Y", p=[0., 1.])
    Z = pm.Deterministic("Z", X + Y)

But IIRC, this is not best practice for pm.Categorical: you are not supposed to pass any 0s as probabilities. I forget why; I think it has something to do with rescaling the probability mass function, and I think what happens in practice is that the 0 gets changed to a very small nonzero value. So if you adhere to what's in the docstring, the only way to make a categorical distribution that produces 1 every time is something like

with model:
    # a single category has index 0, so X_ is always 0; shift by 1 to get 1
    X_ = pm.Categorical("X_", p=[1.])
    X = pm.Deterministic("X", X_ + 1)

Probably this is all just splitting hairs, but it highlights something else that might not be clear: the output of pm.Categorical during sampling is an integer index into the categories. Again, this is not super straightforward to figure out from the docstring if you just run ?pm.Categorical, so it might be worth stating explicitly.
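
To make the "integer index" point concrete, here is a minimal plain-Python sketch. It does not use PyMC3 at all: draw_categorical is a hypothetical stand-in built on the standard library's random.choices, so it only illustrates the index semantics described above, not how pm.Categorical is implemented or how it treats zero probabilities.

```python
import random


def draw_categorical(p, rng):
    """Hypothetical stand-in for one categorical draw (not PyMC3 code).

    Returns an integer index in range(len(p)), chosen with weights p.
    """
    return rng.choices(range(len(p)), weights=p, k=1)[0]


rng = random.Random(0)  # fixed seed, only so the sketch is reproducible

# p = [0., 1.]: all mass sits on index 1, so every draw is the integer 1.
x_draws = [draw_categorical([0.0, 1.0], rng) for _ in range(1000)]
y_draws = [draw_categorical([0.0, 1.0], rng) for _ in range(1000)]

# Z = X + Y, computed draw by draw, is therefore always 2.
z_draws = [x + y for x, y in zip(x_draws, y_draws)]

# p = [1.]: the only category has index 0, so add 1 to get the value 1.
shifted_draws = [draw_categorical([1.0], rng) + 1 for _ in range(1000)]

print(set(x_draws), set(z_draws), set(shifted_draws))
```

Both routes give a variable that is always 1; the difference is only whether the 1 comes from the index itself or from shifting index 0.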

charlesfrye commented 5 years ago

Interesting -- I had no idea about the automatic re-scaling.

I was hoping to be able to stick with pm.Categorical, since it's so flexible, and otherwise use only pm.DiscreteUniform from among the other discrete RV choices wherever possible, but I think that was a bad call.

I will probably have to rewrite a chunk of the slides to talk explicitly about Bernoulli variables to make this work.

charlesfrye commented 5 years ago

The latest version of PyMC3 allows for Categoricals with exact 0 entries, so I think I'll ignore this subtlety for now.