aloctavodia / BAP3

Figures and code examples from Bayesian Analysis with Python (third edition)
http://bap.com.ar/

Why no log-normal distributions in the examples? #16

Closed — parrenin closed this issue 2 months ago

parrenin commented 5 months ago

Thanks again for the good reading! I have not yet reached the end of the book, but something that surprised me is the absence of the log-normal distribution from the examples. I thought the log-normal was a very natural distribution for always-positive variables (some people call them Jeffreys variables, if I am not mistaken). And indeed, the log-normal distribution appears all around us: the distribution of temperatures (in K) in the universe, the distribution of resistance values in electronic components, the distribution of earnings in society, etc. Is there a reason for omitting this important distribution?

aloctavodia commented 5 months ago

I am not familiar with the term "Jeffreys variables". The log-normal is indeed a very common distribution. In the book we only discuss a small subset of distributions; this subset includes many commonly used ones, but others are left out. PyMC and PreliZ include many more distributions. There is some discussion of prior/likelihood elicitation in the book, but it's not very deep.

parrenin commented 5 months ago

Thanks for the explanation. I realized that the gamma distribution is somewhat similar to, but more general than, the log-normal distribution. That might be why you did not use the log-normal. BTW, the gamma distribution is introduced in Code 2.27 but, again, not described in much detail.
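
For anyone comparing the two, here is a minimal sketch (not from the book) that plots a log-normal next to a gamma with roughly matched mean and standard deviation, assuming PreliZ's `mu`/`sigma` parametrization of the Gamma and illustrative parameter values:

```python
import matplotlib.pyplot as plt
import preliz as pz

# Log-normal parametrized by the mean and sd of log(x)
pz.LogNormal(mu=0.0, sigma=0.5).plot_pdf()
# Gamma using the alternative mu/sigma parametrization,
# roughly matching the log-normal's mean (~1.13) and sd (~0.60)
pz.Gamma(mu=1.13, sigma=0.6).plot_pdf()

plt.show()
```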

aloctavodia commented 5 months ago

Yeah, I tried to list the most commonly used distributions in the book, but that list could be biased. In practice, I tend to use the half-normal as a vague default prior and switch to the gamma (with the mu, sigma parametrization) if I have more information to define a prior.
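
A hypothetical PyMC sketch of those two prior choices for a scale parameter (the model, data, and values are illustrative, not taken from the book):

```python
import numpy as np
import pymc as pm

data = np.random.default_rng(123).normal(5.0, 2.0, size=50)  # fake data

with pm.Model() as vague_prior:
    mu = pm.Normal("mu", 0, 10)
    sigma = pm.HalfNormal("sigma", sigma=5)  # vague default prior
    pm.Normal("y", mu=mu, sigma=sigma, observed=data)

with pm.Model() as informed_prior:
    mu = pm.Normal("mu", 0, 10)
    # Gamma with the mu/sigma parametrization, encoding prior information
    # such as "the scale is around 2, give or take 1"
    sigma = pm.Gamma("sigma", mu=2, sigma=1)
    pm.Normal("y", mu=mu, sigma=sigma, observed=data)
```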

Something I would like to see is a survey of the most commonly used distributions in probabilistic models. I am not sure how easy it would be to compile such a list. It would also be interesting to have some granularity: by discipline, by probabilistic programming language, etc.

I have been thinking of integrating something like https://distribution-explorer.github.io/discrete/bernoulli.html, or even some of the information in Wikipedia (which can be very complete for many distributions), into PreliZ. PreliZ's documentation already includes some information about the distributions: https://preliz.readthedocs.io/en/latest/api_reference.html. But we could do better.
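
As a small sketch of what PreliZ already exposes for exploring a single distribution (method names assumed from the current PreliZ docs; values are illustrative):

```python
import preliz as pz

dist = pz.LogNormal(mu=0.0, sigma=0.5)
dist.plot_pdf()        # plot the density
print(dist.summary())  # mean, median, sd, and central interval
```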

I have a colleague working on a site that will collect commonly used priors for different types of models. The site is not yet public, but hopefully, it will be soon.