gbosquechacon / statrethink_course_in_pymc3

Statistical Rethinking: A Bayesian Course Using python and pymc3

Intro

Hello everybody!

This repo contains the Python/pymc3 version of the Statistical Rethinking course that Professor Richard McElreath taught at the Max Planck Institute for Evolutionary Anthropology in Leipzig during the winter of 2019/2020. The original repo for the course, from which this repo is forked, can be found here.

The course consists of 20 lectures spread over 10 weeks, with a set of assignments for each week. The original homework was done using the rethinking package and ulam, a wrapper of rstan for R. The course is an excellent introduction to Bayesian modelling in general and to Statistical Rethinking, the wonderful book written by Professor McElreath. It is entertaining, eye-opening and very instructive.

I started watching the lectures and doing the homework, but since I tend to prefer Python to R, I also started redoing all the homework using pymc3, a popular Python library for Bayesian modelling that uses theano as its backend. After I finished the course I thought I should make the Jupyter notebooks public, just in case somebody finds them useful. This repo is a love letter to the course that I have enjoyed so very much and to the work of Professor McElreath. Thank you, Richard, for inspiring a generation of scientists.

How to use this repo

There are ten Jupyter notebooks, one for each week of the course. At the beginning of each notebook there are links to the YouTube videos of the lectures, the slides used, and the original homework questions and answers in R. I have put all the material together in the notebooks so you only have to follow one document at a time. Each notebook therefore contains four things:

  1. Original exercises proposed
  2. Original answers given by Professor McElreath. By this I mean only the text, not the code
  3. python code that provides solutions to the exercises
  4. Brief comments of mine on implementation differences between R and Python, or on pymc3 tips and tricks that I learned along the way

Points 1. and 2. are written in regular type and contain only the minimal editing needed to match them with my code. These sections were written by Professor McElreath and I kept them as they were in the original course. Points 3. and 4. are my humble contribution. The code is easy to identify, and my comments (point 4.) are always written in italics so they can be clearly told apart from Professor McElreath's words. I kept them to a minimum, but sometimes there are things to clarify, useful tips to share or common mistakes to point out.

I would use this repo as follows:

  1. Go to the notebook of the week (from 1 to 10).
  2. Watch the two videos for the lectures of that week (at the very top of each notebook).
  3. Read the original problems presented to the students and try to solve them on your own (for real! try it!).
  4. Follow the exercise solutions in the notebook, which combine my code with the explanations by Professor McElreath.

Technical considerations

I ran the Jupyter notebooks on a fairly humble machine running Python 3.6. All the libraries needed are imported at the top of each notebook, as usual. There are not that many: the usual suspects such as pandas, numpy and matplotlib. For the actual modelling I used theano and pymc3, and for plotting mostly altair. I used pymc3 3.7, the latest version at the time of writing, because the new Data class is only available from this version onwards. In one of the notebooks I explain in detail the advantages of using this new class.
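Since the notebooks rely on the Data class, it can help to confirm that the installed pymc3 is at least 3.7 before running them. The snippet below is a minimal sketch of such a check and is not taken from the notebooks; the helper names `parse_version` and `supports_data_class` are hypothetical and exist only for illustration.

```python
import re

# Assumption taken from the text above: the Data class first appears in pymc3 3.7.
MIN_VERSION = (3, 7)


def parse_version(version):
    """Return the leading (major, minor) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def supports_data_class(version):
    """True if the given pymc3 version string is 3.7 or newer."""
    # Tuples compare element by element, so (3, 11) >= (3, 7) holds.
    return parse_version(version) >= MIN_VERSION
```

With pymc3 installed, the check would be called as `supports_data_class(pm.__version__)` after `import pymc3 as pm`.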

Other useful resources

There are a lot of very useful resources for Bayesian statistical modelling out there. Of those centered specifically on Professor McElreath's work, I would mention:

  1. Original repo for the course.
  2. Original rethinking package repo.
  3. The pymc3 repo contains a resources section where you can find the exercises for the first edition of the Statistical Rethinking book (the book, not the course) done in pymc3. It's a bit outdated but still a very good resource.
  4. A. Solomon Kurz rewrote all of the book's exercises using a great R package called brms. You can find this extensive and amazing work here and here.

Notebooks

Finally, since GitHub sometimes has issues rendering Jupyter notebooks, you can view them via nbviewer at the following links. In the repo, you can find them in the /notebooks/pymc3 folder.

Week 1 notebook: The Golem of Prague and Garden of Forking Data

Week 2 notebook: Geocentric Models and Wiggly Orbits

Week 3 notebook: Spurious Waffles and Haunted DAG

Week 4 notebook: Ulysses' Compass and Model Comparison

Week 5 notebook: Conditional Manatees and Markov Chain Monte Carlo

Week 6 notebook: Maximum entropy & GLMs and God Spiked the Integers (binomial & Poisson GLMs)

Week 7 notebook: Monsters & Mixtures (Poisson GLMs, survival, zero-inflation) and Ordered Categories, Left & Right

Week 8 notebook: Multilevel Models and Multilevel Models 2

Week 9 notebook: Adventures in Covariance and Slopes, Instruments and Social Relations

Week 10 notebook: Gaussian Processes and Missing Values and Measurement Error