This is the course repository for w241 and 290 -- Experiments and Causality.

#+TITLE: Experiments and Causality
#+OPTIONS: toc:nil

This course introduces students to experimentation in data science. The course pays particular attention to forming causal questions and to designing experiments that can answer them.

| Week | Topics | Async Reading | Sync Reading | Assignment Due |
|------+--------+---------------+--------------+----------------|
| 1 | Experimentation | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/GerberGreen.2012_1.pdf][FE 1]], [[http://www.nytimes.com/2007/09/16/magazine/16epidemiology-t.html][NYT]] | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Feynman.1974.pdf][Feynman]], [[https://www.cbsnews.com/news/do-suburbs-make-you-fat/][Suburbs]], [[https://www.nytimes.com/interactive/2018/07/18/upshot/nike-vaporfly-shoe-strava.html][Shoes]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Athey.2017.pdf][Predict or Cause]] | None |
| 2 | Apples to Apples | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/FEDAI_ch2.pdf][FE 2]]; [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/LewisReiley.pdf][Lewis & Reiley]] (p. 1-2.5, §1; §2A-B) | Poor Economics, Ch. 1, 3, 6; [[http://www.lse.ac.uk/philosophy/science-and-pseudoscience-overview-and-transcript/][Lakatos]] (O): [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Rubin.2008.pdf][Rubin]], sections 1 & 2 | [[./assignments/essays/essay1/README.md][Essay 1]], [[https://classroom.github.com/a/pHlIG0qi][PS 0]] |
| 3 | Quantifying Uncertainty | FE 3.0, 3.1, 3.4 | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Blackwell.2013.pdf][Blackwell]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Lewis.Rao.2015.pdf][Lewis and Rao]] 1, 3.1, 3.2 | [[https://classroom.github.com/a/K_fN1Rgi][PS 1]] |
| 4 | Blocking and Clustering | FE 3.6.1, 3.6.2, 4.4, 4.5 | (O): [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Cameron_Miller_Cluster_Robust_October152013.pdf][Cluster Estimator]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Moore.2012.pdf][Block]][[https://cran.r-project.org/web/packages/blockTools/index.html][Tools]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/abadie_2017.pdf][When to Cluster]] | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/assignments/final_project/three_project_ideas.md][Three Project Ideas]] |
| 5 | Covariates and Regression | MM 1, FE 4.1-3, MM 2, [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/MHE_chapter_2.pdf][MHE p. 16-24]] | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Opower.pdf][Opower]] (O): FE Appendix B (p. 453), [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/morgan_rubin_2012.pdf][rerandomization]] | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/assignments/final_project/two_page_description.md][Two Page Description]] |
| 6 | Regression; Multi-factor Experiments | MM 6.1, MM 95-97, FE 9.3.3, 9.4 | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Montgomery.2016.pdf][Montgomery]] Sections 1, 3.0, 3.1, 3.2, 3.5, 4.2, Skim 5 | PS 2 |
| 7 | HTE | FE 9, [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/clark_sells_2016.pdf][Multiple Comparisons]], and [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/week_07/clark_sells_2016.R][Demo]] | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Goodson_Quibit.pdf][Goodson]] (O): [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/jlr-location-location-location.pdf][JLR]] 1, 2, 3.1, 4.3, [[https://codeascraft.com/2018/10/03/how-etsy-handles-peeking-in-a-b-testing/][Etsy]] | -- |
| 8 | Noncompliance | FE 5 | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/GerberGreen.2005.pdf][G&G 2005]]; [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/trochim_donnelly_ch_7.pdf][TD, Ch 7]]; [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/trochim_donnelly_ch_9.pdf][TD, Ch 9]] | PS 3 |
| 9 | Spillover | FE 8 and [[https://eng.lyft.com/experimentation-in-a-ridesharing-marketplace-b39db027a66e#.dqcrp06rl][lyft]] and (O) [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Cohen.2016.pdf][uber]] | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Miguel.2004.pdf][Miguel and Kremer]]; [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Blake.2014.pdf][Blake and Coey 2, 3]] | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/assignments/final_project/project_checkin.md][Project Check-In]] |
| 10 | Causality from Observation? | MM 3.1, 4.1, 5.1 | [[http://espin086.wordpress.com/2010/08/08/difference-in-difference-estimation-garbage-incinerators-and-home-prices/][Incinerators]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Glynn.2014.pdf][Glynn]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Dee.2015.pdf][Dee]] (O): [[https://medium.com/teconomics-blog/5-tricks-when-ab-testing-is-off-the-table-f2637e9f15a5][Glassberg Sands]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Lalive.2006.pdf][Lalive]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Rubin.2008.pdf][Rubin, Section 3]] | -- |
| 11 | Problems, Diagnostics and the Long View | FE 11.3 | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/DinardoPischke_1997.pdf][DiNardo and Pischke]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Simonsohn.2014.pdf][Simonsohn]] (O): [[http://varianceexplained.org/r/bayesian-ab-testing/][Robinson]] | PS 4, [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/assignments/final_project/pilot_data.md][Pilot Data]] |
| 12 | Attrition, Mediation, Generalizability | FE 7, 10, [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/bates_2017.pdf][Bates 2017]] | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Allcott.2014.pdf][Allcott and Rogers]] | |
| 13 | Creative Experiments | FE 12, (O): [[https://www.thecut.com/2015/05/how-a-grad-student-uncovered-a-huge-fraud.html][NY Mag]], [[http://www.sciencemag.org/news/2016/04/real-time-talking-people-about-gay-and-transgender-issues-can-change-their-prejudices][Science]], FE 13 | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/broockman_irregular.pdf][Broockman Irregularities]], [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Hughes.2017.pdf][Hughes et al.]] (O): [[https://eng.uber.com/xp/][Uber Platform]] | PS 5 |
| 14 | Final Thoughts | | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/Freedman_1991.pdf][Freedman]] | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/finalProject/presentationGuidelines.pdf][Presentation]] |
| 15 | -- | [[https://github.com/UC-Berkeley-I-School/mids-w241/blob/main/readings/retracted_lacour.pdf][(O): Retracted LaCour]], ([[https://www.nytimes.com/2014/12/12/health/gay-marriage-canvassing-study-science.html][tl;dr]]), [[https://www.thisamericanlife.org/radio-archives/episode/584/for-your-reconsideration][Podcast (audio)]] | | Final Paper |

This course begins with a discussion of the issues with causal inference based on observational data. We recognize that many of the decisions that we care about, whether they be business related or theoretically motivated, are /essentially/ causal in nature.

The center of the course builds out an understanding of the mechanics of estimating a causal quantity. We present two major inferential paradigms, one new and one you are likely familiar with. We first present randomization inference as a unifying, intuitive inferential paradigm. We then demonstrate how this paradigm complements the classical frequentist inferential paradigm. With these concepts in hand, we turn to the design of experiments, focusing both on answering the question that we set out to answer and on achieving maximally powered experiments through design.

The tail of the course pursues two parallel tracks. In the first, students form a research question that requires a causal answer, then design and implement the experiment that best answers it. In the second, new course content focuses on the practical stumbling blocks in running an experiment and on the tests that detect them.

We hope that each student who completes the course will:

Computing is conducted primarily in R.

If you are looking to work on something over the break between semesters, we recommend spending a little time familiarizing yourself with data.table, which is the data manipulation idiom that we will be using in the course.
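
For a first taste of the idiom, here is a minimal sketch. The data and the column names (=treatment=, =outcome=) are made up for illustration and are not taken from any course material; the only thing it assumes is that the =data.table= package is installed.

#+BEGIN_SRC R
library(data.table)

# Hypothetical toy experiment: six units, three treated and three control.
d <- data.table(
  id        = 1:6,
  treatment = c(0, 1, 0, 1, 0, 1),
  outcome   = c(10, 14, 11, 15, 9, 13)
)

# The general form is d[i, j, by]: choose rows (i), compute (j), group (by).
# Here: the mean outcome within each treatment group, sorted by group.
group_means <- d[, .(mean_outcome = mean(outcome)), keyby = treatment]

# A difference in means between the treated and control units.
ate_hat <- d[treatment == 1, mean(outcome)] - d[treatment == 0, mean(outcome)]
#+END_SRC

Most day-to-day manipulation comes down to combining those three slots, so getting comfortable with =d[i, j, by]= is likely the most useful thing to practice.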

** Compute Environment

There are several options for how to build a compute environment for this course.

** Books

We use two textbooks in this course and read a third book in the second week. We recommend that you buy paper copies of the two textbooks (we've chosen textbooks that are fairly priced); we would understand if you read the third book digitally. Support a local bookstore if you can, but we've included a link to Amazon for those who cannot.

** Articles

| Day                 | Time        | Instructor |
|---------------------+-------------+------------|
| Monday              | 5:30-6:30   | Alex       |
| Tuesday             | 5:30-6:30   | Scott      |
| Tuesday             | 5:30-6:30   | Micah      |
| Thursday            | 5:30-6:30   | Micah      |
| Thursday            | 5:30-6:30   | Scott      |
| (Friday before PS)  | 4:00-5:00   | Alex       |
| (Saturday after PS) | 9:00-10:00a | Alex       |