[Turing Data Story] Election model

edaub commented 3 years ago

Summary

New story using a Bayesian hierarchical model to examine the mail votes for several battleground states in the 2020 US Presidential Election. Implements the model in a markdown file/notebook with additional documentation and requirements specifications. Fixes #108.

List of changes proposed in this PR (pull-request)

New files implementing the story itself
- mail_vote_model.md is the original markdown file that I used to write the story. Because the output includes several large animation files, I wanted to minimize the number of commits containing large files to reduce bloat in the repo.
- battleground-state-changes.csv is the latest version of the CSV as a fallback in case the latest one cannot be obtained from the scraper website.
- mail_vote_model.ipynb is the markdown file converted to a notebook and executed so that it contains all outputs. Note that the animations are very expensive to produce (each one takes several hours), so I have tried to minimize the number of times the notebook needs to be executed.
- mail_vote_model.html is a static HTML version containing all outputs (including the expensive animations) for situations where the notebook cannot be displayed.
- Makefile includes recipes for converting from markdown to the (unexecuted) notebook and from the (executed) notebook to HTML.
- requirements.txt includes all requirements for running the notebook code. This is also included in the Binder YAML file. Also includes the package needed to convert from markdown to notebook.
- README.md describes the model, how to install dependencies, and convert between different formats.
- .gitignore file to ignore the model outputs that are cached to disk to avoid re-drawing MCMC samples when appropriate.
environment.yml file in the Binder directory has been updated to include additional dependencies for running the notebook.
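
The two conversions handled by the Makefile can be sketched as follows. This is a minimal illustration, not the contents of the actual Makefile: it assumes the markdown-to-notebook package is jupytext (the description above does not name it) and that the HTML export uses `jupyter nbconvert`. The helper `conversion_commands` is hypothetical and only builds the command lines; it does not run them.

```python
from pathlib import Path


def conversion_commands(markdown_file: str) -> dict[str, list[str]]:
    """Build the two conversion commands as argument lists (nothing is executed)."""
    # The notebook is assumed to share the markdown file's base name.
    notebook = str(Path(markdown_file).with_suffix(".ipynb"))
    return {
        # Markdown -> unexecuted notebook (assumption: jupytext is the converter).
        "notebook": ["jupytext", "--to", "notebook", markdown_file],
        # Executed notebook -> static HTML (assumption: nbconvert does the export).
        "html": ["jupyter", "nbconvert", "--to", "html", notebook],
    }


commands = conversion_commands("mail_vote_model.md")
for step, argv in commands.items():
    print(step, "->", " ".join(argv))
```

Each list could be passed to `subprocess.run` once the notebook has been executed by hand, which keeps the expensive animation step out of the automated conversion.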

What should a reviewer concentrate their feedback on?

[x] Does the model give enough context about the election, both for the US election format in general as well as the particular situation that arose in 2020?
[x] Do I give enough detail about the linear regression models, and is it clear why it is hard to draw conclusions from them?
[x] Is there enough background on Bayesian Inference for non-specialists?
[x] Is it clear what a hierarchical model is and why it makes sense in this situation?
[x] Is the model itself described in sufficient detail? Too much detail?
[x] Do the results get tied back to the election narrative enough to make the story of interest to readers that don't care as much about the model details?
[x] Is the coding style sensible? I try to wrap everything into functions out of habit, but sometimes this makes notebooks harder to read.
[x] Everything looks ok?

Acknowledging contributors

[ ] All contributors to this pull request are already named in the table of contributors in the README file.
[x] The following people should be added to the table of contributors in the README file: @edaub, possibly @martintoreilly who contributed to the ideas in this story.

review-notebook-app[bot] commented 3 years ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

crangelsmith commented 3 years ago

Hi @edaub, this is great thanks!! We will start the review process, @samvanstroud and @kevinxufs are going to be the reviewers. A couple of comments:

We might need to make some formatting changes to the notebook to make it compatible with the fastpages interface; I hope this is OK.
As we are very close to the Christmas break we might not finish the review until the new year, but we hope to have this published sometime in January.

I'll open the review issue now.

crangelsmith commented 3 years ago

@all-contributors please add @edaub for blog

crangelsmith commented 3 years ago

@all-contributors please add @edaub for code

allcontributors[bot] commented 3 years ago

@crangelsmith

I've put up a pull request to add @edaub! :tada:

crangelsmith commented 3 years ago

@all-contributors please add @edaub for ideas

allcontributors[bot] commented 3 years ago

@crangelsmith

I've updated the pull request to add @edaub! :tada:

allcontributors[bot] commented 3 years ago

@crangelsmith

I've updated the pull request to add @edaub! :tada:

crangelsmith commented 3 years ago

@all-contributors please add @edaub for content

allcontributors[bot] commented 3 years ago

@crangelsmith

I've updated the pull request to add @edaub! :tada:

crangelsmith commented 3 years ago

@all-contributors please add @martintoreilly for ideas

allcontributors[bot] commented 3 years ago

@crangelsmith

I've put up a pull request to add @martintoreilly! :tada:

edaub commented 3 years ago

No problem, @crangelsmith! Thanks for the help with this. Whenever the reviewers have a chance to look at it, I am sure their feedback will be helpful. I'm happy to help with formatting by making further changes to the documents if that would be useful.

samvanstroud commented 3 years ago

Hi @edaub, thanks for submitting this! @kevinxufs and I are getting started with our reviews, and we hope to have them done in a week's time. The reviews will happen at https://github.com/alan-turing-institute/TuringDataStories/issues/113.

edaub commented 3 years ago

I've pushed my final updates to the markdown file, and I think I am happy with this version. In addition to the revisions suggested and discussed in #113, I ended up refactoring some of the code for modularity/clarity, including the animations.

I still need to convert to a notebook and run the expensive simulations. @crangelsmith Do we know the best way to handle the animations for the final version? I'm not sure how the deployment occurs -- will the saved notebook be published unexecuted, shown directly as I save it, or automatically run again when it is published? Similarly, if we want to publish an HTML version that will show the animations but not let the other code be executed, do we know how this might work? Happy to help out in managing this however I can -- I certainly don't want to make more effort for you!

crangelsmith commented 3 years ago

Hi @edaub, thanks for this, we will be publishing the story imminently :D

About the format for the final version, I think @samvanstroud did a check before and the executed notebook renders nicely in fastpages, so we might not have to do much on that side. He can confirm this.

Also, I wanted to tell you that I plan to use this story for a hands-on session I'll be running with some master's students from Latin America in a few weeks' time. I think it is an amazing example for them to learn about Bayesian modelling. For this I'm going to translate the notebook to Spanish and add it to the repo.

edaub commented 3 years ago

Yes, feel free to use this in future educational opportunities.

I will run the notebook again then and commit the final executed version. I won't commit a new HTML version and will delete it from the final commit as it sounds like that isn't needed. I'll let you know with a comment here that it is ready to merge.

samvanstroud commented 3 years ago

Hi @edaub, really sorry, I jumped the gun a bit (not staying on top of this thread): I ran the notebook on my end and have pushed it here. If there are any problems let me know and I will roll back the changes (which also rename the notebook file as required by fastpages). If things look OK to you, then I think we can go ahead and merge this!

edaub commented 3 years ago

@samvanstroud No worries -- I have a couple of local changes that I haven't pushed but I will pull your changes and rebase on top of that. I presume I should make sure the final notebook gets moved to the path you specify for publication purposes?

samvanstroud commented 3 years ago

@edaub, sounds good. Yep, the timestamped path I introduced will help keep things organised on our end, thanks!

samvanstroud commented 3 years ago

@edaub, just a friendly nudge: it would be great to get this merged.

edaub commented 3 years ago

Thanks for the reminder -- the last week has been a bit crazy but I finally found an hour to read it over one last time. I just pushed the final version that I am happy with, so feel free to go ahead and merge/publish.

I'm happy to fix the conflicts, though I suspect you know better than I do which is the correct version so I will assume someone else is fixing this unless I hear otherwise.

Also, you can squash merge this if you don't want extra copies of the big notebook in the repo -- I don't need the previous versions hanging around for any reason, so up to you how you handle the merge.

samvanstroud commented 3 years ago

Ok, I resolved the conflicts and re-ran the notebook locally just for good measure (they appeared to be missing on my end). Merging now!
