Slack: #propublica
Project Leads: @eric_bickel, @ryanes
Project Description: This ProPublica repository is part of Data for Democracy. Our purpose is to work collaboratively through analytic processes that support ProPublica's journalism. This repository in particular contains text mining and analysis of the foreign travel activity of elected officials.
Please read the Analysis Workflow below, and keep in mind that not every ProPublica project will involve each facet of the workflow. Where a step does apply, however, we need to follow the described process for the sake of transparency!
To ensure consistent analysis across the board, please document the process you use to mine the text files. Because the files are extremely messy and inconsistent, this documentation will help communicate to downstream users any nuances in how the data were collected.
Providing your code (Python script, R script, or notebook .md file) will help us keep this project going smoothly!
For each analysis, data needs to be loaded and cleaned into a format that is usable for the current analysis and for future analyses.
After data has been cleaned, both the raw data and cleaned data should be uploaded to a project-specific data.world repo. Additionally, the project's readme should be updated with a summary of the cleansing process and any code associated with cleaning should be pushed to the project's GitHub repo.
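As one hedged illustration of a documented cleaning step, the Python sketch below assumes the raw disclosures have already been extracted to a CSV with hypothetical columns `traveler`, `destination`, and `cost`. The function names, column names, and file paths are placeholders, not part of this project. The raw file is never modified; cleaned rows go to a separate file, so both copies can be uploaded to data.world.

```python
import csv
import re
from pathlib import Path

# Hypothetical column names: adjust to whatever the extracted files actually contain.
FIELDS = ["traveler", "destination", "cost"]


def clean_row(row: dict) -> dict:
    """Normalize one raw record. Each rule here should also be described in the readme."""
    cleaned = {}
    for field in FIELDS:
        value = row.get(field) or ""
        # Collapse runs of whitespace (including line breaks left by text extraction).
        cleaned[field] = re.sub(r"\s+", " ", value).strip()
    # Drop currency symbols and thousands separators so cost parses as a number.
    digits = re.sub(r"[^0-9.]", "", cleaned["cost"])
    try:
        cleaned["cost"] = float(digits) if digits else None
    except ValueError:
        # Unparseable amounts are kept as missing rather than guessed.
        cleaned["cost"] = None
    return cleaned


def clean_file(raw_path: Path, cleaned_path: Path) -> int:
    """Read raw_path, write cleaned rows to cleaned_path, and leave raw_path unchanged."""
    with raw_path.open(newline="", encoding="utf-8") as src:
        rows = [clean_row(r) for r in csv.DictReader(src)]
    with cleaned_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping every rule inside one small function makes it easy to copy the same list of rules into the readme's summary of the cleansing process.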
Team members working on exploratory analysis compute general statistics and distributions of important variables, and develop hypotheses based on initial exploration of covariation.
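A first exploratory pass might look like the minimal sketch below, which assumes cleaned records are dictionaries with hypothetical `destination` and numeric `cost` fields (missing costs stored as `None`). It uses only the Python standard library and is illustrative, not a prescribed method for this project.

```python
from collections import Counter
from statistics import mean, median


def summarize(records: list[dict]) -> dict:
    """Basic descriptive statistics for a first look at the cleaned data."""
    # Only records with a known cost contribute to the cost statistics.
    costs = [r["cost"] for r in records if r.get("cost") is not None]
    # Frequency distribution of destinations, ignoring blanks.
    destinations = Counter(r["destination"] for r in records if r.get("destination"))
    return {
        "n_records": len(records),
        "n_with_cost": len(costs),
        "mean_cost": mean(costs) if costs else None,
        "median_cost": median(costs) if costs else None,
        "top_destinations": destinations.most_common(5),
    }
```

Reporting how many records lack a cost alongside the averages makes it clear how much of the data each statistic actually covers.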
When an analysis job is complete, open a pull request against the GitHub repo so it can be reviewed and edited by the project's collaborators or by a committee of assigned editors.
Team members use modeling techniques to test the hypotheses generated in the exploratory analysis phase and to quantify relationships between variables in the data. Team members may also be working to test specific hypotheses generated by ProPublica.
Algorithms used in any modeling should be vetted through open discussion with the team and through pull requests, and the final model specification should be a collaborative effort that incorporates individual findings from those discussions. The project readme should outline these specifications, and the final modeling code should be pushed to the GitHub repo.
Team members detail the findings in a reproducible report that ProPublica can use immediately. All sources and data used should be linked in the report, and the project readme should contain all methodological background along with links to the data and code.