codeforboston / clean-slate-data

Next steps for MA data #149

Closed (jeremylang closed this issue 3 years ago)

jeremylang commented 4 years ago

From Dawn:

Hey y’all! So I added a notebook to GitHub that shows the process I went through to take out identifying and unnecessary info from the MA data. You can find the notebook at analyses/notebooks/MA_Data.ipynb and the resulting data sets at data/raw/prosecution_*.csv

The notebook includes some considerations and next steps at the end, which I will copy here as well:

    • [ ] This did not include a closer look, cleaning, or exploratory analysis of the data.
    • [ ] Date issues in the Northwestern dataset should be addressed for questions related to age. We can’t do a count of cases where the age < 21 because it will include issues from invalid values of Date of Birth, Offense Date, or both. My recommendation is to exclude the rows where ‘Age at Offense’ is < 1 from analyses related to age.
    • [ ] There is an example of the date issues in the notebook. What I should have said is that we should exclude records where ‘Age at Offense’ < 1 before getting counts where age < 21. There aren’t that many - I think it was 186 total.
    • [ ] Suffolk data does not include any indicators of age. It can still be used to help answer these questions: How many people (of any age) have marijuana possession charges? How many people (of any age) have only expungeable offenses on their record but have more than 1 of them?
    • [ ] prosecution_charges.csv can be compared to the spreadsheet indicating which offenses are eligible for expungement. The Suffolk data appears to use Code Ucc Ctgry to indicate Felony or Misdemeanor. One way to proceed could be to add columns to prosecution_charges.csv like Felony-Eligible and Misdemeanor-Eligible and populate the columns with Y or N as appropriate. Once completed, these values can be mapped back to the Northwestern and Suffolk data.
    • [ ] It’d also be helpful to have a column added to prosecution_charges.csv to indicate charges related to marijuana possession.
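
For anyone picking these items up, here is a rough pandas sketch of two of the steps above: excluding rows where ‘Age at Offense’ is < 1 before counting cases under 21, and adding a Y/N column for marijuana possession. Only ‘Age at Offense’ comes from the notes above. The `Charge Description` column, the `Marijuana-Possession` column name, and the keyword match are assumptions made for illustration, not what the notebook or the CSVs actually use, so check them against the real data first.

```python
import pandas as pd

AGE_COL = "Age at Offense"  # column name quoted in the notes above


def exclude_invalid_ages(df: pd.DataFrame, age_col: str = AGE_COL) -> pd.DataFrame:
    """Drop rows whose age is missing, non-numeric, or below 1."""
    ages = pd.to_numeric(df[age_col], errors="coerce")
    # NaN >= 1 evaluates to False, so unparseable ages are dropped as well.
    return df[ages >= 1].copy()


def count_under_21(df: pd.DataFrame, age_col: str = AGE_COL) -> int:
    """Count rows with age < 21 after removing the invalid ages."""
    valid = exclude_invalid_ages(df, age_col)
    return int((pd.to_numeric(valid[age_col]) < 21).sum())


def flag_marijuana_possession(
    charges: pd.DataFrame,
    description_col: str = "Charge Description",  # hypothetical column name
    flag_col: str = "Marijuana-Possession",       # hypothetical column name
) -> pd.DataFrame:
    """Add a Y/N column using a simple keyword match (an assumption)."""
    text = charges[description_col].fillna("").astype(str).str.lower()
    is_match = text.str.contains("marijuana") & text.str.contains("possess")
    out = charges.copy()
    out[flag_col] = is_match.map({True: "Y", False: "N"})
    return out
```

A keyword match like this will likely miss charges that are worded differently, so the flagged rows are worth reviewing by hand. The same Y/N pattern could be reused for the Felony-Eligible and Misdemeanor-Eligible columns once the eligibility spreadsheet has been matched against prosecution_charges.csv.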
jeremylang commented 3 years ago

Closing this issue because I believe the data team has their own path.