NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License

Traffic fatalities and injuries #150

Closed SashaWeinstein closed 2 years ago

SashaWeinstein commented 2 years ago

This pull request adds logic to compute average total injuries, pedestrian injuries, motorist injuries, cyclist injuries, and total fatalities over two 5-year periods. The datasets will ultimately be ingested through data library; that is a separate pull request.
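A rough sketch of the period averaging described above, not the actual code in this pull request. It assumes the average is the annual mean within each period. The record layout, the `average_by_period` name, and the second period (2015-2019) are hypothetical; only 2010-2014 is named in this thread.

```python
from collections import defaultdict

# Assumed period labels and year ranges. Only 2010-2014 is confirmed in the
# thread; the 2015-2019 period is a placeholder for illustration.
PERIODS = {"1014": range(2010, 2015), "1519": range(2015, 2020)}


def average_by_period(records, field):
    """Return the annual average of `field` per (PUMA, period).

    `records` is an iterable of dicts with keys "puma", "year", and `field`
    (a hypothetical layout, not the real source schema).
    """
    totals = defaultdict(float)
    for rec in records:
        for label, years in PERIODS.items():
            if rec["year"] in years:
                totals[(rec["puma"], label)] += rec[field]
    # Divide each period total by the number of years in that period.
    return {key: total / len(PERIODS[key[1]]) for key, total in totals.items()}


sample = [
    {"puma": "03701", "year": 2010, "ped_injuries": 5},
    {"puma": "03701", "year": 2014, "ped_injuries": 5},
    {"puma": "03701", "year": 2016, "ped_injuries": 10},
]
averages = average_by_period(sample, "ped_injuries")
```

Dividing by the full period length treats a year with no records as a year with zero injuries, which may or may not match how the source data should be read.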

Some code from the transit access pull request that cleans the source data was reused and moved to utils/assign_PUMA.py. That pull request has some issues with its output that I need to address. One is whether PUMAs are supposed to have leading zeros. This output does include the leading zero. I plan to wait until this is merged and then use the code in utils/assign_PUMA.py to clean the transit datasets.
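To make the leading-zero question concrete, here is a minimal sketch of normalizing PUMA codes to zero-padded strings. The `clean_puma` helper is hypothetical and is not the contents of utils/assign_PUMA.py; it assumes codes should be 5 characters wide, as in Census PUMA codes such as "03701".

```python
def clean_puma(value):
    """Return a PUMA code as a zero-padded 5-character string.

    Hypothetical helper for illustration. Accepts ints, strings, and
    float-like values such as 3701.0 that appear when a CSV column is
    read as numeric.
    """
    text = str(value).strip()
    if text.endswith(".0"):  # e.g. "3701.0" from a float column
        text = text[:-2]
    if not text.isdigit():
        raise ValueError(f"unexpected PUMA value: {value!r}")
    return text.zfill(5)


codes = [clean_puma(v) for v in (3701, "03701", 3701.0, " 4114 ")]
```

Keeping the code as a string everywhere avoids the zero being dropped again when datasets are joined.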

The output makes sense: denser PUMAs have more injuries and fatalities. This data excludes highways and parkways, which explains the low numbers in the outer boroughs, where I assume a larger share of miles traveled by car are on highways and parkways.

SashaWeinstein commented 2 years ago

Yeah, we can talk about it. It was changed to keep the leading zero.

AmandaDoyle commented 2 years ago

Here is my feedback so far, but open to input:

I'll finish my review after the noon meeting.

SashaWeinstein commented 2 years ago

Awesome, thanks for these comments, Amanda. That makes sense on the years; I didn't think about how it would work when combined with the other indicators. I'll make those changes and write to one file with years in the column names.

I'll also take out the "total" in the column names.

SashaWeinstein commented 2 years ago

Hey Amanda, thanks for that feedback. I made those changes. I'm realizing now that the year should go between "safety_" and "traffic_injuries"; I will implement this now.

SashaWeinstein commented 2 years ago

OK, I tried to go from bigger category to smaller. Column labels are now safety -> year range -> indicator -> indicator sub-group (if applicable). For example, pedestrian injuries for 2010-2014 are safety_1014_trafficinjuries_ped. Fatalities have no sub-group, so the 2010-2014 column is just safety_1014_trafficfatalities.
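The naming scheme above could be sketched as a small helper. The `safety_column` function is hypothetical and only illustrates the ordering (category, year range, indicator, optional sub-group); it is not taken from the pull request.

```python
def safety_column(year_range, indicator, subgroup=None):
    """Build a column label: safety -> year range -> indicator -> sub-group.

    Hypothetical helper; `year_range` is a label such as "1014" for 2010-2014.
    """
    parts = ["safety", year_range, indicator]
    if subgroup:  # fatalities have no sub-group, so this part is optional
        parts.append(subgroup)
    return "_".join(parts)


ped_col = safety_column("1014", "trafficinjuries", "ped")
fatal_col = safety_column("1014", "trafficfatalities")
```

Here `ped_col` is `safety_1014_trafficinjuries_ped` and `fatal_col` is `safety_1014_trafficfatalities`, matching the two examples given.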

AmandaDoyle commented 2 years ago

@SashaWeinstein awesome. I'm debating whether we should add per100stmi to the field name or leave that detail to the data dictionary. What do you think? PUMA outputs look good. During or after stand-up, I'm going to ask that you walk me through how you got the citywide values, because I'm having a hard time backing into them.

SashaWeinstein commented 2 years ago

My instinct is to keep per100stmi in the data dictionary. Another QOL indicator I was working on today was pedestrian hospitalizations, which is per 100k residents and would also be wordy to put in the column headers.
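For reference, both normalizations mentioned here have the same shape. This is a sketch only: it assumes per100stmi means "per 100 street miles", which the thread does not spell out, and the `rate_per` helper and the sample numbers are made up for illustration.

```python
def rate_per(count, denominator, scale):
    """Return `count` expressed per `scale` units of `denominator`.

    Hypothetical helper. Example uses: injuries per 100 street miles
    (scale=100) or hospitalizations per 100k residents (scale=100_000).
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return count * scale / denominator


# Made-up inputs, not real EDDT values.
injuries_per_100_street_miles = rate_per(50, 200.0, 100)
hospitalizations_per_100k = rate_per(30, 150_000, 100_000)
```

Since the unit lives in the `scale` argument rather than in the value itself, documenting it in the data dictionary keeps the column names short without losing the detail.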