NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License

Affordable housing construction and Affordable housing preservation #30

Open AmandaDoyle opened 2 years ago

AmandaDoyle commented 2 years ago

Source data: https://data.cityofnewyork.us/Housing-Development/Housing-New-York-Units-by-Building/hg8x-zxpr/data
Logic:

mbh329 commented 2 years ago
  1. Geographic information is excluded from certain records to "assist homeowners or protect special populations" (HPD data documentation, https://data.cityofnewyork.us/Housing-Development/Housing-New-York-Units-by-Building/hg8x-zxpr). For those records, the smallest geography provided is the council district (as well as the community district). Can we map these projects to the corresponding PUMA?

  2. HPD data has two versions of lat/long: one based on the street segment and address range, the other on the BBL centroid. If we don't end up geocoding, make sure we use the BBL centroid version.
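A minimal sketch of preferring the BBL-centroid coordinates when we don't geocode ourselves. The column names `latitude_internal` / `longitude_internal` are placeholders, not confirmed HPD field names; check the HPD data dictionary for which of the two lat/long pairs is the BBL centroid before relying on this.

```python
from typing import Optional, Tuple

# Hypothetical column names for the BBL-centroid lat/long pair.
# Confirm against the HPD data dictionary before use.
BBL_CENTROID_LAT = "latitude_internal"
BBL_CENTROID_LON = "longitude_internal"


def _to_float(value) -> Optional[float]:
    """Parse a coordinate, treating None and blank strings as missing."""
    if value is None or str(value).strip() == "":
        return None
    return float(value)


def bbl_centroid_point(record: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from the BBL-centroid columns, or None if either is missing."""
    lat = _to_float(record.get(BBL_CENTROID_LAT))
    lon = _to_float(record.get(BBL_CENTROID_LON))
    if lat is None or lon is None:
        return None
    return (lat, lon)


# Usage: the street-segment columns ("latitude"/"longitude") are ignored on purpose.
row = {"latitude": "40.70001", "longitude": "-73.90001",
       "latitude_internal": "40.7", "longitude_internal": "-73.9"}
print(bbl_centroid_point(row))  # (40.7, -73.9)
```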

mbh329 commented 2 years ago

About 20% of records do not have address-level geographic data, but they do include community districts (CDs), which roughly map to PUMAs. We could probably map them directly to PUMAs using a crosswalk tabulation, but we should check with the Population team whether this would be accurate enough.
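If we went the crosswalk route, the tabulation could look roughly like this sketch. It assumes a many-to-one lookup from community district to PUMA (several CDs can share one PUMA). The field names `community_board` and `all_counted_units` and the codes in the usage example are illustrative placeholders, not the actual crosswalk.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def units_by_puma(
    records: Iterable[dict],
    cd_to_puma: Dict[str, str],
    cd_field: str = "community_board",       # hypothetical field name
    units_field: str = "all_counted_units",  # hypothetical field name
) -> Tuple[Dict[str, int], List[dict]]:
    """Sum units per PUMA via a CD-to-PUMA lookup.

    Returns (totals by PUMA, records whose CD is not in the crosswalk).
    """
    totals: Dict[str, int] = defaultdict(int)
    unmapped: List[dict] = []
    for rec in records:
        puma = cd_to_puma.get(rec.get(cd_field))
        if puma is None:
            unmapped.append(rec)  # keep these for manual review
            continue
        totals[puma] += int(rec.get(units_field) or 0)
    return dict(totals), unmapped


# Usage with placeholder codes (not real crosswalk values):
crosswalk = {"CD-A": "PUMA-1", "CD-B": "PUMA-1", "CD-C": "PUMA-2"}
rows = [
    {"community_board": "CD-A", "all_counted_units": "10"},
    {"community_board": "CD-B", "all_counted_units": "5"},
    {"community_board": "CD-C", "all_counted_units": "7"},
    {"community_board": "CD-Z", "all_counted_units": "3"},
]
totals, unmapped = units_by_puma(rows, crosswalk)
print(totals)         # {'PUMA-1': 15, 'PUMA-2': 7}
print(len(unmapped))  # 1
```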

Represent these projects at the citywide and borough levels (as found in the open data); do not represent them within any geography (e.g., PUMA) that is not already listed in the open data.
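One way to apply this decision, as a minimal sketch: every record counts toward the citywide and borough totals, but only records with coordinates are passed on to sub-borough geographies such as PUMA. The `borough`, `all_counted_units`, `latitude`, and `longitude` field names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def has_point(rec: dict) -> bool:
    """True when both coordinates are present and non-blank (assumed field names)."""
    return all(str(rec.get(k) or "").strip() for k in ("latitude", "longitude"))


def tabulate(records: List[dict]) -> Tuple[Dict[str, object], List[dict]]:
    """Return (citywide/borough totals, records eligible for sub-borough geographies)."""
    by_borough: Counter = Counter()
    located: List[dict] = []
    for rec in records:
        units = int(rec.get("all_counted_units") or 0)
        by_borough[rec["borough"]] += units  # every record counts at borough level
        if has_point(rec):
            located.append(rec)  # only these go on to PUMA-level tabulation
    totals = {"citywide": sum(by_borough.values()), "boroughs": dict(by_borough)}
    return totals, located


rows = [
    {"borough": "Bronx", "all_counted_units": "12", "latitude": "40.8", "longitude": "-73.9"},
    {"borough": "Bronx", "all_counted_units": "4", "latitude": "", "longitude": ""},
    {"borough": "Queens", "all_counted_units": "6"},
]
totals, located = tabulate(rows)
print(totals)        # {'citywide': 22, 'boroughs': {'Bronx': 16, 'Queens': 6}}
print(len(located))  # 1
```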

mbh329 commented 2 years ago

Only 62 of the 5,362 records have a missing lat/long but some sort of street address. From a quick run-through of these records, some have encoding errors (house numbers encoded as dates, e.g., 3-11 27TH AVENUE became 11-Mar 27 AVENUE), and some have simple spelling errors that HPD's geocoding wasn't able to catch. I can write up this list and send it back to HPD to rectify the errors; Amanda doesn't think this will be too big of an issue.
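A rough sketch for flagging house numbers that a spreadsheet turned into dates (e.g., `3-11` shown as `11-Mar`) and recovering the hyphenated number. It assumes the mangled form is always `<day>-<Mon>` with an English month abbreviation, and that the part after the hyphen in the original was two digits (so `3-Nov` becomes `11-03`); anything that doesn't match is left for manual review.

```python
import re
from typing import Optional

MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Matches e.g. "11-Mar": one or two digits, a hyphen, a three-letter month.
DATE_LIKE = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")


def repair_house_number(house_number: str) -> Optional[str]:
    """Return the original hyphenated house number if the value looks date-mangled, else None."""
    match = DATE_LIKE.match(house_number.strip())
    if match is None:
        return None
    day, month_name = match.groups()
    month = MONTHS.get(month_name.title())
    if month is None:
        return None
    # "3-11" was read as month 3, day 11 and displayed as "11-Mar".
    # Assumption: the second part of the original number had two digits.
    return f"{month}-{int(day):02d}"


print(repair_house_number("11-Mar"))  # 3-11
print(repair_house_number("3-11"))    # None (already a normal house number)
```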

The data encoding errors have been fixed in the open data. For the other addresses that failed to geocode, DCP can either geocode them again or let them be.

mbh329 commented 2 years ago

After realizing that there was already an hpd_housing_ny_units_by_building template in db-data-libraries, Amanda pointed me to a repo that already uses this dataset: https://github.com/NYCPlanning/db-developments. This data product uses the HPD dataset and has a geocoding process set up by baiyue (https://github.com/NYCPlanning/db-developments/blob/main/python/geocode_hny.py). Can we just take this geocoding step and apply it to the data again? It looks like it should be relatively straightforward.
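Reusing that step might look roughly like this sketch, which re-runs geocoding only on records that lack coordinates. `geocode` is a hypothetical stand-in for whatever geocode_hny.py actually exposes (its interface isn't described here), and the `number`, `street`, `borough`, `latitude`, and `longitude` field names are assumptions.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in signature: (house_number, street, borough) -> (lat, lon) or None.
Geocoder = Callable[[str, str, str], Optional[Tuple[float, float]]]


def regeocode_missing(records: List[dict], geocode: Geocoder) -> List[dict]:
    """Fill in missing coordinates in place; return the records that still failed."""
    failed: List[dict] = []
    for rec in records:
        if rec.get("latitude") and rec.get("longitude"):
            continue  # already has coordinates, leave as is
        point = geocode(rec.get("number", ""), rec.get("street", ""), rec.get("borough", ""))
        if point is None:
            failed.append(rec)
        else:
            rec["latitude"], rec["longitude"] = point
    return failed


# Usage with a fake geocoder that knows one address and returns dummy coordinates:
def fake_geocode(number, street, borough):
    return (40.0, -73.0) if (number, street) == ("3-11", "27TH AVENUE") else None


rows = [
    {"number": "3-11", "street": "27TH AVENUE", "borough": "Queens"},
    {"number": "1", "street": "MISPELLED ST", "borough": "Bronx"},
]
failed = regeocode_missing(rows, fake_geocode)
print(len(failed))  # 1
```

Records left in `failed` are the ones DCP could either geocode by hand or let be.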

mbh329 commented 2 years ago

While looking at the data, I wanted to flag a question about dates: should we include projects that haven't finished, or filter out records that don't have a project_completion_date? Currently I don't filter any results by date; if a project has a project_start_date, it is included.
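Either rule is easy to switch between. A minimal sketch keyed on the `project_start_date` and `project_completion_date` fields, treating blank values as missing; since requiring a completion date is still the open question, it is a flag here rather than a fixed rule.

```python
from typing import List


def _present(value) -> bool:
    """True for a non-blank value."""
    return bool(str(value or "").strip())


def filter_by_dates(records: List[dict], require_completion: bool = False) -> List[dict]:
    """Keep records with a start date; optionally also require a completion date."""
    kept = []
    for rec in records:
        if not _present(rec.get("project_start_date")):
            continue  # no start date: excluded under the current rule
        if require_completion and not _present(rec.get("project_completion_date")):
            continue  # project not finished yet
        kept.append(rec)
    return kept


rows = [
    {"project_start_date": "2016-01-01", "project_completion_date": "2018-06-30"},
    {"project_start_date": "2019-05-01", "project_completion_date": ""},
    {"project_start_date": ""},
]
print(len(filter_by_dates(rows)))                           # 2 (current behavior)
print(len(filter_by_dates(rows, require_completion=True)))  # 1
```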