Open AmandaDoyle opened 2 years ago
Geographic information is excluded from certain records to "assist homeowners or protect special populations." (HPD Data Documentation, https://data.cityofnewyork.us/Housing-Development/Housing-New-York-Units-by-Building/hg8x-zxpr). The smallest geographic area they provide us with is the council district (also community district) - can we map these projects to the corresponding PUMA?
HPD data has two versions of lat/long: one using the street segment + address range, the other using the BBL centroid. If we don't end up geocoding, make sure we use the BBL centroid version.
About 20% of records do not have address-level geographic data, but they do include community districts (CDs), which roughly map to PUMAs. I think we could probably map them directly to PUMAs using a crosswalk tabulation, but we should check with population whether they think this will be accurate enough.
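A rough sketch of what the direct CD-to-PUMA mapping could look like. The crosswalk entries, the `community_board` key, and the `PUMA-A`/`PUMA-B` codes are placeholders for illustration only, not real crosswalk values; the real lookup would come from whatever crosswalk population signs off on.

```python
# Hypothetical crosswalk: community district -> PUMA.
# Keys and values are placeholders, NOT real crosswalk entries.
CD_TO_PUMA = {
    "BX-01": "PUMA-A",
    "BX-02": "PUMA-A",  # several CDs can share one PUMA
    "BK-05": "PUMA-B",
}


def assign_puma(record, crosswalk=CD_TO_PUMA):
    """Return a copy of the record with 'puma' filled in from its community district.

    'puma' is None when the district is missing or not in the crosswalk.
    """
    out = dict(record)
    out["puma"] = crosswalk.get(record.get("community_board"))
    return out


# Made-up records, for illustration only.
records = [
    {"project_id": 1, "community_board": "BX-01"},
    {"project_id": 2, "community_board": "BX-02"},
    {"project_id": 3, "community_board": "QN-07"},
]

mapped = [assign_puma(r) for r in records]

# Records that could not be mapped, so they can be reviewed by hand.
unmatched = [r["project_id"] for r in mapped if r["puma"] is None]
```

Keeping the unmatched IDs separate makes it easy to see how many records fall outside the crosswalk before deciding whether the approach is accurate enough.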
Represent these projects at the citywide and borough level (as found in the open data). Do not represent them within any geographies (e.g. PUMA) not already listed in the open data.
There are only 62 records out of 5362 that have a missing lat/long but have some sort of street address. From a quick run through these records, some appear to have encoding errors (house numbers encoded as dates, e.g. 3-11 27TH AVENUE becoming 11-Mar 27 AVENUE), and others have simple spelling errors that HPD geocoding wasn't able to catch. We can write up this list and send it back to HPD to rectify the errors. Amanda doesn't think this will be too big of an issue.
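If we ever need to patch these on our side instead of waiting on HPD, a minimal sketch for undoing the date-style encoding could look like the following. It only handles the `day-Mon` form seen in the example above (`11-Mar` back to `3-11`); the function name is made up, and anything else the encoding dropped (such as the `TH` suffix or a leading zero) cannot be recovered this way.

```python
import re

# Three-letter month abbreviations as they appear in the date-encoded values.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

# Matches a leading "11-Mar" style token at the start of the address.
DATE_LIKE = re.compile(r"^(\d{1,2})-([A-Za-z]{3})\b")


def restore_house_number(address):
    """Turn a leading 'day-Mon' token back into a 'month-day' house number.

    Addresses that do not start with such a token are returned unchanged.
    """
    match = DATE_LIKE.match(address)
    if not match:
        return address
    day, month_abbr = match.group(1), match.group(2).title()
    if month_abbr not in MONTHS:
        return address
    return f"{MONTHS[month_abbr]}-{day}" + address[match.end():]


fixed = restore_house_number("11-Mar 27 AVENUE")
untouched = restore_house_number("123 MAIN STREET")
```

Since a restored value is still a guess at the original house number, it would be worth flagging these rows for review rather than overwriting them silently.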
Data encoding errors are fixed in open data. For the other addresses that failed to geocode, DCP can geocode them again or leave them as they are.
After realizing that there was already a hpd_housing_ny_units_by_building template in db-data-libraries, Amanda pointed me to a repo that already uses this dataset: https://github.com/NYCPlanning/db-developments. This data product has a geocoding process already set up by baiyue: https://github.com/NYCPlanning/db-developments/blob/main/python/geocode_hny.py. Can we just take this geocoding step and apply it to the data again? It looks like it should be relatively straightforward.
As I have been looking at the data, I wanted to flag a question about dates: should we include projects that haven't finished, or filter out records that don't have a project_completion_date? Currently I don't filter any results by date; if the project has a project_start_date, it is included.
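To make the two options concrete, here is a small sketch of the current behavior next to the stricter one. The `select_projects` function and its `require_completion` flag are hypothetical names for illustration, and the sample records are made up.

```python
def has_value(value):
    """True when a field is present and not blank."""
    return value is not None and str(value).strip() != ""


def select_projects(records, require_completion=False):
    """Keep records that have a project_start_date.

    With require_completion=True, also drop records that have no
    project_completion_date (i.e. projects that haven't finished).
    """
    selected = []
    for record in records:
        if not has_value(record.get("project_start_date")):
            continue
        if require_completion and not has_value(record.get("project_completion_date")):
            continue
        selected.append(record)
    return selected


# Made-up records, for illustration only.
sample = [
    {"project_id": 1, "project_start_date": "2019-01-15", "project_completion_date": "2020-06-30"},
    {"project_id": 2, "project_start_date": "2021-03-01", "project_completion_date": ""},
    {"project_id": 3, "project_start_date": None, "project_completion_date": None},
]

current = [r["project_id"] for r in select_projects(sample)]
completed_only = [r["project_id"] for r in select_projects(sample, require_completion=True)]
```

The default matches what I do now (start date only), so switching is a one-argument change once we decide.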
Source data: https://data.cityofnewyork.us/Housing-Development/Housing-New-York-Units-by-Building/hg8x-zxpr/data
Logic:
New Construction and Preservation, using the Reporting Construction Type field
Extremely Low Income Units
Very Low Income Units
Low Income Units
Moderate Income Units
Middle Income Units
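One possible reading of this logic, sketched in Python: split records by the Reporting Construction Type field and sum each income-band unit count within New Construction and Preservation. The function name and the sample rows are made up, and the assumption that records arrive as dicts keyed by the open data column names is mine.

```python
from collections import defaultdict

# Unit-count columns listed above.
INCOME_FIELDS = [
    "Extremely Low Income Units",
    "Very Low Income Units",
    "Low Income Units",
    "Moderate Income Units",
    "Middle Income Units",
]

CONSTRUCTION_TYPES = ("New Construction", "Preservation")


def units_by_construction_type(records):
    """Sum each income-band unit count per reporting construction type."""
    totals = defaultdict(lambda: dict.fromkeys(INCOME_FIELDS, 0))
    for record in records:
        construction_type = record.get("Reporting Construction Type")
        if construction_type not in CONSTRUCTION_TYPES:
            continue  # ignore rows with any other or missing type
        for field in INCOME_FIELDS:
            # Treat missing or blank counts as zero.
            totals[construction_type][field] += int(record.get(field) or 0)
    return {ctype: dict(counts) for ctype, counts in totals.items()}


# Made-up rows, for illustration only.
rows = [
    {"Reporting Construction Type": "New Construction", "Low Income Units": 10, "Middle Income Units": 2},
    {"Reporting Construction Type": "Preservation", "Low Income Units": 5, "Very Low Income Units": "3"},
    {"Reporting Construction Type": "New Construction", "Low Income Units": 4},
]

summary = units_by_construction_type(rows)
```

The same totals could be grouped further by borough or PUMA once the geography question above is settled.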