NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

compute geographies of CPDB projects during build #779

Closed damonmcc closed 1 week ago

damonmcc commented 4 weeks ago

Downstream users are the Experience Builder app that GIS is building and the groups who have historically asked the Capital Planning team for a csv of their area's projects

outcomes

damonmcc commented 3 weeks ago

current state

At the moment, we categorize every project as one of the following: Fixed Asset; Lump Sum; or ITT, Vehicles, and Equipment. We then treat all Fixed Asset projects as if they have a specific location and try to geocode them.

This leaves us with some mapped projects, some unmapped, and many un-mappable by virtue of their exclusion from geocoding.

potential changes

Some projects do not have a location, some projects involve one or more geographic areas, and some projects involve specific locations. With this in mind, we may want to assign each project a location type such as: Citywide, Area, Specific, Unknown

All Citywide and Area projects could be considered mapped. We would then only try to geocode those with a location type of Specific.
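A minimal sketch of that proposal, assuming the four location types above. The names `LocationType`, `needs_geocoding`, and `is_mapped` are hypothetical and not part of the current build; how a project would actually be assigned its type is left open here.

```python
from enum import Enum


class LocationType(Enum):
    """Proposed location types (hypothetical names, taken from the list above)."""
    CITYWIDE = "Citywide"
    AREA = "Area"
    SPECIFIC = "Specific"
    UNKNOWN = "Unknown"


# Location types that would count as mapped without any geocoding.
MAPPED_WITHOUT_GEOCODING = {LocationType.CITYWIDE, LocationType.AREA}


def needs_geocoding(location_type: LocationType) -> bool:
    """Only projects with a specific location would be sent to the geocoder."""
    return location_type is LocationType.SPECIFIC


def is_mapped(location_type: LocationType, geocoded: bool) -> bool:
    """Citywide and Area projects are mapped by definition; Specific projects
    are mapped only if geocoding succeeded; Unknown projects stay unmapped."""
    if location_type in MAPPED_WITHOUT_GEOCODING:
        return True
    if location_type is LocationType.SPECIFIC:
        return geocoded
    return False
```

Under this scheme the "un-mappable" bucket shrinks to Unknown projects plus Specific projects that fail geocoding.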

damonmcc commented 3 weeks ago

the table cpdb_adminbounds seems like what we already want and is exported as cpdb_adminbounds.csv

it has 3 columns: feature_id, admin_boundary_type, admin_boundary_id

feature_id is the Project ID, and a project can fall within several boundaries, so there are multiple rows per project

but there are also duplicate rows (e.g. where feature_id = '850HWCRCDB' and admin_boundary_type = 'borocode' and admin_boundary_id = '2'). maybe our script to create the table counts every intersection of a multi-geometry with each boundary, so 10 points in a borough produces 10 duplicate rows

values of interest in admin_boundary_type to filter by would be commboard and council
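A rough sketch of removing the duplicate rows and filtering to the boundary types of interest, assuming cpdb_adminbounds.csv has exactly the three columns listed above. The helper name `dedupe_adminbounds` is hypothetical, and the `EXAMPLE1` rows are made-up sample data; only the `850HWCRCDB` borocode row comes from the table.

```python
import csv
import io

BOUNDARY_TYPES_OF_INTEREST = {"commboard", "council"}


def dedupe_adminbounds(rows, boundary_types=None):
    """Drop exact duplicate rows, keeping first-seen order.

    If boundary_types is given, keep only rows whose admin_boundary_type
    is in that set.
    """
    seen = set()
    result = []
    for row in rows:
        if boundary_types is not None and row["admin_boundary_type"] not in boundary_types:
            continue
        key = (row["feature_id"], row["admin_boundary_type"], row["admin_boundary_id"])
        if key in seen:
            continue
        seen.add(key)
        result.append(row)
    return result


# Sample input: the first two rows mirror the duplicate noted above;
# the EXAMPLE1 rows are invented for illustration.
SAMPLE_CSV = """feature_id,admin_boundary_type,admin_boundary_id
850HWCRCDB,borocode,2
850HWCRCDB,borocode,2
EXAMPLE1,commboard,101
EXAMPLE1,commboard,101
EXAMPLE1,council,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
deduped = dedupe_adminbounds(rows)
of_interest = dedupe_adminbounds(rows, BOUNDARY_TYPES_OF_INTEREST)
```

If the duplicates do come from counting every geometry part, the same effect could probably be had with a `SELECT DISTINCT` where the table is created, but that is a guess until the script is checked.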

damonmcc commented 2 weeks ago

uploaded first attempt at generated csvs to a new EDM Sharepoint folder called Capital Project's Map

in projects_in_geographies_05022024 there are two folders with many csv files: Community Districts and City Council Districts. Each csv file should have one row for every capital project that intersects the geographic area, with every CPDB column for those projects.
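One way the per-geography files could be generated, as a sketch only and not the script used for the upload. It assumes project records keyed by feature_id and adminbounds rows shaped like cpdb_adminbounds; the function name, the `<boundary_type>_<id>.csv` file naming, and all sample values are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_projects_by_geography(projects, adminbounds, boundary_type, out_dir):
    """Write one csv per boundary id, with one row per intersecting project.

    projects: dict mapping feature_id -> dict of CPDB columns (same keys for all).
    adminbounds: iterable of dicts with feature_id, admin_boundary_type,
    admin_boundary_id. Duplicate rows are collapsed by collecting ids in a set.
    """
    ids_by_area = defaultdict(set)
    for row in adminbounds:
        if row["admin_boundary_type"] == boundary_type:
            ids_by_area[row["admin_boundary_id"]].add(row["feature_id"])

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for area_id, feature_ids in sorted(ids_by_area.items()):
        rows = [projects[fid] for fid in sorted(feature_ids) if fid in projects]
        if not rows:
            continue
        path = out_dir / f"{boundary_type}_{area_id}.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written


# Made-up sample data for illustration.
projects = {
    "EXAMPLE1": {"feature_id": "EXAMPLE1", "description": "hypothetical project A"},
    "EXAMPLE2": {"feature_id": "EXAMPLE2", "description": "hypothetical project B"},
}
adminbounds = [
    {"feature_id": "EXAMPLE1", "admin_boundary_type": "council", "admin_boundary_id": "1"},
    {"feature_id": "EXAMPLE1", "admin_boundary_type": "council", "admin_boundary_id": "1"},
    {"feature_id": "EXAMPLE2", "admin_boundary_type": "council", "admin_boundary_id": "1"},
    {"feature_id": "EXAMPLE2", "admin_boundary_type": "council", "admin_boundary_id": "2"},
    {"feature_id": "EXAMPLE1", "admin_boundary_type": "commboard", "admin_boundary_id": "101"},
]

with tempfile.TemporaryDirectory() as tmp:
    paths = write_projects_by_geography(projects, adminbounds, "council", tmp)
    written_names = [p.name for p in paths]
    with paths[0].open(newline="") as f:
        first_file_rows = list(csv.DictReader(f))
```

Collecting feature ids in a set per boundary means the duplicate rows in the adminbounds table would not produce duplicate project rows in the output files.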

damonmcc commented 1 week ago

uploaded latest build to EDM Sharepoint folder called Capital Project's Map