NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

export csvs of CPDB projects filtered by certain admin geographies #827

Closed damonmcc closed 1 week ago

damonmcc commented 2 weeks ago

resolves #779

adds a new zip file to the CPDB build called projects_in_geographies.zip

it contains one csv per Community District and one per City Council District, each listing the projects that intersect that district. these csv files have all CPDB columns, including geometry

successful build here

approach

notes

This new zip file increases the zipped build output from 16 MB to 75 MB 🙃.

We haven't decided how these new files will be distributed yet. Right now the only attachment we include on Open Data is the data dictionary, but attaching this zip file there seems like a good option in the future.

For now, I've been uploading the files to a new EDM Sharepoint folder called Capital Project's Map so that everyone working on the project "Capital Projects Maps" can access them.

fvankrieken commented 1 week ago

I think some of the need for this complexity points to the need to just clean up our recipe boundary data a little bit. If all of our boundaries were just

geotype  geoid  geom
boro     1      ...

it would be pretty trivial to have a python script which selects all geoids for a specific geotype, then does some logic for each geoid, without having to "generate" or really know anything about these entities in python - really, it just figures out what sql to run. Or maybe it even just does it all with dbt! Mainly, I don't like moving complexity from sql into python as part of builds.

Don't mean to be critical - I've done literally the same thing (see geom aggregates for DevDB). But I worry about this code (and that DevDB code) just adding complexity where ideally it's not needed.
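
A minimal sketch of the unified-boundaries idea described above, not the code in this PR. It assumes a single `boundaries` table with the `geotype`, `geoid`, `geom` shape, plus a hypothetical `projects` table with a `geom` column. The names `EXPORT_SQL`, `geoids_for`, and `export_plan` are made up for illustration, and an in-memory SQLite table stands in for the real database so the snippet runs on its own. Python only looks up the geoids and decides which query parameters to use; the spatial logic stays in SQL.

```python
import sqlite3

# Hypothetical PostGIS query that a build step would run once per geoid.
# Table and column names are assumptions; the %(name)s placeholders follow
# the psycopg style and are not executed in this demo.
EXPORT_SQL = """
SELECT p.*
FROM projects AS p
JOIN boundaries AS b ON ST_Intersects(p.geom, b.geom)
WHERE b.geotype = %(geotype)s AND b.geoid = %(geoid)s
""".strip()


def geoids_for(conn, geotype):
    """Return every geoid stored for one geotype, in sorted order."""
    rows = conn.execute(
        "SELECT DISTINCT geoid FROM boundaries WHERE geotype = ? ORDER BY geoid",
        (geotype,),
    )
    return [row[0] for row in rows]


def export_plan(conn, geotype):
    """Pair each output file name with the parameters for EXPORT_SQL."""
    return [
        (f"{geotype}_{geoid}.csv", {"geotype": geotype, "geoid": geoid})
        for geoid in geoids_for(conn, geotype)
    ]


# Demo data: an in-memory SQLite table standing in for the real boundaries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE boundaries (geotype TEXT, geoid TEXT, geom TEXT)")
conn.executemany(
    "INSERT INTO boundaries VALUES (?, ?, ?)",
    [("boro", "1", None), ("boro", "2", None), ("cd", "101", None)],
)

plan = export_plan(conn, "boro")
for filename, params in plan:
    print(filename, params)
```

With this shape, adding a new geography type would only mean adding rows to `boundaries`; the Python side never needs to know what a borough or a community district is.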

damonmcc commented 1 week ago

rebased to use the term "boundaries" rather than "geography" in all relevant places

damonmcc commented 1 week ago

rebased to use the term "geography" rather than "boundaries" in all relevant places

build after those changes here

damonmcc commented 1 week ago

rebased to fix a file name. build after that change here