NYCPlanning / db-developments

🏠 🏘️ 🏗️ Developments Database
https://nycplanning.github.io/db-developments

Comprehensive Records #575

Closed td928 closed 1 year ago

td928 commented 1 year ago

#569 #570 #572 two reviewers 🐬 🐬

This PR set out to address #572, where GIS pointed out that a geography with no HDB records in the past ten years does not show up at all in the final aggregate table, because the SQL aggregation uses GROUP BY. This is especially obvious for geographies that "shouldn't" produce any units, such as park NTAs, etc. To address this, new templates containing the complete set of geographies from GIS are included, and our aggregate output is then joined onto them.
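As a rough illustration of the problem and the fix, here is a minimal pandas sketch. The column names (`nta`, `units`) and the geography codes are made up for the example and are not taken from the actual tables in this PR.

```python
import pandas as pd

# Hypothetical record-level data: only two of three geographies have records.
records = pd.DataFrame(
    {
        "nta": ["BK01", "BK01", "MN02"],
        "units": [10, 5, 7],
    }
)

# Grouping only produces rows for geographies that appear in the records,
# which mirrors what GROUP BY does in SQL.
aggregate = records.groupby("nta", as_index=False)["units"].sum()

# Hypothetical template listing every geography, including one with no records.
template = pd.DataFrame({"nta": ["BK01", "MN02", "PARK99"]})

# Left-joining the aggregate onto the template keeps every template row,
# so the geography without records survives (with a null count for now).
complete = template.merge(aggregate, on="nta", how="left")
print(complete)
```

Starting from the template rather than from the aggregate is what guarantees one row per geography, regardless of what the records contain.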

To do this properly, a new Python script, clean_export_aggregate.py, was created. While adding the geographies, it was also convenient to cast the census block and tract values to text for better joining (#570), and easy to update the final dataframe so that zeros replace the null values (#569).
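A small sketch of why the text cast and the zero fill go together with the join. The `tract` and `units` columns, the sample codes, and the six-character padding are all assumptions for illustration; the script in this PR may handle these differently.

```python
import pandas as pd

# Hypothetical template: tract codes stored as text, preserving leading zeros.
template = pd.DataFrame({"tract": ["000100", "000201", "000300"]})

# Hypothetical aggregate output: the same codes came back as integers.
aggregate = pd.DataFrame({"tract": [100, 201], "units": [12, 4]})

# Cast the key to text so both sides of the join share a type.
# Padding to six characters is an assumption made for this example.
aggregate["tract"] = aggregate["tract"].astype(str).str.zfill(6)

joined = template.merge(aggregate, on="tract", how="left")

# Geographies without records have null counts after the join;
# replace them with zeros and restore an integer dtype.
joined["units"] = joined["units"].fillna(0).astype(int)
print(joined)
```

Without the cast, merging an integer key against a text key would either fail or match nothing, depending on the pandas version.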

read_aggregate_template and get_index_columns

To briefly cover the two helper functions in the Python script: read_aggregate_template reads in the GIS template for each geography, which has been cleaned and onto which the aggregate output is later joined. get_index_columns uses the table name to look up which column is set as the index for the join.
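For readers without the diff open, one possible shape for these two helpers is sketched below. This is a guess based only on the description above, not the code in this PR: the signatures, the table-to-column mapping, and the sample CSV are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical mapping from aggregate table name to its join column.
# The real table and column names in the PR may differ.
INDEX_COLUMNS = {
    "aggregate_nta": "nta",
    "aggregate_tract": "tract",
}


def get_index_columns(table_name: str) -> str:
    """Look up which column the named table is joined on."""
    return INDEX_COLUMNS[table_name]


def read_aggregate_template(source, table_name: str) -> pd.DataFrame:
    """Read a template CSV as text and index it by the table's join column."""
    template = pd.read_csv(source, dtype=str)
    return template.set_index(get_index_columns(table_name))


# Stand-in for a CSV file under data/agg_template.
csv_text = "nta,name\nBK01,Example A\nPARK99,Example Park\n"
template = read_aggregate_template(io.StringIO(csv_text), "aggregate_nta")
print(template)
```

Reading every column with `dtype=str` keeps geography codes as text from the start, which would fit the casting change described for #570.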

data/agg_template

This is where all the GIS templates are stored.

td928 commented 1 year ago

Sorry, forgot to push one local change. It should now be ready for review. Thanks!

mbh329 commented 1 year ago

Are the agg_template CSVs coming directly from GIS?

td928 commented 1 year ago

This action reminded me that this is not ready, because none of the Python dependencies are installed for the GitHub workflow. Let me fix that.

td928 commented 1 year ago

On second thought, maybe that deserves its own PR for Poetry installation etc.?

td928 commented 1 year ago

Seems like I did enough for it to work now, since Poetry might be a bigger enhancement. Let me know if the changes I made are good enough to get approval for this PR. Thanks! @NYCPlanning/data-engineering

mbh329 commented 1 year ago

Are the aggregate_commntydst and aggregate_councildst tables in the output supposed to have the _2010 suffix?