NYCPlanning / db-factfinder

data ETL for population fact finder (decennial + acs)
https://nycplanning.github.io/db-factfinder/factfinder/
MIT License
ACS data update 0610 and 1721 #250

Closed AmandaDoyle closed 1 year ago

AmandaDoyle commented 1 year ago

Context in short: This is the product of designing data around an application. Initially, these data were created by the Pop team. A few years ago, DE built a process to pull data from the Census API and process it to meet the needs of the app. Pop does a lot of munging of the data, so Pop would review the output DE created and apply corrections; this was a lengthy process and turned up miscalculations in both DE's and Pop's outputs. In summer 2022, because of time and resource constraints, Pop decided to process the data in full, and DE built a process that only transforms the data Pop creates into the format the app needs. We'll be doing the same for this update.

About this update: We need to update the 2006-2010 Economic and Housing ACS data and create the 2017-2021 ACS data for PFF. See emails from SW for context.

Steps to do this update:

damonmcc commented 1 year ago

data updates have been run via the dev branch in PR https://github.com/NYCPlanning/db-factfinder/pull/252. that branch was originally used for last year's 2020 manual update and is now generalized for any year we want

the 2006-2010 output seems identical to what was already there, and I ran it more than once to ensure I used the file Population gave us

damonmcc commented 1 year ago

thanks to Stephen's clarifying email, I used the correct sheets in the Excel file, and the 2010 data in edm-publishing now reflects the desired inflated values!

for reference, I used a Python script in the template repo to programmatically compare the CSVs
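For readers without access to that script, here is a minimal sketch of what a programmatic CSV comparison can look like. It is not the script from the template repo: the function names, the `geoid` key column, and the sample values are all hypothetical, and the inputs are inline strings so the example runs on its own.

```python
import csv
import io


def load_rows(text, key):
    """Index CSV rows by the value in the `key` column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def compare_csvs(old_text, new_text, key):
    """Return keys found in only one file, plus cell-level differences."""
    old = load_rows(old_text, key)
    new = load_rows(new_text, key)
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    changed = []
    for k in sorted(old.keys() & new.keys()):
        # Compare every column that appears in either version of the row.
        for col in sorted(set(old[k]) | set(new[k])):
            before, after = old[k].get(col), new[k].get(col)
            if before != after:
                changed.append((k, col, before, after))
    return only_old, only_new, changed


# Hypothetical sample data; column names and values are illustrative only.
old_csv = "geoid,value\n1,50000\n2,60000\n"
new_csv = "geoid,value\n1,50000\n2,72000\n3,41000\n"

only_old, only_new, changed = compare_csvs(old_csv, new_csv, key="geoid")
print("only in old:", only_old)
print("only in new:", only_new)
print("changed cells:", changed)
```

To compare real files, read each one with `open(path, newline="").read()` and pass the text in. Values are compared as strings, so `1.0` and `1.00` count as different; a numeric tolerance would need to be added if that matters.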

cc @AmandaDoyle

damonmcc commented 1 year ago

update: OSE handles inflating the data, so I reverted the "production" 2010 data in edm-publishing. awaiting confirmation that the 2010 and 2021 data are good.

would support closing this issue since our part of the data update is complete. if feedback leads to us doing more, that can be captured in a new, separate issue

damonmcc commented 1 year ago

closing issue per standup today