NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License

241 demographics pums 0812 1519 #253

Closed td928 closed 2 years ago

td928 commented 2 years ago

Overview

This PR incorporates the two spreadsheets from Pop (EDDT_Dem_ACS2008-2012.xlsx and EDDT_Dem_ACS2015-2019.xlsx) for the demographics indicators. It changes a lot, because I now realize the output can be produced per year range rather than all going into the same files.

acs_pums_demographics

This is the main accessor. It tries to take advantage of the existing work: the concept is very similar to ingesting PUMS data from Pop, and the renaming and reordering of columns also take place inside this function.

rename_columns_demo

One distinction from the past work: there are now two sets of columns, one for each year range, so a year is passed to this function.
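A minimal sketch of what year-aware renaming could look like. Everything here is an assumption for illustration: the function name `build_rename_map`, the raw column pattern (`<indicator>_<year><suffix>`), and the suffix codes are hypothetical and not taken from the spreadsheets or from `rename_columns_demo` itself.

```python
from typing import Dict, Iterable

# Hypothetical suffix-to-label mapping; the real spreadsheets may use other codes.
SUFFIX_LABELS = {"e": "count", "m": "count_moe", "p": "pct", "z": "pct_moe"}


def build_rename_map(columns: Iterable[str], year: str) -> Dict[str, str]:
    """Map raw column names to clean names for one year range.

    `year` is a four-digit token such as "0812" or "1519". Columns without a
    year token (for example a geography id) map to themselves. Columns for the
    other year range, or with an unknown suffix, are left out of the mapping.
    """
    rename_map: Dict[str, str] = {}
    for col in columns:
        base, sep, tail = col.rpartition("_")
        has_year_token = bool(sep) and len(tail) == 5 and tail[:4].isdigit()
        if not has_year_token:
            rename_map[col] = col  # no year token, keep as-is
            continue
        col_year, suffix = tail[:4], tail[4]
        if col_year == year and suffix in SUFFIX_LABELS:
            rename_map[col] = f"{base}_{SUFFIX_LABELS[suffix]}"
    return rename_map


raw_columns = ["puma", "pop_0812e", "pop_0812m", "pop_1519e", "age_5pl_1519p"]
rename_1519 = build_rename_map(raw_columns, "1519")
```

With a pandas DataFrame, a mapping like this could be applied as `df[list(rename_1519)].rename(columns=rename_1519)`, which both selects the columns for that year range and renames them.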

order_aggregate_columns

It's the same function that was used before to reorder the columns for the PUMS demographics indicators.

td928 commented 2 years ago

Took way longer than I wanted, but I finally got it to work. Lesson learned: our existing flow for this is organizationally a little messy and needs refactoring.

Age Over 5 and Total Population

This was not in the PUMS 2000 dem work but is added here, and it should also be incorporated into the PUMS 2000 dem work before that is finalized. @mbh329

Pytest

The new tests actually don't work as they are constructed. I don't know if we will have enough time for me to rewrite them, but basically the test currently takes only a geography parameter, while any of the demographics and economics work will need both year and geography. Not the heaviest lift, and maybe @SashaWeinstein is already making headway down that road?
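A rough sketch of tests driven by both year and geography. The accessor signature, the `fake_accessor` stand-in, and the geography strings other than "puma" are assumptions for illustration; the real helpers in `general_indicator_test_helpers.py` may look different.

```python
from itertools import product

YEARS = ["0812", "1519"]  # year ranges named in this PR
GEOGRAPHIES = ["puma", "borough", "citywide"]  # assumed geography labels

# Every (year, geography) pair an accessor should be exercised with.
CASES = list(product(YEARS, GEOGRAPHIES))


def fake_accessor(year: str, geography: str) -> dict:
    """Hypothetical stand-in for an accessor such as acs_pums_demographics."""
    return {"geography": geography, "columns": [f"pop_{year}_count"]}


def check_accessor(accessor, year: str, geography: str) -> None:
    """Assertions a single test case would make for one (year, geography) pair."""
    result = accessor(year, geography)
    assert result["geography"] == geography
    assert all(year in col for col in result["columns"])


# Under pytest, this loop would become a parametrized test:
#   @pytest.mark.parametrize("year, geography", CASES)
#   def test_accessor(year, geography):
#       check_accessor(fake_accessor, year, geography)
for year, geography in CASES:
    check_accessor(fake_accessor, year, geography)
```

Keeping the case list in one place means adding a new year range later only touches `YEARS`.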

mbh329 commented 2 years ago

@td928 What do you mean that the work wasn't included in the 2000 PUMS demographics and needs to be incorporated? I think the age_5pl work is in the demographic outputs, just missing the pop columns, although @AmandaDoyle commented that the outputs look good. I will start running this branch locally.

SashaWeinstein commented 2 years ago

Yeah, that's a smart flag for the tests. I opened an issue that includes the test improvements you mention and assigned us both. Starting my review now.

mbh329 commented 2 years ago

Ahh okay, I think I know what you mean with the 5 plus columns. The column name is different in the source data for the 2000 PUMS vs. the 08-12/15-19 ACS data.

td928 commented 2 years ago

Ahh okay, I think I know what you mean with the 5 plus columns. The column name is different in the source data for the 2000 PUMS vs. the 08-12/15-19 ACS data.

Yeah, my bad for not first checking in your script whether it is missing entirely or just named differently. I had to add the age 5 and over columns in a couple of places for mine to work. Also, I hope the median work is helpful for your earlier question, if I understood it correctly.

SashaWeinstein commented 2 years ago

OK, shoot: I ran some tests, and the changes here break the census_2000_pums_demographics function. @td928 and @mbh329, I think you two should work this out, since you both worked on this code? The stack trace I got looks like this:

tests/general_indicator_tests/test_tokens.py:5: in <module>
    by_puma, by_borough, by_citywide = get_by_geo(pums_demographics=True)
tests/general_indicator_tests/general_indicator_test_helpers.py:14: in get_by_geo
    by_puma.append((a("puma"), a.__name__))
aggregate/PUMS/pums_2000_demographics.py:131: in census_2000_pums_demographics
    final = order_pums_2000_demographics(final)
aggregate/PUMS/pums_2000_demographics.py:144: in order_pums_2000_demographics
    final = order_aggregated_columns(
aggregate/aggregation_helpers.py:35: in order_aggregated_columns
    for ind_category in categories[ind]:
E   KeyError: 'total_pop'

td928 commented 2 years ago

Good catch, Sasha. I think Max is working on this at the moment. @mbh329, is there a total population column coming from the 2000 PUMS? If not, I'm happy to move the demographic_indicators_denom definition into my own script and keep it separate.
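For reference, a simplified sketch of how this kind of `KeyError` can arise and how separate definitions would avoid it. This is not the code in `aggregation_helpers.py`: the function `order_columns`, the indicator lists, and the category names are all hypothetical. It only assumes, based on the stack trace, that the ordering step looks up each indicator in a categories dict.

```python
from typing import Dict, List


def order_columns(
    indicators: List[str], categories: Dict[str, List[str]], measures: List[str]
) -> List[str]:
    """Build the expected column order, failing early with a clear message."""
    missing = [ind for ind in indicators if ind not in categories]
    if missing:
        raise KeyError(f"no categories defined for indicators: {missing}")
    return [
        f"{category}_{measure}"
        for ind in indicators
        for category in categories[ind]
        for measure in measures
    ]


# Hypothetical per-script definitions: the ACS script lists total_pop,
# while the 2000 script keeps its own list without it.
acs_indicators = ["total_pop", "age_bucket"]
census_2000_indicators = ["age_bucket"]
categories_2000 = {"age_bucket": ["age_under_5", "age_5pl"]}

# Works: the 2000 script only asks for indicators it has categories for.
ordered_2000 = order_columns(census_2000_indicators, categories_2000, ["count"])

# Fails: a shared list that includes total_pop reproduces the error above.
try:
    order_columns(acs_indicators, categories_2000, ["count"])
    shared_list_failed = False
except KeyError:
    shared_list_failed = True
```

Checking for missing indicators up front also makes the error message name every missing key at once, instead of stopping at the first one.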

mbh329 commented 2 years ago

I am not working on this atm, but want to work together on it after lunch? @td928