NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License

Export Demographics #197

Closed: SashaWeinstein closed this 2 years ago

SashaWeinstein commented 2 years ago

Export Demographics with ACS PUMS, Decennial Census and Census PUMS

This PR includes two different upgrades to the project. The first is Max's work to ingest Erica's census data and put it into EDDT format. The second is code that cleans up the output without changing any numbers, including reordering columns and changing column labels. The two are combined into one PR because I integrated Max's changes while also making changes to the ACS PUMS aggregation.

Decennial Census and 2000 Census PUMS

Max wrote this code, working out the regex to efficiently rename the columns and coding up the mapping to our naming convention. This code can serve as a jumping-off point for other functions that ingest data from population.
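A minimal sketch of regex-based renaming, for readers who haven't seen that code. The raw column pattern (`<indicator>_<year>_<measure code>`, e.g. `Pop65pl_2000_E`), the measure codes, and the output suffixes are all hypothetical stand-ins, not the actual census field names or the project's convention. The year is dropped on the assumption that each output file covers a single year.

```python
import re

# Hypothetical raw column pattern: <indicator>_<year>_<measure code>,
# e.g. "Pop65pl_2000_E". The indicator may itself contain underscores.
RAW_COLUMN = re.compile(r"^(?P<indicator>\w+?)_(?P<year>\d{4})_(?P<code>[EMCP])$")

# Hypothetical mapping from measure code to the output suffix.
MEASURE_SUFFIX = {"E": "count", "M": "moe", "C": "cv", "P": "pct"}


def rename_column(raw: str) -> str:
    """Translate one raw column name; leave non-matching names (e.g. IDs) unchanged."""
    match = RAW_COLUMN.match(raw)
    if match is None:
        return raw
    indicator = match.group("indicator").lower()
    suffix = MEASURE_SUFFIX[match.group("code")]
    # The year is dropped here, assuming one year per output file.
    return f"{indicator}_{suffix}"
```

Because `DataFrame.rename` accepts a callable for `columns`, the whole frame can be renamed with `df.rename(columns=rename_column)`.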

Change column names in ACS PUMS Aggregation

The column names in the aggregated ACS PUMS data are set by the code in the statistical/ folder. The changes include removing the "_count" suffix and converting underscores to dashes, among others.
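A rough sketch of the two label changes named above. It assumes every underscore in a label becomes a dash and that only a trailing "_count" is dropped; the real code in statistical/ may apply these rules more selectively, and `clean_label` is a hypothetical name.

```python
def clean_label(raw: str) -> str:
    """Apply the two example label changes to one column name (hypothetical helper)."""
    # Drop a trailing "_count": in this sketch a bare indicator name means the count.
    if raw.endswith("_count"):
        raw = raw[: -len("_count")]
    # Assumption: all remaining underscores become dashes.
    return raw.replace("_", "-")
```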

Refactor order columns

Code to reorder columns was taken out of the PUMSAggregator class and moved to a new 'aggregation_helpers.py' file. This code is applied to both ACS PUMS and census PUMS so that they have the same column order. The column-ordering process differs between ACS and census PUMS in two ways. First, denominator columns are labeled differently. Second, age median columns only need to be reordered for census PUMS; for ACS PUMS, the order in which the calculation code is called already puts those columns in the correct order.
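One way a shared ordering helper could absorb the first difference is to take the denominator label as a parameter; the second difference could be handled by listing the age median indicators in `indicator_order` only for census PUMS. This is a sketch under those assumptions: `order_columns`, the measure suffixes, and the `<indicator>_<measure>` naming are hypothetical, not the contents of aggregation_helpers.py.

```python
from typing import List, Sequence

# Hypothetical measure suffixes, in the order they should appear within each indicator.
MEASURE_ORDER = ("count", "moe", "cv", "pct", "pct_moe")


def order_columns(
    columns: Sequence[str],
    indicator_order: Sequence[str],
    denominator_column: str,
) -> List[str]:
    """Return `columns` reordered: denominator first, then indicator by indicator.

    `denominator_column` is a parameter because ACS PUMS and census PUMS
    label their denominator columns differently.
    """
    available = set(columns)
    ordered: List[str] = []
    if denominator_column in available:
        ordered.append(denominator_column)
    for indicator in indicator_order:
        for measure in MEASURE_ORDER:
            name = f"{indicator}_{measure}"
            if name in available and name not in ordered:
                ordered.append(name)
    # Keep any columns the rules above did not place, in their original order.
    placed = set(ordered)
    ordered.extend(c for c in columns if c not in placed)
    return ordered
```

With pandas, the result can be used directly as a selector, e.g. `df[order_columns(df.columns, indicators, denominator_label)]`, calling it with each source's own denominator label.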

Quick fixes to external review PUMS

The current version of external review PUMS relies on inflexible if/else statements. It makes sense to merge it in this half-completed state because we won't use this code in the files we produce for the April 1 deadline.

Future upgrades

Some docstrings describe upgrades planned for the more distant future.

Clean up internal review

Many files that we've already checked have been deleted from internal review.

td928 commented 2 years ago

Let me know if the way I changed it makes sense. Thanks! @SashaWeinstein

SashaWeinstein commented 2 years ago

I think those changes do make sense! I was going to make them but was unsure. Did you run python3 -m external_review.external_review_PUMS demographics <year> <geography> to check that it works? That is what I would do to double-check that nothing broke. I don't think it did, but I always want to be careful.
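To repeat that check across several runs, a small wrapper can build and execute the command for each year/geography pair. Only the module path and argument order come from the command above; the years and geographies are caller-supplied placeholders, and `build_commands`/`run_all` are hypothetical names.

```python
import subprocess
import sys
from itertools import product
from typing import Iterable, List

MODULE = "external_review.external_review_PUMS"


def build_commands(
    years: Iterable[str], geographies: Iterable[str], category: str = "demographics"
) -> List[List[str]]:
    """One argument list per (year, geography) pair, mirroring the manual command."""
    return [
        [sys.executable, "-m", MODULE, category, str(year), geography]
        for year, geography in product(years, geographies)
    ]


def run_all(years: Iterable[str], geographies: Iterable[str]) -> None:
    for cmd in build_commands(years, geographies):
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError if the export exits non-zero.
        subprocess.run(cmd, check=True)
```

Using `sys.executable` instead of a literal `python3` runs the export with the same interpreter (and virtual environment) as the wrapper.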

td928 commented 2 years ago

Hey @SashaWeinstein, I think I might have messed up my branch and am struggling to resolve it now. Did the export work on your end?

mbh329 commented 2 years ago

@td928 let me know if I can help with anything

SashaWeinstein commented 2 years ago

Hey, sorry I missed this, @td928. I can help now.

SashaWeinstein commented 2 years ago

OK, neat. Running tests on my machine.

@mbh329's suggestion to check that the output dataframes have exactly the number of columns we expect is very smart. It's better to think about how many columns we expect (x indicators + y crosstabs * z columns for estimate, moe, etc.) and check against that number.
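One way to make that check concrete is to enumerate the expected column names and compare both the count and the names themselves. This sketch assumes a hypothetical `<indicator>_<crosstab>_<measure>` naming with one column per combination; the real layout may differ, so the enumeration would need to follow the actual convention.

```python
from itertools import product
from typing import List, Sequence


def column_problems(
    columns: Sequence[str],
    indicators: Sequence[str],
    crosstabs: Sequence[str],
    measures: Sequence[str],
    id_columns: Sequence[str] = (),
) -> List[str]:
    """Return a list of problems; an empty list means the columns are exactly as expected.

    Hypothetical convention: one column per indicator, crosstab category and
    measure, named "<indicator>_<crosstab>_<measure>", plus any ID columns.
    """
    expected = set(id_columns) | {
        f"{ind}_{xt}_{m}" for ind, xt, m in product(indicators, crosstabs, measures)
    }
    actual = set(columns)
    problems: List[str] = []
    # Comparing lengths also catches duplicated column names.
    if len(columns) != len(expected):
        problems.append(f"expected {len(expected)} columns, got {len(columns)}")
    missing = sorted(expected - actual)
    if missing:
        problems.append(f"missing: {missing}")
    unexpected = sorted(actual - expected)
    if unexpected:
        problems.append(f"unexpected: {unexpected}")
    return problems
```

In a test this becomes a single line such as `assert not column_problems(df.columns, indicators, crosstabs, measures, ["geoid"])`, and a failure reports which columns are missing or extra rather than just a mismatched count.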

SashaWeinstein commented 2 years ago

Both tests pass on my machine