NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License
0 stars 0 forks source link

Export demographics and economics #258

Closed SashaWeinstein closed 2 years ago

SashaWeinstein commented 2 years ago

Export the last two EDDT categories 🌻

We are very close! This PR has new code to export using the 5 accessors we've written together. It also has some changes to the code in those accessors. If you're curious here is action that produced the output on DO

New collate_save_census function

This replaces the old external_review_PUMS.save_PUMS function. There is a special function for the two census categories because we write separate files for each year-geography. The housing prod/QOL/housing security categories write one file to each geography.

This new function uses the existing accessor function but doesn't load them from all_accessors.py. This is not great design as I wanted one process to load accessors. However given the time constraints I didn't see how to make that happen. So currently we have two classes, Accessors in all_accessors.py which returns a flat list of indicators used for testing and exporting housing prod/QOL/housing security and then CensusAccessors which needs a geography and a year combination to get back an accessor. Maybe the two contexts in which the accessors are called are too different and we won't arrive at one unified process, I'm not sure. We could also use CensusAccessors for testing, I know it bother Te that we don't test each year for the census categories.

Changes to existing accessors

Year parameter

All accessors for demographics and economics now take a year parameter. This is so that they can all be called via df = accessor(geography, year). For the two census pums accessors this is redundant as they can only take a year parameter of "2000" but it was the best way to have consistency on the export end. For decennial_census_001020 the years it can take were changed from [2000, 2010, 2020] to ["2000", "0812", "1519"]. This parameter is immediately fed into a mapper dictionary so this change was very easy to make.

"pop" variable from census and ACS PUMS re-named

We had run into this problem before where the decennial census has the count of total pop we report on the front-end and the census/ACS PUMS have the total pop we use as a denominator. They were both named "pop" which messes with the merge. We had run into this problem before and I don't remember how we resolved it. My solution here was to rename the pop from ACS/census PUMS to pop_denom. If this isn't best it should be pretty straightforward for me or another engineer to change it.

Miscellaneous changes

Files, functions and workflows that used to read "PUMS_collate" or "PUMS_export" now are "census_collate" or "census_export". The structure where these two categories need their own code hasn't changed but now we have surveys other than PUMS.

SashaWeinstein commented 2 years ago

Thanks for the comments, watching this video on partial functions now

SashaWeinstein commented 2 years ago

@td928 I think you're right that functions are more appropriate for our system of returning accessors.

I'm thinking maybe a single get_accessors function that takes a category name and optional arg of year. If the category is housing prod/QOL/housing security then it returns the associated iterable of accessors. If the category is demographics/economics then it uses to year to return the correct iterable of accessors.

I don't think this process would require partial functions or closures? Do you have an idea of where to implement those?

td928 commented 2 years ago

@td928 I think you're right that functions are more appropriate for our system of returning accessors.

I'm thinking maybe a single get_accessors function that takes a category name and optional arg of year. If the category is housing prod/QOL/housing security then it returns the associated iterable of accessors. If the category is demographics/economics then it uses to year to return the correct iterable of accessors.

I don't think this process would require partial functions or closures? Do you have an idea of where to implement those?

Hey @SashaWeinstein I think you are right that function are sufficient for our purposes. I also don't think this PR requires that refactoring and just ideas for future refactoring if it happens.