NYCPlanning / db-equitable-development-tool

Data Repo for the equitable development tool (EDDT)
MIT License

ACS PUMS economics from DCP population #252

Closed SashaWeinstein closed 2 years ago

SashaWeinstein commented 2 years ago

Economic indicators for 2008-2012 and 2015-2019

Data comes from DCP population in spreadsheets formatted similarly to their demographics data. We have a pretty well-established process for transforming this type of data.

My code to rename cols is slightly different from what's in the demographics code. I wrote a convert_col_label function that gets called for each column label. I also wrote new mappers, as the existing ones either include underscores or are specific to certain years. I prefer to add underscores via the logic in the function rather than get them from a mapper.
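
A minimal sketch of the per-label renaming described above. Only the name convert_col_label comes from this PR; the 'indicator-race-measure' source layout, the mapper contents, and the output names are made up for illustration and are not the actual code.

```python
# Hypothetical mappers. Values contain no underscores; the function adds them.
INDICATOR_MAPPER = {"LF": "labor force", "OccMgmt": "occupation mgmt"}
RACE_MAPPER = {"A": "anh", "B": "bnh", "H": "hsp", "W": "wnh"}
MEASURE_MAPPER = {"E": "count", "M": "count moe", "P": "pct"}


def convert_col_label(label: str) -> str:
    """Convert one source label, e.g. 'LF-A-E' -> 'labor_force_anh_count'.

    Assumes an invented 'indicator-race-measure' layout, with the race part
    omitted for columns that are not crosstabbed by race.
    """
    parts = label.split("-")
    pieces = [INDICATOR_MAPPER[parts[0]]]
    if len(parts) == 3:
        pieces.append(RACE_MAPPER[parts[1]])
    pieces.append(MEASURE_MAPPER[parts[-1]])
    # Underscores come from this join, not from the mappers.
    return "_".join(" ".join(pieces).split())


source_labels = ["LF-E", "LF-A-E", "OccMgmt-B-M"]
# Called once per column label; with pandas this could be passed as
# df.rename(columns=convert_col_label).
new_labels = [convert_col_label(label) for label in source_labels]
```

Keeping underscores out of the mappers means one mapper can be reused wherever the same words are needed with a different separator.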

SashaWeinstein commented 2 years ago

Add medians and Re-order columns

I opened up this PR when the work was 50% done, that was my bad.

Add medians

Adding the medians was pretty simple: median wages have a leading "MW" in the indicator label. Curious that the source data doesn't have median wages for occupations crosstabbed by race. It does have median wages by industry crosstabbed by race.

Re-order columns

The existing order_aggregated_columns worked for the count indicators. It's not exactly right, as we would rather have each occupation act as an indicator rather than a category. The way it orders the columns, all occupations are together and then all crosstabs on all occupations follow. It would be better to put each occupation's crosstabs next to the all-races numbers for that occupation, but it's ok for now.

I had to write new code for the medians re-order, as one indicator is not crosstabbed by race, which is an unusual pattern.
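
To make the two orderings concrete, here is a small illustration. The occupation codes, race codes, and column names are hypothetical, and the "current" list only approximates what order_aggregated_columns produces.

```python
# Hypothetical occupation and race codes, for illustration only.
occupations = ["mgmt", "service"]
races = ["anh", "bnh"]

# Roughly the current layout: all-races columns for every occupation first,
# then the by-race crosstabs for every occupation.
current = [f"occupation_{occ}" for occ in occupations] + [
    f"occupation_{occ}_{race}" for occ in occupations for race in races
]

# Preferred layout: each occupation is treated as its own indicator, so its
# all-races column is immediately followed by its own crosstabs.
preferred = []
for occ in occupations:
    preferred.append(f"occupation_{occ}")
    preferred.extend(f"occupation_{occ}_{race}" for race in races)

# Same columns either way; only the order differs.
assert sorted(current) == sorted(preferred)
```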

I checked the math for the expected number of columns and have assertions that all cols are preserved when we reindex.
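
A sketch of that kind of check, under assumed names: the indicator names, race codes, measures, and the check_reorder helper are all hypothetical. It works out the expected column count when one indicator has no by-race crosstab, then asserts that the new order keeps every column.

```python
def check_reorder(columns, new_order, expected_count):
    """Return new_order after checking that no column is dropped or added."""
    assert len(columns) == expected_count, "unexpected number of source columns"
    assert len(new_order) == len(set(new_order)), "duplicate column in new order"
    assert set(new_order) == set(columns), "reorder dropped or added a column"
    return list(new_order)


# Hypothetical names, for illustration only.
races = ["anh", "bnh", "hsp", "wnh"]
measures = ["median", "median_moe"]
crosstabbed = ["wages_industry_a", "wages_industry_b"]
not_crosstabbed = ["wages_occupation"]  # the indicator with no by-race columns

# Expected count: crosstabbed indicators get an all-races group plus one group
# per race; the other indicator only gets the all-races group.
expected = (
    len(crosstabbed) * (1 + len(races)) * len(measures)
    + len(not_crosstabbed) * len(measures)
)

new_order = []
for indicator in crosstabbed:
    for group in [""] + [f"_{race}" for race in races]:
        new_order.extend(f"{indicator}{group}_{m}" for m in measures)
for indicator in not_crosstabbed:
    new_order.extend(f"{indicator}_{m}" for m in measures)

source_columns = sorted(new_order)  # stand-in for the unordered source columns
ordered = check_reorder(source_columns, new_order, expected)
# With pandas, the checked list could then be used as df.reindex(columns=ordered).
```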

Future work

Lots of good refactoring to do for the column reordering and DCP population processing in general. The assertions to make sure columns aren't dropped are a good QAQC measure.

Unsure

I set the indicator label for income bands as households_<income band>. Easy to change if we want something different

SashaWeinstein commented 2 years ago

Most recent commit addresses both your comments @mbh329 and @td928

mbh329 commented 2 years ago

I think this all looks good - will wait for @AmandaDoyle on the column names for the edu_ indicators to approve

td928 commented 2 years ago

Hey Sasha immediately after approval I remember I had this comment from the other branch about renaming the accessor function. Just pasted below for your reference:

From YOU

I think for this accessor it may be ok to break the pattern? pums_0812_1519_demographics is an unwieldy function name, for demographics and economics from ACS PUMS it's ok for it to be acspums. Mine is ACS_PUMS though so that should be standardized between the two functions, either both uppercase or both lowercase

td928: agree on all points, and my vote is for lowercase because I dislike capitalized functions lol

SashaWeinstein commented 2 years ago

Hey Te what comments?

td928 commented 2 years ago

sorry just updated above

SashaWeinstein commented 2 years ago

sorry I still don't understand lol. Good to merge?

td928 commented 2 years ago

this is just an awful fumble from me, but I was talking about this thread https://github.com/NYCPlanning/db-equitable-development-tool/pull/252#issuecomment-1082182028