SashaWeinstein commented 2 years ago

This work is starting today because Te is doing terrific work on the household income related indicators, and the aggregations by borough and citywide are working too. There is still more to do on QAQC, but getting a data flow to digital services is the higher priority.

There are two parts to developing an export process.

1. Caching aggregated results

There is a cache flat file method in the parent class of the PUMS aggregators. It should always be called in the init after the aggregation is done.
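A minimal sketch of that pattern, for illustration only. The class names, method names, `cache_dir` argument and file naming scheme are all hypothetical and not taken from the repo, and the row returned by `DemoAggregator` is a made-up placeholder:

```python
import csv
from pathlib import Path


class BaseAggregator:
    """Hypothetical stand-in for the parent class of the PUMS aggregators."""

    def __init__(self, category, geography, year, cache_dir="cache"):
        self.category = category
        self.geography = geography
        self.year = year
        self.cache_dir = Path(cache_dir)
        # Aggregate first, then always cache the result before init returns.
        self.aggregated = self.aggregate()
        self.cache_path = self.cache_flat_file()

    def aggregate(self):
        """Subclasses return a list of dicts, one per geography row."""
        raise NotImplementedError

    def cache_flat_file(self):
        """Write the aggregated rows to a CSV and return its path."""
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        name = f"{self.category}_{self.geography}_{self.year}.csv"
        path = self.cache_dir / name
        rows = self.aggregated
        fieldnames = list(rows[0]) if rows else []
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        return path


class DemoAggregator(BaseAggregator):
    """Toy subclass; the values below are placeholders, not real estimates."""

    def aggregate(self):
        return [{"geography_id": "BK", "total_pop_count": 100}]
```

Putting the cache call in the parent init means a subclass cannot forget it, as long as it does not override the init without calling the parent's.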

2. Iterating through category - geography - year

I will first work on getting 2008-12 and 2015-19 collated and exported, and worry about incorporating Erica's data for 2000 and the decennial census later. This means the total population by race columns will be incorrect, as they come from ACS PUMS instead of the census. I think this is ok for now; they will be replaced by Erica's data when we get around to that.
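The category - geography - year iteration could be sketched roughly like this. The category and geography labels and the `export_jobs` helper are assumptions for illustration; only the two year ranges come from the plan above:

```python
from itertools import product

# Labels below are assumed for illustration, not taken from the codebase.
CATEGORIES = ["demographics", "household_economic_security"]
GEOGRAPHIES = ["borough", "citywide"]
YEARS = ["2008-12", "2015-19"]  # 2000 and decennial census data come later


def export_jobs(categories=CATEGORIES, geographies=GEOGRAPHIES, years=YEARS):
    """Return every (category, geography, year) combination to export."""
    return list(product(categories, geographies, years))


for category, geography, year in export_jobs():
    # A real export step would collate and write the file here.
    print(f"export {category} / {geography} / {year}")
```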

SashaWeinstein commented 2 years ago

The export workflow should also not run on every push; this needs to change so it only runs on certain pushes. I think it can be triggered by a particular commit message in the push.
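One possible shape for that check, as a sketch only. The `[export]` marker and the `should_export` helper are hypothetical; the workflow would need to pass the head commit message in to a script like this, or apply the same test in the workflow's own condition syntax:

```python
EXPORT_MARKER = "[export]"  # hypothetical marker, pick whatever the team prefers


def should_export(commit_message, marker=EXPORT_MARKER):
    """Return True only when the commit message opts in to the export."""
    return marker.lower() in commit_message.lower()
```

With this, a push whose message is "fix typo" would skip the export, while "update aggregator [export]" would run it.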

SashaWeinstein commented 2 years ago

Check that correct number of columns are produced for demographics

I got a file for the demographic category which came out to 270 total columns. I wanted to document how I checked that I got the correct number of columns.

Counts/Fractions

For indicators measured by counts/fractions there are seven columns per category of an indicator. They are

count estimate
count CV
count MOE
fraction estimate
fraction CV
fraction MOE
denominator

Then each category is cross-tabbed by race, which adds another 5 categories. So there are 6 categories * 7 columns = 42 columns per indicator category. Total pop, limited English proficiency and foreign born each have one category, so they account for 3 * 42 = 126 columns. Then age buckets have 3 categories (under 16, 16-64, 65+), which adds another 3 * 42 = 126 columns. That comes out to 252 for counts/fractions.

Medians

We only have median age. Here there are just 3 columns per indicator

median estimate
median CV
median MOE

Age contributes 3 columns and each of its 5 race cross-tabs contributes another 3, so there are 18 total from medians.

Correct 💯

This comes out to 270 total
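The arithmetic above can be double-checked with a short script. This is just a sketch of the count; the dictionary keys are illustrative labels, not actual column or indicator names:

```python
COUNT_FRACTION_COLS = 7  # count est/CV/MOE, fraction est/CV/MOE, denominator
MEDIAN_COLS = 3          # median est/CV/MOE
RACE_CROSSTABS = 5

# Number of categories for each counts/fractions indicator.
count_fraction_categories = {
    "total_pop": 1,
    "limited_english_proficiency": 1,
    "foreign_born": 1,
    "age_bucket": 3,  # under 16, 16-64, 65+
}
median_indicators = 1  # only median age

# Each category appears once overall plus once per race cross-tab.
groups = 1 + RACE_CROSSTABS

count_fraction_total = (
    sum(count_fraction_categories.values()) * groups * COUNT_FRACTION_COLS
)
median_total = median_indicators * groups * MEDIAN_COLS
total_columns = count_fraction_total + median_total

print(count_fraction_total, median_total, total_columns)
```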

SashaWeinstein commented 2 years ago

Important to-do: the external review .csvs don't need to be added to GitHub

SashaWeinstein commented 2 years ago

I got this error on the r-lib/actions/setup-r@v1 step:

Error: Failed to get R 4.1.2: Failed to get R 4.1.2: Failed to install R: Error: The process '/usr/bin/sudo' failed with exit code 100

I couldn't find anything helpful on Google, so I ran it again and it worked.

NYCPlanning / db-equitable-development-tool

Export Aggregated PUMS in Demographics and Household Economic Security #141
