zcta hospitalizations issue

Drexel-UHC / covid_inequities_project

This repository contains data on COVID-19 and COVID-19 Inequities in BCHC cities. Data are preliminary and subject to change. Information on this page will change as data and documentation are updated.


zcta hospitalizations issue #2

Open alinasmahl1 opened 3 years ago

alinasmahl1 commented 3 years ago

@rl627 Some ZCTAs are missing monthly hospitalization data. We believe this is because "The rate of hospitalized cases per 100,000 people is suppressed for a specific geography when the count of hospitalized deaths is between 1 and 4 due to imprecise and unreliable estimates." To calculate cumulative rates, we will impute the missing monthly values.

We also need to update the README to describe the hosp_per100k column and to include a description of the imputation rule.

ran-codes commented 3 years ago

The longitudinal rate data were aggregated incorrectly. This has been corrected (https://github.com/Drexel-UHC/covid_inequities_project/blob/main/Outcomes/totals/byZCTA_bchc.csv)! See the distribution of hospitalization rates by ZCTA for the three cities below. The updated data should now be correct, with no missing ZCTAs for NYC. Let me know if things look okay to you.

alinasmahl1 commented 3 years ago

This looks great! What value did you end up imputing for the missing months? I want to make sure I'm giving the NYU folks all the details on the data.

ran-codes commented 3 years ago

Pseudocode:
1) For each month/ZCTA, calculate hospitalization_count from rate*population_denom/100,000 (the rate is per 100,000 people). If that month/ZCTA was missing a rate (suppression), impute a hospitalization_count of 4. We went with 4 to prevent underestimation, but we can always change it to another number between 1 and 4.
2) For each ZCTA, calculate the cumulative hospitalization_count by summing all of its monthly hospitalization_count values.
3) For each ZCTA, calculate the cumulative rate as (cumulative hospitalization_count)/pop_denom*100,000.
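The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: it assumes each input record is a hypothetical `(zcta, month, hosp_per100k, population_denom)` tuple with `hosp_per100k` set to `None` when the rate was suppressed, and that each ZCTA uses a single population denominator across months. The function name `cumulative_rates` and the `impute_count` parameter are made up for illustration; `impute_count` defaults to 4 as described in the comment but can be set anywhere from 1 to 4.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical record layout: (zcta, month, hosp_per100k, population_denom).
# hosp_per100k is None when the rate was suppressed (1 to 4 cases).
Record = tuple[str, str, Optional[float], float]

PER = 100_000  # rates are expressed per 100,000 people


def cumulative_rates(records: list[Record], impute_count: float = 4.0) -> dict[str, float]:
    """Return the cumulative hospitalization rate per 100k for each ZCTA."""
    if not 1 <= impute_count <= 4:
        raise ValueError("suppressed cells hold between 1 and 4 cases")
    counts: dict[str, float] = defaultdict(float)
    denoms: dict[str, float] = {}
    for zcta, _month, rate, pop in records:
        # Step 1: back out the monthly count from the rate, or impute it if suppressed.
        monthly = impute_count if rate is None else rate * pop / PER
        # Step 2: accumulate the monthly counts for each ZCTA.
        counts[zcta] += monthly
        # Assumption: the population denominator is the same for every month of a ZCTA.
        denoms[zcta] = pop
    # Step 3: divide the cumulative count by the denominator and rescale to per 100k.
    return {z: counts[z] * PER / denoms[z] for z in counts}
```

For example, with made-up toy inputs (not real data) for one ZCTA with a population of 20,000, monthly rates of 50.0 and 25.0 correspond to 10 and 5 cases. With one suppressed month imputed as 4, the cumulative count is 19 and the cumulative rate is 95.0 per 100k; imputing 1 instead gives 80.0. Comparing the two shows how much the choice of imputed value can move the cumulative rate.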

I'll update this in the README, etc., sometime tomorrow.