Open alinasmahl1 opened 3 years ago
The longitudinal rates data was incorrectly aggregated. This has been corrected (https://github.com/Drexel-UHC/covid_inequities_project/blob/main/Outcomes/totals/byZCTA_bchc.csv)! See the distribution of hosp rates by ZCTA for the three cities below. The updated file should now have correct values and no missing ZCTAs for NYC. Let me know if things look okay to you.
This looks great! What value did you end up imputing for the missing values? I want to make sure I'm giving the NYU folks all the detail on the data.
Pseudocode:
1) For each month/ZCTA, calculate hospitalization_count from rate*population_denom. If that month/ZCTA is missing a rate (suppression), impute a hospitalization_count of 4 (going with 4 just to prevent underestimation, but we can always change that to another number between 1 and 4).
2) For each ZCTA, calculate the cumulative hospitalization_count by summing all the monthly hospitalization_count values.
3) For each ZCTA, calculate the cumulative rate as (cumulative hospitalization_count)/pop_denom.
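For anyone who wants to sanity-check the steps above, here is a minimal Python sketch of the same logic. It is not the project's actual code: the function name `cumulative_rates`, the row layout, the ZCTA label "Z1", and the numbers are all made up for illustration. It also assumes the rate is expressed per 100,000 people (so the count is rate*pop/100,000) and that each ZCTA has a single population denominator across months.

```python
from collections import defaultdict

IMPUTED_COUNT = 4   # value used for suppressed months (could be any number from 1 to 4)
PER = 100_000       # assumption: rates are expressed per 100,000 people


def cumulative_rates(rows, imputed_count=IMPUTED_COUNT, per=PER):
    """Hypothetical helper, for illustration only.

    rows: iterable of (zcta, month, rate_or_None, pop_denom) tuples,
          where rate is None when the month was suppressed.
    Returns {zcta: cumulative rate per `per` people}.
    """
    counts = defaultdict(float)
    pops = {}
    for zcta, _month, rate, pop in rows:
        # Step 1: monthly count from the rate, or the imputed value if suppressed.
        count = imputed_count if rate is None else rate * pop / per
        # Step 2: accumulate the monthly counts for each ZCTA.
        counts[zcta] += count
        # Assumption: the population denominator is the same for every month.
        pops[zcta] = pop
    # Step 3: cumulative rate = cumulative count / population, rescaled.
    return {z: counts[z] * per / pops[z] for z in counts}


# Made-up example rows for one hypothetical ZCTA.
rows = [
    ("Z1", "2020-03", 50.0, 20_000),  # 50 per 100k -> 10 hospitalizations
    ("Z1", "2020-04", None, 20_000),  # suppressed -> imputed count of 4
    ("Z1", "2020-05", 25.0, 20_000),  # 25 per 100k -> 5 hospitalizations
]
result = cumulative_rates(rows)
```

Keeping the imputed value as a parameter makes it easy to rerun with 1, 2, or 3 if we want to see how sensitive the cumulative rates are to that choice.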
I'll update this in the READMEs, etc. sometime tomorrow.
@rl627 Some ZCTAs are missing monthly hospitalization data. We believe this is because "The rate of hospitalized cases per 100,000 people is suppressed for a specific geography when the count of hospitalized deaths is between 1 and 4 due to imprecise and unreliable estimates." In order to calculate cumulative rates, we will impute the missing monthly values.
We also need to update the README to describe the hosp_per100k column and include a description of the imputation rule.