Carter-Center-Health-Data-Support-Unit / CC-RB-LF-SCH-DASHBOARD

Input on pre May 2019 RB Rx cleaning #4

Open zacka-cartercenter opened 1 year ago

zacka-cartercenter commented 1 year ago

@sinclairelaina

I've been modifying the cleaning and compiling functions so they can clean the old pre-May 2019 data format. While working on these generic functions (in the R directory), I've continued testing them in data_cleaning.Rmd.

In the section below, the old-format data goes through a preliminary cleaning in which the names are cleaned using the previous framework (i.e. the case_when statements), but there are some new issues, especially since we don't have adm1 in the old format.

On line 389 below, the "cleaned" data is compared to the admin master list and the non-matching values are returned:

https://github.com/Carter-Center-Health-Data-Support-Unit/CC-RB-LF-SCH-DASHBOARD/blob/f3490de35624f2e68f7be7e7446fc886f40db53c/documentation/data_cleaning.Rmd#L346-L399
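For anyone reading without the Rmd open, the comparison is roughly of the following kind. This is only a minimal sketch, not the code at the link above: the data frames `cleaned` and `admin_master`, their column names, and the placeholder woreda names are all assumptions made up for illustration.

```r
# Hypothetical stand-ins for the "cleaned" old-format data and the admin master list.
# Column names and values are assumptions, not taken from the repository.
cleaned <- data.frame(
  adm2_name = c("woreda_a", "gambella", "woreda_b", "woreda_a"),
  stringsAsFactors = FALSE
)
admin_master <- data.frame(
  adm1_name = c("gambela", "gambela"),
  adm2_name = c("woreda_a", "woreda_b"),
  stringsAsFactors = FALSE
)

# adm2 names that appear in the cleaned data but not in the master list.
unmatched <- setdiff(unique(cleaned$adm2_name), admin_master$adm2_name)
print(unmatched)
```

Anything left in `unmatched` is a name that still needs a mapping in the cleaning function.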

Therefore the clean_adm2 function case_when statement below needs to be augmented:

https://github.com/Carter-Center-Health-Data-Support-Unit/CC-RB-LF-SCH-DASHBOARD/blob/f3490de35624f2e68f7be7e7446fc886f40db53c/R/reclassify_admins.R#L137-L166
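To make the request concrete, augmenting a case_when-based cleaner generally means adding one branch per unmatched spelling. The sketch below is hypothetical: `clean_adm2_sketch` is not the real `clean_adm2`, and the variant spellings and target names are invented placeholders rather than mappings from R/reclassify_admins.R.

```r
library(dplyr)

# Hypothetical sketch of a clean_adm2-style helper (NOT the function in the repo).
clean_adm2_sketch <- function(adm2_name) {
  # Normalise case and surrounding whitespace before matching.
  x <- tolower(trimws(adm2_name))
  case_when(
    x == "woreda a" ~ "woreda_a",  # made-up variant spelling
    x == "wereda_b" ~ "woreda_b",  # made-up variant spelling
    TRUE            ~ x            # everything else passes through unchanged
  )
}

clean_adm2_sketch(c(" Woreda A", "wereda_b", "woreda_c"))
```

Each new unmatched value from the master-list comparison would become one more `x == "..." ~ "..."` line above the `TRUE` fallback.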

Can you:

esinclairTCC commented 1 year ago

I edited the Excel file in the Teams chat.

zacka-cartercenter commented 1 year ago

@esinclairTCC - I harmonized basically all of the adm2_names in the pre201905 data set in the main branch.

FYI - the harmonization is done here:

https://github.com/Carter-Center-Health-Data-Support-Unit/CC-RB-LF-SCH-DASHBOARD/blob/08c6c73d568f1b8478c11dd8f041947b1913973e/R/reclassify_admins.R#L137-L176

Just 2 small issues that maybe you can provide clarity on:

The only remaining "unharmonized" adm2_names in the pre201905 dataset are "gambella" and "refugges_gambella":

  1. There are 10 records in the pre201905 dataset where adm2_name is "gambella". These records run from Jan 2016 to Oct 2016 and look like real data. Since it looks like one "gambella" adm2 was reported per month for those 10 months, can you find out which adm2 in gambela (adm1) they were reporting on in those months?
  2. refugeees_gambella is actually fine; I'll just reclassify it the same way we did for post-201905 (adm2 = "refugee", adm1 = "gambela").
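
Both points can be sketched roughly as below. This is an illustration under assumptions, not the code in the repo: the `pre201905` data frame, its `date`, `adm1_name` and `adm2_name` columns, and the sample rows are made up. The refugee rows are matched by pattern because the raw value is spelled two different ways in this thread.

```r
library(dplyr)

# Hypothetical rows standing in for the pre201905 data (column names assumed).
pre201905 <- data.frame(
  date      = as.Date(c("2016-01-01", "2016-02-01", "2016-02-01")),
  adm1_name = NA_character_,
  adm2_name = c("gambella", "gambella", "refugges_gambella"),
  stringsAsFactors = FALSE
)

# Point 1: count the "gambella" records per month to confirm there is one each.
gambella_per_month <- pre201905 %>%
  filter(adm2_name == "gambella") %>%
  count(date)

# Point 2: reclassify the refugee rows (adm2 = "refugee", adm1 = "gambela").
reclassified <- pre201905 %>%
  mutate(
    is_refugee = grepl("^refug.*_gambella$", adm2_name),
    adm1_name  = if_else(is_refugee, "gambela", adm1_name),
    adm2_name  = if_else(is_refugee, "refugee", adm2_name)
  ) %>%
  select(-is_refugee)

print(gambella_per_month)
print(reclassified)
```

If any month in `gambella_per_month` shows more than one row, the "one adm2 per month" assumption would need another look.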