R4EPI / sitrep

Report templates and helper functions for applied epidemiology
https://r4epi.github.io/sitrep/
GNU General Public License v3.0

Hackathon Issues #136

Closed aspina7 closed 5 years ago

aspina7 commented 5 years ago

Morning @zkamvar - Think probably the easiest thing to do is to start a bullet list, then for each bullet set it out so that it contains the name of the person who raised the issue in brackets, then the disease template (cholera, ajs, measles, meningitis) they were using, followed by a dash and the code chunk (or general), and then the error or issue, followed by the solution in double brackets at the end once fixed. e.g.:

zkamvar commented 5 years ago

[Kate][Sitrep - xl.read.file] There is an issue with protected sheets:

linelist_raw <- xl.read.file("hack_data/bituvis-cholera.xlsx", xl.sheet = "xxxxx", password = askpass::askpass("xxxxx"))

Error in top_left_corner["CurrentRegion"] : You cannot use this command on a protected sheet. To use this command, you must first unprotect the sheet (Review tab, Changes group, Unprotect Sheet button). You may be prompted for a password. (Microsoft Excel) 80020009

Also note, the askpass field is not particularly intuitive. Also need to specify that these packages need to be installed in order to be used.

nsbatra commented 5 years ago

[Kate] [Sitrep - load packages] Put commented lines with commands for installing (and loading) optional packages like askpass, excel.link, writexl, etc.

aspina7 commented 5 years ago

[Kate] [Training] walk people through commenting / uncommenting lines

aspina7 commented 5 years ago

[Kate] [cleaning] clean_colnames from epitrix will remove a column called "#"

zkamvar commented 5 years ago

[Kate] [cleaning] clean_colnames from epitrix will remove a column called "#"

This is solved by updating epitrix and including protect = "#"

nsbatra commented 5 years ago

[Neale] [Sitrep - read data] Pasting filepaths into rio::import might give an error which requires the user to change the slashes from \ to / (this happened to me).

linelist_raw <- rio::import("C:\Users\Neale\OneDrive - Neale Batra\Documents\Jobs\MSF\Moz_Cholera_LineList_DeID_forTesting.xlsx", which = "Sheet1")

Error: '\U' used without hex digits in character string starting ""C:\U"

edit: @nsbatra, I think this issue may be solved if you use single quotes instead of double quotes? Would you mind checking? -Zhian

Alex: No, I can confirm that doesn't work. We should probably add a comment to the templates.
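
For reference, a minimal base R sketch of the workarounds discussed above. The path is a made-up placeholder and nothing is read from disk. It illustrates that single quotes hit the same escape error as double quotes, and that forward slashes or doubled backslashes both give a usable path.

```r
# Hypothetical Windows path, used only for illustration (no file is read).
# In R source, "\U" starts a Unicode escape, so a pasted Windows path fails to parse.
# Single quotes do not help: parsing the text 'C:\Users\me\linelist.xlsx' still errors.
single_quoted <- try(parse(text = "'C:\\Users\\me\\linelist.xlsx'"), silent = TRUE)
parse_failed <- inherits(single_quoted, "try-error")

# Option 1: use forward slashes, which R accepts on Windows.
path_forward <- "C:/Users/me/linelist.xlsx"

# Option 2: double every backslash.
path_escaped <- "C:\\Users\\me\\linelist.xlsx"

# Both spell the same location once the backslashes are normalised.
path_converted <- gsub("\\", "/", path_escaped, fixed = TRUE)
```

On R 4.0.0 or later, a raw string such as r"(C:\Users\me\linelist.xlsx)" is another option, since backslashes inside it are not treated as escapes.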

zkamvar commented 5 years ago

[Kate][cholera cleaning] There does not appear to be an ID field in the Cholera dictionary

zkamvar commented 5 years ago

General: add instructions for setting up RStudio project to link data and project

zkamvar commented 5 years ago

[Kate] Renaming variables without any order is exquisitely painful

maybe worth reintroducing that function which chucks all the variable names out from data dictionary... which allows you to paste back in to the script ... (alex explanation but you know what i mean)

kate : "i mean trying to determine what the DHIS2 name is versus what i have especially because eg age_days is next to age_years but then age_months is like 4 lines down"

zkamvar commented 5 years ago

[Maria][ajs_outbreak save_cleaned_data]: writexl needs to be installed first

zkamvar commented 5 years ago

[Elburg][AJS 313] this is missing a closing parenthesis: # linelist_cleaned <- select(linelist_cleaned, c(1:3, "age_years", "sex")

IMO, this should be changed to highlight the fact that they can use this method to create temporary data sets.

aspina7 commented 5 years ago

[Kate] [Cholera clean_dates] Error in charToDate(x) : character string is not in a standard unambiguous format - despite all in format %d %B %Y

Add origin to as.Date and say that for Excel on Windows the origin is 1899-12-30
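
A short base R sketch of what that note could show. The serial numbers are made-up example values, not taken from the hackathon data.

```r
# Excel on Windows stores dates as day counts from 1899-12-30.
# The serial numbers below are made-up example values.
excel_serial <- c(43466, 43651)

# Convert the day counts to Date using the Excel (Windows) origin.
onset_dates <- as.Date(excel_serial, origin = "1899-12-30")
format(onset_dates)
```

This only applies when the column was imported as numbers. If it came in as text, a format string is needed instead of an origin.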

nsbatra commented 5 years ago

[Neale][Cholera - read non-DHIS data] In the commented instructions line 245, update description of data dictionary column names from "Code" to "option_code" and "Name" to "option_name"

zkamvar commented 5 years ago

[Kate] [Cholera clean_dates] Error in charToDate(x) : character string is not in a standard unambiguous format - despite all in format %d %B %Y

Note: this needed to be in %d-%b-%y
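
To illustrate where the charToDate error comes from, here is a small base R sketch with a made-up date string. It assumes an English locale, because %b matches month abbreviations in the current locale.

```r
# Made-up example value in day, abbreviated month, two-digit year form.
raw_date <- "05-Jul-19"

# Without a format, as.Date() only tries "%Y-%m-%d" and "%Y/%m/%d", then errors
# with "character string is not in a standard unambiguous format".
no_format <- try(as.Date(raw_date), silent = TRUE)
failed_without_format <- inherits(no_format, "try-error")

# With the matching format the string parses.
# Note: %b depends on the locale (English month abbreviations assumed here).
parsed <- as.Date(raw_date, format = "%d-%b-%y")
```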

zkamvar commented 5 years ago

[Kate][Cholera]: in cholera for obs_days, should it use date_of_admission instead of date_of_consultation_admission?

Alex: No, date_of_consultation_admission is correct. We actually should delete date_of_admission from the dictionary.

aspina7 commented 5 years ago

[kate] [general] people don't seem to be reading comments and are then confused about what's happening/where they are. Hopefully training material and the wiki (outlining the structure and content of scripts) will help...

[Annick] [general] I think we should have start lines with stuff like:

INSTRUCTIONS

CODE OPTION

(i.e. label different things more clearly!)

aspina7 commented 5 years ago

[Annick] [training] Note for teaching: the population data section isn't really clear - we need to make something that people understand and also an instruction on how to format the population data in the best way.

aspina7 commented 5 years ago

[Elburg] [variable_naming] for non-DHIS2 data, need to clarify how many of the variables need to match the dictionary in order for it to run. "I think I missed something, do I need to rename all the variables so that they match the dictionary?"

aspina7 commented 5 years ago

[kate] [clean variables - factors] case_when creates a character variable, make sure to say as much in the script comments. Currently says will recode to a factor. Then again we need to re-do all the factor stuff anyway.

nsbatra commented 5 years ago

[Neale] [Sitrep general] Just noting that my non-DHIS 2 dataset has separate columns for Day (numeric) and for Month (text). A user would need to clean/paste them for use in these templates. This may not be something worth addressing structurally in linelist or sitrep, but let's chat about how likely this is and whether it's worth mentioning somewhere in the commented instructions.

zkamvar commented 5 years ago

[Neale] [Sitrep general] Just noting that my non-DHIS 2 dataset has separate columns for Day (numeric) and for Month (text). A user would need to clean/paste them for use in these templates. This may not be something worth addressing structurally in linelist or sitrep, but let's chat about how likely this is and whether it's worth mentioning somewhere in the commented instructions.

This is included in the FAQ: https://github.com/R4EPI/sitrep/wiki/4)-FAQ#i-have-a-date-column-and-a-time-column-how-do-i-combine-the-two

Alex: worth adding to training material

aspina7 commented 5 years ago

[Kate] [cholera - dict] msf dict category options show up wrong e.g. sex shows options for pregnancy trimester

ZNK: this turned out to be irreproducible

Alex: i definitely had the same thing at the time when i checked - but it seems to have resolved itself now.

nsbatra commented 5 years ago

[Maria] [sitrep - read data] Let's consider putting the code for generating the dummy dataset before the code for reading in your own dataset, so that if the user forgets to comment/delete the gen_data line the script will still use the data they import.

Yeah agree - maybe we should put it in a separate chunk altogether

zkamvar commented 5 years ago

[Kate] Move reporting week to preamble

[Annick]: use knitr params

zkamvar commented 5 years ago

[Kate]: Not clear that sex/gender needs to be a factor

aspina7 commented 5 years ago

[kate] [data cleaning] doesn't like that you are filtering and then not assigning to a new data set, i.e. have linelist_raw, linelist_clean and then when filtering assign to e.g. linelist_analysis. All came about because of a mistake when passing arguments to the filter command which left 0 cases in the dataset. New people would justifiably freak out! Maybe another thing for training??

aspina7 commented 5 years ago

[Alanah] [recode_factors] "" = NA_character_ when actually the missings were " ". Error returns zero length variable... not super intuitive to understand what's wrong...
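
One possible way to guard against this, sketched in base R with made-up values: trim whitespace before recoding, so that blank strings of any length end up missing.

```r
# Made-up responses: missing values were entered as spaces, not "".
sex_raw <- c("Male", " ", "Female", "", "  ")

# Trim whitespace first so " " and "  " collapse to "".
sex_trimmed <- trimws(sex_raw)

# Then recode the empty strings to a missing character value.
sex_clean <- ifelse(sex_trimmed == "", NA_character_, sex_trimmed)
```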

zkamvar commented 5 years ago

[Elburg] [general]: One of my parting thoughts is indeed, when you give examples, to keep them in line with the rest of the script. I followed the renaming example of sex -> gender and now have to change sex to gender everywhere in the scripts

zkamvar commented 5 years ago

[Kate][General]: rio::import() converts files to UTF-8, but readr::read_csv() does not, which causes problems for cleaning functions that expect utf8

aspina7 commented 5 years ago

[Ettiene] [Install] Admin rights issue, can't install package - also can't change .Rprofile to change where packages are written to. This had the error from sf that it couldn't find the object groupmap (but in French, so ¯\_(ツ)_/¯). Eventually resolved itself by installing each of the packages again.

aspina7 commented 5 years ago

[Isidro] [installing] Add a note to restart R before installing new packages

aspina7 commented 5 years ago

[Anna] [renaming variables] easier way to list out all the variable names in datasets rather than clicking back and forth.
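
A rough base R sketch of one way to do this. The data frame is a made-up stand-in for imported data, and NEW_NAME is just a placeholder to overwrite by hand.

```r
# Small made-up linelist standing in for imported data.
linelist_raw <- data.frame(
  patient_id = 1:2,
  age_years  = c(34, 5),
  sex        = c("F", "M")
)

# One name per line, easy to scan in the console.
cat(names(linelist_raw), sep = "\n")

# A paste-ready skeleton for a rename() call: fill in the new name on the left.
rename_lines <- sprintf("  NEW_NAME = %s,", names(linelist_raw))
cat(rename_lines, sep = "\n")
```

The printed lines can be copied from the console into the renaming chunk, which avoids clicking back and forth between the data viewer and the script.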

aspina7 commented 5 years ago

[Ettiene] [reading_data] importing Excel data from an OCG linelist, the data do not start in the first row and first column; they are in a specific range. Add a walkthrough of how to read in a specific cell range:

linelist_raw <- rio::import(chemin, which = "Data", range = "E12:AD3941")

aspina7 commented 5 years ago

[Pat] [reading_dhis_excel_data] wall of text, very dense. Consider breaking it up into smaller chunks. Consider also not having code commented out for the alternative options, because it's hard to find in between the actual commented text. Instead consider breaking into small chunks and having the alternative options in separate chunks where you can set eval = TRUE / FALSE - and provide instructions on that

aspina7 commented 5 years ago

[Anna] [renaming variables] measles reading non-DHIS2 data, the renaming examples don't match the dictionary, so it is confusing.

aspina7 commented 5 years ago

[Kate/Isidro] [population data] consider adding the option to also type in population counts (not props), e.g. for villages you wouldn't necessarily have proportions. But still useful to have proportions for age group breakdowns (and to show how to read in counts from an Excel file).

Also, for age group breakdowns, fix for all the templates! (alex messed up all the proportion counting)

All ages    100.00%
0 - 4 y      15.89%
5 - 14 y     26.78%
15 - 29 y    27.72%
30 - 44 y    16.28%
>= 45 y      13.33%

Total U5     15.89%
0 - 11 m      3.29%
12 - 23 m     3.29%
24 - 35 m     3.10%
36 - 47 m     3.10%
48 - 59 m     3.10%

See annick email titled population denominator tool. (double check neale also gets this)

Also for all disease templates make more realistic age group examples - e.g. for measles vaccination is up to 59 months, then 10 years and 14 years. [DISCUSS WITH ANNICK!]
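
As a sanity check on the age-group proportions quoted above, here is a small base R sketch. The total population is a hypothetical number chosen for illustration; only the percentages come from this thread.

```r
# Age-group proportions quoted above (percent of total population).
age_props <- c(
  "0-4"   = 15.89,
  "5-14"  = 26.78,
  "15-29" = 27.72,
  "30-44" = 16.28,
  "45+"   = 13.33
)

# The groups should cover the whole population.
stopifnot(isTRUE(all.equal(sum(age_props), 100)))

# Hypothetical total population, for illustration only.
total_population <- 10000

# Convert proportions to counts per age group.
age_counts <- round(total_population * age_props / 100)
```

Going the other way (typed-in counts to proportions) would be counts divided by their sum, which is the case raised for villages where only counts are known.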

aspina7 commented 5 years ago

[Kate] fmt_count in cholera treatment plans: if there are zero counts in there then it returns character(0) - unsure whether fmt_count will fix this automatically and output 0 (0%) in the Word doc. May need to add to fmt_count that it returns 0 if character(0) comes back.

aspina7 commented 5 years ago

[kate] [cholera obs_days] Cholera -- Median, min, and max days_obs need an na.rm = TRUE added
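
A quick base R illustration of the difference, using made-up observation days rather than the template's own variables:

```r
# Made-up observation days with one missing value.
obs_days <- c(2, 5, NA, 3, 10)

# Without na.rm, a single NA makes the summary NA.
median(obs_days)

# With na.rm = TRUE the missing value is dropped first.
obs_summary <- c(
  median = median(obs_days, na.rm = TRUE),
  min    = min(obs_days, na.rm = TRUE),
  max    = max(obs_days, na.rm = TRUE)
)
```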

aspina7 commented 5 years ago

[Kate] [cholera CFR] - add an option to turn off CIs in cfr calculation - for eg if it is a closed population and are certain that all deaths are being captured (i.e. inpatients). If making an assumption that those deaths are representative of community-wide deaths then need to have CIs.

aspina7 commented 5 years ago

[Pat] [fixing dates] set unrealistic dates to NA, based on having browsed dates in the previous chunk. Need to set this ~NA to as.Date(NA)

## set unrealistic dates to NA, based on having browsed dates in the previous chunk
  linelist_cleaned <- mutate(linelist_cleaned,
                             date_of_onset < as.Date("2017-11-01") ~ NA, 
                             date_of_onset == as.Date("2081-01-01") ~ as.Date("2018-01-01"))

otherwise get Error: Column date_of_onset < as.Date("2017-11-01") ~ NA is of unsupported type quoted call
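
A possible corrected version, as a sketch. It assumes the intent was a case_when() inside mutate() that overwrites date_of_onset, which the snippet above does not show. The dates in the example data are made up.

```r
library(dplyr)

# Made-up onset dates: one too early, one mistyped year, one plausible.
linelist_cleaned <- data.frame(
  date_of_onset = as.Date(c("2016-05-01", "2081-01-01", "2018-02-10"))
)

linelist_cleaned <- mutate(
  linelist_cleaned,
  date_of_onset = case_when(
    # as.Date(NA) keeps every branch the same type (Date)
    date_of_onset < as.Date("2017-11-01")  ~ as.Date(NA),
    date_of_onset == as.Date("2081-01-01") ~ as.Date("2018-01-01"),
    # leave all other dates unchanged
    TRUE                                   ~ date_of_onset
  )
)
```

The quoted error appears when the two-sided formulas are passed to mutate() directly, so the case_when() wrapper and the column name on the left are needed as well as as.Date(NA).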

aspina7 commented 5 years ago

[kate] [cholera attack rate] overall and by age group use different multipliers. Overall is per 10,000 and age group is per 100,000. Change all to be per 10,000 - double check the other templates to see if they have the same problem.
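
To make the point concrete, a base R sketch with made-up counts (not the sitrep attack rate function itself): defining the multiplier once keeps the overall and age-group rates on the same scale.

```r
# Made-up case counts and populations, for illustration only.
cases      <- c(overall = 120, "0-4" = 30, "5+" = 90)
population <- c(overall = 50000, "0-4" = 8000, "5+" = 42000)

# One shared multiplier keeps overall and age-group rates comparable.
multiplier <- 10000

# Attack rate per 10,000 population for every row.
attack_rate <- cases / population * multiplier
round(attack_rate, 1)
```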

zkamvar commented 5 years ago

[Elburg][symptoms]. There was an issue with symptoms in which Elburg had data that had various calls for yes and no. This needed to be cleaned. I've templated a small example

  library("sitrep")
  library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dat <- data.frame(
  tidy_symptoms = sample(c("Yes", "No", "Unknown"), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1)),
  num_symptoms = sample(c(0, 1, NA), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1)),
  mixed_symptoms = sample(c("Yes", "No", "Unknown", 1, 0), 100, replace = TRUE, prob = c(0.3, 0.6, 0.05, 0.025, 0.025))
)

NAMES <- colnames(dat)
sitrep::multi_descriptive(dat, NAMES)
#> converting numeric variable to factor
#> # A tibble: 3 x 25
#>   symptom  No_n No_prop Total_n Total_prop Unknown_n Unknown_prop Yes_n
#>   <chr>   <dbl>   <dbl>   <dbl>      <dbl>     <dbl>        <dbl> <dbl>
#> 1 tidy_s…    50    50       100        100        11           11    39
#> 2 num_sy…    NA    NA       100        100        NA           NA    NA
#> 3 mixed_…    57    57.0     100        100         6            6    35
#> # … with 17 more variables: Yes_prop <dbl>, `(0,0.2]_n` <dbl>,
#> #   `(0,0.2]_prop` <dbl>, `(0.2,0.4]_n` <dbl>, `(0.2,0.4]_prop` <dbl>,
#> #   `(0.4,0.6]_n` <dbl>, `(0.4,0.6]_prop` <dbl>, `(0.6,0.8]_n` <dbl>,
#> #   `(0.6,0.8]_prop` <dbl>, `(0.8,1]_n` <dbl>, `(0.8,1]_prop` <dbl>,
#> #   Missing_n <dbl>, Missing_prop <dbl>, `0_n` <dbl>, `0_prop` <dbl>,
#> #   `1_n` <dbl>, `1_prop` <dbl>
dat2 <- dat %>%
  mutate_at(vars(NAMES), ~case_when(
    . == "Yes" ~ "Yes",
    . == "y"   ~ "Yes",
    . == "Y"   ~ "Yes",
    . == "No"  ~ "No",
    . == "N"   ~ "No",
    . == "n"   ~ "No",
    . == 1     ~ "Yes",
    . == 0     ~ "No",
    TRUE       ~ "Unknown"
    ))
sitrep::multi_descriptive(dat2, NAMES)
#> # A tibble: 3 x 9
#>   symptom  No_n No_prop Total_n Total_prop Unknown_n Unknown_prop Yes_n
#>   <chr>   <dbl>   <dbl>   <dbl>      <dbl>     <dbl>        <dbl> <dbl>
#> 1 tidy_s…    50    50       100        100        11           11    39
#> 2 num_sy…    22    22       100        100         6            6    72
#> 3 mixed_…    58    58.0     100        100         6            6    36
#> # … with 1 more variable: Yes_prop <dbl>

Created on 2019-07-05 by the reprex package (v0.3.0)

aspina7 commented 5 years ago

[kate] [cholera mortality rate] - with zero deaths then the table comes out with deaths - and population - and CIs of NA-NA .... considering adding to function to have 0 come out rather than - and NA

Deaths   Population   Mortality (per 10,000)   95%CI

SAME for the mortality_rate_region section in cholera

mortality_rate(deaths$deaths, deaths$population, multiplier = 10000) %>%
  # add the region column to table
  bind_cols(select(deaths, zone_sante), .) %>% 
  merge_ci_df(e = 4) %>% # merge the lower and upper CI into one column
  rename("Region" = zone_sante, 
         "Deaths" = deaths, 
         "Population" = population, 
         "Mortality (per 10,000)" = `mortality per 10 000`, 
         "95%CI" = ci) %>% 
  kable(digits = 2)

aspina7 commented 5 years ago

[Elburg] [AJS lab tests] did not have all the lab variables listed in LABS - add an explanation to comment out the ones you don't have. Also, if you comment out the last one in the list then you have an extra comma, which throws an error, and need to add a NULL (or just delete the comma). Alternatively, Zhian, consider adding to the multi_descriptive function to drop non-existent variables and return a warning message that those variables have been ignored.
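
Until something like that exists in the package, a template could filter the list itself. The base R sketch below uses made-up column names and a hypothetical LABS vector; it is not how multi_descriptive currently behaves.

```r
# Made-up data with only two of the expected lab columns.
linelist_cleaned <- data.frame(
  hep_e_rdt   = c("Positive", "Negative"),
  malaria_rdt = c("Negative", "Negative")
)

# Hypothetical list of lab variables a template might expect.
LABS <- c("hep_e_rdt", "hep_c_rdt", "malaria_rdt", "dengue_rdt")

# Report which expected variables are absent from the data.
missing_labs <- setdiff(LABS, names(linelist_cleaned))
if (length(missing_labs) > 0) {
  message("Ignoring variables not found in data: ",
          paste(missing_labs, collapse = ", "))
}

# Keep only the variables that exist.
LABS_present <- intersect(LABS, names(linelist_cleaned))
```

This also sidesteps the trailing comma problem, because nothing in the list has to be commented out.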

zkamvar commented 5 years ago

[Elburg][AJS lab tests] Same as https://github.com/R4EPI/sitrep/issues/136#issuecomment-508784556, she didn't have the expected variables and so select and rename were failing.

aspina7 commented 5 years ago

[Kate] [loading packages] here::here package is not loaded ... but used in the spatial data section. Also need to add explanation of how here works.

aspina7 commented 5 years ago

[Anna] [standardise_clean filtering] a lot of datasets will fill case_id down automatically even if nothing else is filled in yet. Show example code at the filter bit to show how to drop these rows
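
One way such example code could look, sketched in base R with a made-up linelist. The column names other than case_id are assumptions for illustration.

```r
# Made-up linelist where case_id was filled down past the last real record.
linelist_raw <- data.frame(
  case_id       = c("A1", "A2", "A3", "A4"),
  date_of_onset = as.Date(c("2019-06-01", NA, "2019-06-03", NA)),
  sex           = c("F", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Columns other than the ID.
other_cols <- setdiff(names(linelist_raw), "case_id")

# Keep rows with at least one non-missing value outside case_id.
has_data <- rowSums(!is.na(linelist_raw[other_cols])) > 0
linelist_filtered <- linelist_raw[has_data, ]
```

Assigning the result to a new object keeps linelist_raw intact, so an overly strict filter is easy to spot and undo.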

aspina7 commented 5 years ago

[Kate] [reading shapefiles] actually read_sf does need to be told .shp at the end! i thought it recognises automatically from name

aspina7 commented 5 years ago

[kate/Annick] [maps] change colour palettes - dark for highest AR

aspina7 commented 5 years ago

[Elburg] [epicurves] show an example of how to change the x axis labels in epicurves. Because for example with loads of data is then uuuugly.