Closed: aspina7 closed this issue 5 years ago.
[Kate][Sitrep - xl.read.file] There is an issue with protected sheets:
linelist_raw <- xl.read.file("hack_data/bituvis-cholera.xlsx", xl.sheet = "xxxxx", password = askpass::askpass("xxxxx"))
Error in top_left_corner["CurrentRegion"] : You cannot use this command on a protected sheet. To use this command, you must first unprotect the sheet (Review tab, Changes group, Unprotect Sheet button). You may be prompted for a password. (Microsoft Excel) 80020009
Also note that the askpass field is not particularly intuitive. We also need to state that these packages must be installed before they can be used.
[Kate] [Sitrep - load packages] Add commented-out lines with the commands for installing (and loading) optional packages such as askpass, excel.link, writexl, etc.
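A minimal sketch of such a block, assuming the three package names listed above: it reports which optional packages are missing and keeps the install and library lines commented out, so users only uncomment what they need. system.file() returns an empty string for a package that is not installed, so the check itself loads nothing.

```r
## Optional packages used by some template chunks
optional_pkgs <- c("askpass", "excel.link", "writexl")

## system.file() returns "" when a package is not installed
is_installed <- vapply(optional_pkgs,
                       function(pkg) nzchar(system.file(package = pkg)),
                       logical(1))
missing_pkgs <- optional_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Optional packages not installed: ", paste(missing_pkgs, collapse = ", "))
  ## Uncomment the next line to install them
  # install.packages(missing_pkgs)
}

## Uncomment the packages you need once they are installed
# library("askpass")
# library("excel.link")
# library("writexl")
```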
[Kate] [Training] walk people through commenting / uncommenting lines
[Kate] [cleaning] clean_colnames from epitrix will remove a column called "#"
This is solved by updating epitrix and including protect = "#"
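To illustrate why a column called "#" disappears, here is a hypothetical base-R cleaner (clean_names_sketch() is not epitrix's implementation). Every run of characters outside a-z and 0-9 becomes an underscore and leading/trailing underscores are trimmed, so "#" ends up as an empty name. The protect argument only mimics the idea behind epitrix's protect option, and in this sketch it is safe only for characters that are not special inside a regex bracket expression.

```r
## Hypothetical cleaner, NOT epitrix's implementation
clean_names_sketch <- function(x, protect = "") {
  x <- tolower(trimws(x))
  ## Everything except letters, digits and protected characters becomes "_"
  x <- gsub(paste0("[^a-z0-9", protect, "]+"), "_", x)
  ## Drop leading and trailing underscores
  gsub("^_+|_+$", "", x)
}

raw_names <- c("Case ID", "#", "Age (years)")
clean_names_sketch(raw_names)                  # "#" becomes ""
clean_names_sketch(raw_names, protect = "#")   # "#" is kept
```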
[Neale] [Sitrep - read data] Pasting Windows file paths into rio::import() can give an error, which requires the user to change the backslashes (\) to forward slashes (/). (This happened to me.)
linelist_raw <- rio::import("C:\Users\Neale\OneDrive - Neale Batra\Documents\Jobs\MSF\Moz_Cholera_LineList_DeID_forTesting.xlsx", which = "Sheet1")
Error: '\U' used without hex digits in character string starting ""C:\U"
edit: @nsbatra, I think this issue may be solved if you use single quotes instead of double quotes? Would you mind checking? -Zhian
Alex: No, I can confirm that doesn't work; we should probably add a comment to the templates.
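In R, a backslash starts an escape sequence in both single- and double-quoted strings, which is why "C:\Users" fails with the '\U' error and why switching the quote style does not help. The sketch below shows equivalent ways of writing a Windows path; the path itself is made up, and the raw-string form needs R 4.0.0 or later.

```r
## 1. Forward slashes (also work on Windows)
path_forward <- "C:/Users/Neale/Documents/linelist.xlsx"
## 2. Escaped backslashes: "\\" is one literal backslash
path_escaped <- "C:\\Users\\Neale\\Documents\\linelist.xlsx"
## 3. Build the path from its parts; file.path() joins them with "/"
path_built <- file.path("C:", "Users", "Neale", "Documents", "linelist.xlsx")
## 4. Raw string (R >= 4.0.0): backslashes are kept as they are
path_raw <- r"(C:\Users\Neale\Documents\linelist.xlsx)"

## Any of these can then be passed on, e.g.
# linelist_raw <- rio::import(path_forward, which = "Sheet1")
```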
[Kate][cholera cleaning] There does not appear to be an ID field in the Cholera dictionary
General: add instructions for setting up an RStudio project that links the data and the project.
[Kate] Renaming variables without any order is exquisitely painful
Maybe it's worth reintroducing the function that prints all the variable names from the data dictionary, so that you can paste them back into the script (Alex's explanation, but you know what I mean).
Kate: "I mean trying to determine what the DHIS2 name is versus what I have, especially because e.g. age_days is next to age_years but then age_months is like 4 lines down."
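One way to get the variable names out in a pasteable form is to print a rename() skeleton from the dictionary, sorted so that related names such as age_days, age_months and age_years sit together. The dictionary data frame and its data_element_name column below are hypothetical stand-ins for whatever the template's dictionary actually contains.

```r
## Hypothetical dictionary; the real one comes from the template
dictionary <- data.frame(
  data_element_name = c("age_years", "sex", "age_days", "case_number", "age_months"),
  stringsAsFactors = FALSE
)

## Sort so related variables end up next to each other
expected_names <- sort(dictionary$data_element_name)

## One "new_name = old_name," line per variable, ready to paste into rename()
rename_lines <- paste0("  ", expected_names, " = ,")
cat("linelist_cleaned <- rename(linelist_raw,", rename_lines, ")", sep = "\n")
```

After pasting, the right-hand sides are filled in with the raw column names and the comma on the last line is deleted.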
[Maria][ajs_outbreak save_cleaned_data]: writexl needs to be installed first
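[Elburg][AJS 313] This line is missing a closing parenthesis: # linelist_cleaned <- select(linelist_cleaned, c(1:3, "age_years", "sex") should read # linelist_cleaned <- select(linelist_cleaned, c(1:3, "age_years", "sex"))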
IMO, this should be changed to highlight the fact that they can use this method to create temporary data sets.
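A minimal sketch of that idea, assuming dplyr and a made-up linelist_cleaned: the selection is assigned to a new object, so the cleaned linelist stays intact and the temporary subset can be thrown away later.

```r
library(dplyr)

## Made-up example data
linelist_cleaned <- data.frame(
  case_number = 1:3,
  date_of_onset = as.Date(c("2019-06-01", "2019-06-03", "2019-06-07")),
  outcome = c("Recovered", "Dead", "Recovered"),
  age_years = c(4, 31, 60),
  sex = c("F", "M", "F"),
  village = c("A", "B", "A")
)

## Columns 1 to 3 plus two named columns, saved as a temporary data set
linelist_small <- select(linelist_cleaned, 1:3, age_years, sex)
```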
[Kate] [Cholera clean_dates] Error in charToDate(x) : character string is not in a standard unambiguous format - despite all in format %d %B %Y
Add origin to the as.Date() calls and note that for Excel on Windows the origin is 1899-12-30.
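For example, Excel for Windows stores dates as the number of days since 1899-12-30, so serial numbers read in from a spreadsheet can be converted like this (the serial values are illustrative):

```r
## Excel (Windows) date serial numbers, e.g. as read from an .xlsx file
excel_serial <- c(43466, 43497)

## Count days from the Excel origin
onset_dates <- as.Date(excel_serial, origin = "1899-12-30")
onset_dates  # 2019-01-01 and 2019-02-01
```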
[Neale][Cholera - read non-DHIS data] In the commented instructions line 245, update description of data dictionary column names from "Code" to "option_code" and "Name" to "option_name"
[Kate] [Cholera clean_dates] Note on the charToDate() error above: the format needed to be %d-%b-%y, not %d %B %Y.
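as.Date() only tries ISO-style formats (%Y-%m-%d and %Y/%m/%d) on its own; anything else needs an explicit format, otherwise it stops with the charToDate() error. The sketch below, with made-up strings, parses both formats mentioned. %b and %B match month names in the current locale, so this assumes an English locale.

```r
## Abbreviated month and two-digit year, e.g. "05-Jul-19"
short_dates <- c("05-Jul-19", "28-Jun-19")
as.Date(short_dates, format = "%d-%b-%y")

## Full month name and four-digit year, e.g. "05 July 2019"
long_dates <- c("05 July 2019", "28 June 2019")
as.Date(long_dates, format = "%d %B %Y")
```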
[Kate][Cholera]: in cholera for obs_days, should it use date_of_admission instead of date_of_consultation_admission?
Alex: no date_of_consultation_admission is correct. We actually should delete date_of_admission from the dictionary.
[kate] [general] People don't seem to be reading the comments and are then confused about what's happening and where they are. Hopefully the training material and the wiki (outlining the structure and content of the scripts) will help.
[Annick] [general] I think we should have start lines with stuff like:
(i.e. label different things more clearly!)
[Annick] [training] Note for teaching: the population data section isn't really clear. We need to make something that people understand, plus instructions on how best to format the population data.
[Elburg] [variable_naming] For non-DHIS2 data, we need to clarify how many of the variables need to match for the script to run: "I think I missed something, do I need to rename all the variables so that they match the dictionary?"
[kate] [clean variables - factors] case_when() creates a character variable, so the script comments should say so; they currently say it will recode to a factor. Then again, we need to redo all the factor handling anyway.
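A short illustration, assuming dplyr and a made-up sex variable: case_when() returns a character vector, so an explicit factor() call with the levels is needed if the template wants a factor (for example, to fix the order of categories in tables).

```r
library(dplyr)

sex_raw <- c("m", "F", "f", "M", NA)

## case_when() gives a character vector; unmatched values (here NA) become NA
sex_chr <- case_when(
  sex_raw %in% c("m", "M") ~ "Male",
  sex_raw %in% c("f", "F") ~ "Female"
)
class(sex_chr)  # "character"

## Convert explicitly, fixing the level order
sex_fct <- factor(sex_chr, levels = c("Female", "Male"))
```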
[Neale] [Sitrep general] Just noting that my non-DHIS 2 dataset has separate columns for Day (numeric) and for Month (text). A user would need to clean/paste them for use in these templates. This may not be something worth addressing structurally in linelist or sitrep, but let's chat about how likely this is and whether it's worth mentioning somewhere in the commented instructions.
This is included in the FAQ: https://github.com/R4EPI/sitrep/wiki/4)-FAQ#i-have-a-date-column-and-a-time-column-how-do-i-combine-the-two
Alex: worth adding to training material
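For the separate day and month columns specifically, a sketch along these lines would work, assuming the year is constant and supplied by the user (the dataset has no year column) and that the month names are in English:

```r
## Made-up columns: numeric day and text month, no year
day <- c(3, 17, 28)
month <- c("June", "July", "July")
year <- 2019  # assumption: supplied by the user

## Paste the parts together and parse with an explicit format
onset <- as.Date(paste(day, month, year), format = "%d %B %Y")
```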
[Kate] [cholera - dict] msf dict category options show up wrong e.g. sex shows options for pregnancy trimester
ZNK: this turned out to be irreproducible
Alex: i definitely had the same thing at the time when i checked - but it seems to have resolved itself now.
[Maria] [sitrep - read data] Let's consider putting the code for generating the dummy dataset before the code for reading in your own dataset, so that if the user forgets to comment/delete the gen_data line the script will still use the data they import.
Yeah, agree. Maybe we should put it in a separate chunk altogether.
[Kate] Move reporting week to preamble
[Annick]: use knitr params
[Kate]: Not clear that sex/gender needs to be a factor
[kate] [data cleaning] Doesn't like that the script filters without assigning the result to a new data set, i.e. have linelist_raw and linelist_clean, and then when filtering assign to e.g. linelist_analysis. This all came about because of a mistake in the arguments passed to filter(), which left 0 cases in the dataset. New users would justifiably freak out! Maybe another thing for training?
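A minimal version of that pattern, assuming dplyr and made-up data: the filtered result goes into a new object, and a check warns when the filter removes every case instead of failing silently.

```r
library(dplyr)

linelist_cleaned <- data.frame(
  case_number = 1:5,
  age_years = c(2, 35, NA, 60, 14)
)

## Keep the cleaned data untouched; filter into a separate analysis object
linelist_analysis <- filter(linelist_cleaned, age_years < 15)

## Flag a filter that left no cases, e.g. because of a typo in a condition
if (nrow(linelist_analysis) == 0) {
  warning("The filter removed every case; check the filter conditions.")
}
```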
[Alanah] [recode_factors] The recode used "" = NA_character_ when the missing values were actually " ". The error returns a zero-length variable, which does not make it intuitive to understand what's wrong.
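One way around this, sketched in base R on a made-up vector: trim whitespace first, so that " " and "" are both treated as blank, and then set blanks to NA_character_.

```r
symptom_raw <- c("yes", " ", "", "no", "  no ")

## Strip leading/trailing spaces so " " becomes ""
symptom_clean <- trimws(symptom_raw)

## Now one rule catches every kind of blank
symptom_clean[symptom_clean == ""] <- NA_character_
```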
[Elburg] [general]: One of my parting thoughts: when you give examples, keep them in line with the rest of the script. I followed the renaming example of sex -> gender and now have to change sex to gender everywhere in the scripts.
[Kate][General]: rio::import() converts files to UTF-8, but readr::read_csv() does not, which causes problems for cleaning functions that expect UTF-8.
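If a file was saved in another encoding (latin1 is assumed here), the text can be checked and converted in base R; the example string is made up. readr can also be told the file encoding up front with locale(encoding = ...).

```r
## "Décès" stored as latin1 bytes, as it might arrive from an older CSV file
raw_text <- "D\xe9c\xe8s"

validUTF8(raw_text)  # FALSE: these bytes are not valid UTF-8

## Convert from latin1 to UTF-8 before running cleaning functions
utf8_text <- iconv(raw_text, from = "latin1", to = "UTF-8")
validUTF8(utf8_text)  # TRUE
```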
[Ettiene] [Install] Admin rights issue: can't install packages, and also can't change .Rprofile to change where packages are written to. This gave an error from sf saying it couldn't find the object groupmap (but in French, so ¯\_(ツ)_/¯). It eventually resolved itself after installing each of the packages again.
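When packages cannot be installed to the default library without admin rights, one workaround is a personal library folder that R is pointed to for the current session. The temporary folder below is purely illustrative; in practice it would be a permanent folder the user can write to, and setting the R_LIBS_USER environment variable (for example in .Renviron) is the usual persistent equivalent.

```r
## Illustrative user-writable library folder
user_lib <- file.path(tempdir(), "R-library")
dir.create(user_lib, showWarnings = FALSE, recursive = TRUE)

## Put it first, so install.packages() and library() use it by default
.libPaths(c(user_lib, .libPaths()))

## Then install into it, e.g.
# install.packages("sf", lib = user_lib)
```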
[Isidro] [installing] Add a note to restart R before installing new packages
[Anna] [renaming variables] easier way to list out all the variable names in datasets rather than clicking back and forth.
[Ettiene] [reading_data] When importing Excel data from an OCG linelist, the data do not start in the first row and first column but in a specific range. Add a walkthrough of how to read in a specific cell range, e.g. linelist_raw <- rio::import(chemin, which = "Data", range = "E12:AD3941")
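rio::import() hands extra arguments such as range on to the underlying Excel reader (worth confirming against the installed rio version). The same idea called directly with readxl, using the example workbook that ships with readxl, looks like this:

```r
library(readxl)

## Example workbook bundled with readxl
path <- readxl_example("datasets.xlsx")

## Read only columns A-C and rows 1-5 of the "mtcars" sheet;
## the first row of the range is used as column names
mtcars_part <- read_excel(path, sheet = "mtcars", range = "A1:C5")

## The sheet can also be given inside the range string
mtcars_part2 <- read_excel(path, range = "mtcars!A1:C5")
```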
[Pat] [reading_dhis_excel_data] Dense wall of text; consider breaking it up into smaller chunks. Also consider not commenting out the code for the alternative options, because it is hard to find in between the actual comment text. Instead, break it into small chunks with the alternative options in separate chunks that can be switched with eval = TRUE / FALSE, and provide instructions on that.
[Anna] [renaming variables] In the measles "reading non-DHIS2 data" section, the renaming examples don't match the dictionary, which is confusing.
[Kate/Isidro] [population data] Consider adding the option to also type in population counts (not proportions), e.g. for a village you wouldn't necessarily have proportions. Proportions are still useful for age group breakdowns (and to show how to read in counts from an Excel file).
Also fix the age group breakdowns in all the templates! (alex fucked up all the proportion counting):

| Age group | Proportion |
|---|---|
| All ages | 100.00% |
| 0 - 4 y | 15.89% |
| 5 - 14 y | 26.78% |
| 15 - 29 y | 27.72% |
| 30 - 44 y | 16.28% |
| >= 45 y | 13.33% |
| Total U5 | 15.89% |
| 0 - 11 m | 3.29% |
| 12 - 23 m | 3.29% |
| 24 - 35 m | 3.10% |
| 36 - 47 m | 3.10% |
| 48 - 59 m | 3.10% |

See Annick's email titled "population denominator tool" (double-check that Neale also gets this).
Also, for all disease templates, make the age group examples more realistic, e.g. for measles, vaccination is up to 59 months, then 10 years and 14 years. [DISCUSS WITH ANNICK!]
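Converting between counts and proportions is simple arithmetic. A base-R sketch using the age proportions listed above and an assumed total population of 10,000:

```r
age_groups <- c("0-4 y", "5-14 y", "15-29 y", "30-44 y", ">= 45 y")
props <- c(0.1589, 0.2678, 0.2772, 0.1628, 0.1333)
total_pop <- 10000  # assumption: the user's total population

## Proportions -> counts
counts <- round(props * total_pop)
names(counts) <- age_groups

## Counts (e.g. typed in for a village) -> proportions
props_back <- counts / sum(counts)
```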
[Kate] fmt_count in the cholera treatment plans: if there are zero counts, it returns character(0). Unsure whether fmt_count will fix this automatically and output 0 (0%) in the Word document. We may need to make fmt_count return 0 if character(0) comes back.
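A hypothetical guard (fmt_count_sketch() is not sitrep's fmt_count()) showing the intended behaviour: an empty count is treated as zero, so the table prints 0 (0.0%) instead of character(0).

```r
## Hypothetical helper, NOT sitrep::fmt_count()
fmt_count_sketch <- function(n, total) {
  if (length(n) == 0) n <- 0  # empty result -> zero count
  pct <- if (total > 0) 100 * n / total else 0
  sprintf("%d (%.1f%%)", as.integer(n), pct)
}

fmt_count_sketch(integer(0), 3)  # "0 (0.0%)"
fmt_count_sketch(2, 3)           # "2 (66.7%)"
```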
[kate] [cholera obs_days] Cholera: the median, min and max of days_obs need na.rm = TRUE added.
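Without na.rm = TRUE, a single missing value turns each of these summaries into NA. With made-up observation days:

```r
days_obs <- c(2, 5, NA, 3)

median(days_obs)  # NA
c(median = median(days_obs, na.rm = TRUE),
  min = min(days_obs, na.rm = TRUE),
  max = max(days_obs, na.rm = TRUE))
```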
[Kate] [cholera CFR] Add an option to turn off CIs in the CFR calculation, e.g. if it is a closed population and we are certain that all deaths are being captured (i.e. inpatients). If we assume that those deaths are representative of community-wide deaths, then CIs are needed.
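A sketch of what such a switch could look like, in base R with a hypothetical cfr_sketch() (not sitrep's function). It assumes a Wilson score interval, which is the interval prop.test() reports without continuity correction.

```r
## Hypothetical helper: CFR in percent, with an optional Wilson score CI
cfr_sketch <- function(deaths, cases, conf_level = 0.95, ci = TRUE) {
  p <- deaths / cases
  out <- data.frame(deaths = deaths, cases = cases, cfr = 100 * p)
  if (ci) {
    z <- qnorm((1 + conf_level) / 2)
    denom <- 1 + z^2 / cases
    centre <- (p + z^2 / (2 * cases)) / denom
    half_width <- z * sqrt(p * (1 - p) / cases + z^2 / (4 * cases^2)) / denom
    out$lower <- 100 * (centre - half_width)
    out$upper <- 100 * (centre + half_width)
  }
  out
}

## Inpatients only: all deaths captured, no CI needed
cfr_sketch(deaths = 5, cases = 100, ci = FALSE)
## Deaths treated as representative of community deaths: report the CI
cfr_sketch(deaths = 5, cases = 100)
```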
[Pat] [fixing dates] set unrealistic dates to NA, based on having browsed dates in the previous chunk.
Need to change this ~NA to ~as.Date(NA):
## set unrealistic dates to NA, based on having browsed dates in the previous chunk
linelist_cleaned <- mutate(linelist_cleaned,
  date_of_onset < as.Date("2017-11-01") ~ NA,
  date_of_onset == as.Date("2081-01-01") ~ as.Date("2018-01-01"))
otherwise you get:
Error: Column date_of_onset < as.Date("2017-11-01") ~ NA is of unsupported type quoted call
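A corrected version, assuming the intent was to recode date_of_onset inside case_when() (the data are made up). Every branch returns a Date, including the missing one via as.Date(NA), and TRUE ~ date_of_onset keeps all other dates unchanged. As noted above, a bare NA caused problems in the dplyr version in use; the typed NA is safe either way.

```r
library(dplyr)

## Made-up onset dates: one too early, one typo (2081 instead of 2018)
linelist_cleaned <- data.frame(
  date_of_onset = as.Date(c("2017-10-15", "2081-01-01", "2018-02-03"))
)

## Set unrealistic dates to NA, based on having browsed dates in the previous chunk
linelist_cleaned <- mutate(linelist_cleaned,
  date_of_onset = case_when(
    date_of_onset < as.Date("2017-11-01") ~ as.Date(NA),             # typed NA
    date_of_onset == as.Date("2081-01-01") ~ as.Date("2018-01-01"),  # fix typo
    TRUE ~ date_of_onset                                             # keep the rest
  )
)
```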
[kate] [cholera attack rate] The overall and by-age-group attack rates use different multipliers: overall is per 10,000 and by age group per 100,000. Change all of them to 10,000, and double-check the other templates for the same issue.
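One way to keep the templates consistent is to define the multiplier once and reuse it for both the overall and the age-group attack rates. A base-R sketch with made-up counts:

```r
## Single multiplier shared by all attack-rate calculations
ar_multiplier <- 10000

cases <- c("0-4 y" = 12, "5-14 y" = 30, "15+ y" = 18)
population <- c("0-4 y" = 1589, "5-14 y" = 2678, "15+ y" = 5733)

## Attack rate per 10,000, by age group and overall
ar_by_age <- cases / population * ar_multiplier
ar_overall <- sum(cases) / sum(population) * ar_multiplier
```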
[Elburg][symptoms] Elburg had symptom data with various codes for yes and no, which needed to be cleaned. I've put together a small example:
library("sitrep")
library("dplyr")
dat <- data.frame(
  tidy_symptoms = sample(c("Yes", "No", "Unknown"), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1)),
  num_symptoms = sample(c(0, 1, NA), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1)),
  mixed_symptoms = sample(c("Yes", "No", "Unknown", 1, 0), 100, replace = TRUE, prob = c(0.3, 0.6, 0.05, 0.025, 0.025))
)
NAMES <- colnames(dat)
sitrep::multi_descriptive(dat, NAMES)
#> converting numeric variable to factor
#> # A tibble: 3 x 25
#> symptom No_n No_prop Total_n Total_prop Unknown_n Unknown_prop Yes_n
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 tidy_s… 50 50 100 100 11 11 39
#> 2 num_sy… NA NA 100 100 NA NA NA
#> 3 mixed_… 57 57.0 100 100 6 6 35
#> # … with 17 more variables: Yes_prop <dbl>, `(0,0.2]_n` <dbl>,
#> # `(0,0.2]_prop` <dbl>, `(0.2,0.4]_n` <dbl>, `(0.2,0.4]_prop` <dbl>,
#> # `(0.4,0.6]_n` <dbl>, `(0.4,0.6]_prop` <dbl>, `(0.6,0.8]_n` <dbl>,
#> # `(0.6,0.8]_prop` <dbl>, `(0.8,1]_n` <dbl>, `(0.8,1]_prop` <dbl>,
#> # Missing_n <dbl>, Missing_prop <dbl>, `0_n` <dbl>, `0_prop` <dbl>,
#> # `1_n` <dbl>, `1_prop` <dbl>
dat2 <- dat %>%
  mutate_at(vars(NAMES), ~case_when(
    . == "Yes" ~ "Yes",
    . == "y" ~ "Yes",
    . == "Y" ~ "Yes",
    . == "No" ~ "No",
    . == "N" ~ "No",
    . == "n" ~ "No",
    . == 1 ~ "Yes",
    . == 0 ~ "No",
    TRUE ~ "Unknown"
  ))
sitrep::multi_descriptive(dat2, NAMES)
#> # A tibble: 3 x 9
#> symptom No_n No_prop Total_n Total_prop Unknown_n Unknown_prop Yes_n
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 tidy_s… 50 50 100 100 11 11 39
#> 2 num_sy… 22 22 100 100 6 6 72
#> 3 mixed_… 58 58.0 100 100 6 6 36
#> # … with 1 more variable: Yes_prop <dbl>
Created on 2019-07-05 by the reprex package (v0.3.0)
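A more compact alternative to listing every spelling in case_when(), sketched as a hypothetical recode_yes_no() helper in base R: lower-case and trim the values first, then match them against short lists of yes and no codes.

```r
## Hypothetical helper: map mixed yes/no codes to "Yes", "No" or "Unknown"
recode_yes_no <- function(x,
                          yes = c("yes", "y", "1"),
                          no = c("no", "n", "0")) {
  x <- tolower(trimws(as.character(x)))
  out <- rep("Unknown", length(x))
  out[x %in% yes] <- "Yes"
  out[x %in% no] <- "No"
  out
}

recode_yes_no(c("Y", "no", " yes ", 1, 0, NA, "maybe"))

## Applied to several columns at once, e.g.
# dat[NAMES] <- lapply(dat[NAMES], recode_yes_no)
```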
[kate] [cholera mortality rate] With zero deaths, the table comes out with "-" for deaths and population and NA-NA for the CIs. Consider changing the function so that 0 comes out rather than "-" and NA:
Deaths | Population | Mortality (per 10,000) | 95%CI
Same for the mortality_rate_region section in cholera:
mortality_rate(deaths$deaths, deaths$population, multiplier = 10000) %>%
  # add the region column to table
  bind_cols(select(deaths, zone_sante), .) %>%
  merge_ci_df(e = 4) %>% # merge the lower and upper CI into one column
  rename("Region" = zone_sante,
         "Deaths" = deaths,
         "Population" = population,
         "Mortality (per 10,000)" = `mortality per 10 000`,
         "95%CI" = ci) %>%
  kable(digits = 2)
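For the zero-death case, an exact binomial interval behaves well: the rate is 0 and the interval runs from 0 to a positive upper bound, so nothing has to show as NA. A sketch with a hypothetical mortality_sketch() in base R and made-up region data; this is not necessarily the interval that sitrep's mortality_rate() uses.

```r
## Hypothetical helper: rate per `multiplier` with an exact (Clopper-Pearson) CI
mortality_sketch <- function(deaths, population, multiplier = 10000) {
  ci <- binom.test(deaths, population)$conf.int
  c(deaths = deaths,
    population = population,
    rate = deaths / population * multiplier,
    lower = ci[1] * multiplier,
    upper = ci[2] * multiplier)
}

regions <- data.frame(zone_sante = c("A", "B"),
                      deaths = c(4, 0),
                      population = c(12000, 8000))

## One row per region; the region without deaths gets rate 0, lower 0 and a positive upper bound
rates <- do.call(rbind, Map(mortality_sketch, regions$deaths, regions$population))
cbind(regions["zone_sante"], rates)
```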
[Elburg] [AJS lab tests] She did not have all the lab variables listed in LABS. Add an explanation to comment out the ones you don't have. Also, if you comment out the last one in the list, there is an extra comma, which throws an error, so you need to add a NULL (or just delete the comma). Alternatively, Zhian, consider making the multi_descriptive function drop non-existent variables and return a warning message that those variables have been ignored.
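Until something like that exists in the package, the template could drop missing variables itself. A base-R sketch with hypothetical lab variable names; note also that a trailing comma such as c("a", "b", ) is itself an error in R ("argument 3 is empty"), which is why commenting out the last entry breaks the list.

```r
## Hypothetical lab variables expected by the template
LABS <- c("lab_test_a", "lab_test_b", "lab_test_c")

## Made-up data that lacks lab_test_b
linelist_cleaned <- data.frame(lab_test_a = "Positive", lab_test_c = "Negative")

## Say which variables are missing, then keep only the ones that exist
missing_labs <- setdiff(LABS, names(linelist_cleaned))
if (length(missing_labs) > 0) {
  warning("Ignoring variables not found in the data: ",
          paste(missing_labs, collapse = ", "))
}
LABS <- intersect(LABS, names(linelist_cleaned))
```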
[Elburg][AJS lab tests] Same as https://github.com/R4EPI/sitrep/issues/136#issuecomment-508784556, she didn't have the expected variables and so select and rename were failing.
[Kate] [loading packages] The here package is not loaded, but here::here() is used in the spatial data section. We also need to add an explanation of how here works.
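here::here() builds a file path starting from the project root (for example the folder containing the .Rproj file), so the same code works regardless of which subfolder the R Markdown file sits in. A sketch with a made-up folder layout:

```r
## Load here explicitly instead of relying on here::here() alone
library(here)

## Path to a shapefile relative to the project root; the folders are made up
shp_path <- here("data", "shapefiles", "districts.shp")
shp_path
```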
[Anna] [standardise_clean filtering] Many datasets fill case_id down automatically even if nothing else is filled in yet. Show example code at the filter step for how to drop these rows.
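A base-R sketch for that filter step, assuming case_id is the only pre-filled column (made-up data): count the non-missing values in all other columns and drop the rows where there are none.

```r
linelist_raw <- data.frame(
  case_id = 1:4,
  sex = c("M", NA, "F", NA),
  age_years = c(30, NA, 4, NA)
)

## Every column except the pre-filled case_id
other_cols <- setdiff(names(linelist_raw), "case_id")

## TRUE for rows where all of those columns are NA
is_empty_row <- rowSums(!is.na(linelist_raw[other_cols])) == 0

linelist_filtered <- linelist_raw[!is_empty_row, ]
```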
[Kate] [reading shapefiles] read_sf actually does need to be given the .shp extension at the end! I thought it recognised the file automatically from the name.
[kate/Annick] [maps] Change the colour palettes so that the darkest colour is used for the highest AR.
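A small base-R sketch of the intended mapping, with made-up attack rates: build a light-to-dark ramp and assign colours by bin, so the highest rate gets the darkest colour. In ggplot2 maps, scale_fill_gradient(low = ..., high = ...) with a dark high colour serves the same purpose.

```r
## Made-up attack rates per 10,000 by area
attack_rate <- c(A = 5, B = 42, C = 18, D = 0)

## Four colours from light (low AR) to dark (high AR)
ar_palette <- colorRampPalette(c("#FEE5D9", "#A50F15"))(4)

## Equal-width bins over the observed range; bin 4 holds the highest rates
ar_bin <- cut(attack_rate, breaks = 4, labels = FALSE)
ar_colours <- ar_palette[ar_bin]
```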
[Elburg] [epicurves] Show an example of how to change the x-axis labels in epicurves, because with loads of data, for example, they become uuuugly.
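A sketch with ggplot2 and made-up onset dates: scale_x_date() controls where the date breaks fall and how they are labelled, and rotating the text helps when there are many weeks. This is not the template's own epicurve code, which may build the plot differently.

```r
library(ggplot2)

set.seed(1)
onsets <- data.frame(
  date_of_onset = as.Date("2019-01-01") + sample(0:180, 300, replace = TRUE)
)

epicurve <- ggplot(onsets, aes(x = date_of_onset)) +
  geom_histogram(binwidth = 7) +            # weekly bars
  scale_x_date(date_breaks = "4 weeks",     # one label every 4 weeks
               date_labels = "%d %b %Y") +  # e.g. "01 Jan 2019"
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# print(epicurve) to draw it
```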
Morning @zkamvar - I think the easiest thing to do is to start a bullet list and set out each bullet so that it contains the name of the person who raised the issue in brackets, then the disease template they were using (cholera, ajs, measles, meningitis), followed by a dash and the code chunk (or general), and then the error or issue. Once fixed, add the solution at the end in double brackets, e.g.: