Closed aspina7 closed 5 years ago
[sitrep - chunk layout] consider breaking up code in to smaller chunks. Also consider not having code commented out (hard for overview) - instead have chunks called "*** ALTERNATIVE OPTION" (where can turn on and off eval = TRUE/FALSE). comment
Do we want to do this? Seems like it would be even more confusing than commenting...in some places, you uncomment, in others, you add TRUE/FALSE.
Fix code first and then later discuss breaking up and reorganising.
On Tue, 6 Aug 2019, 20:25 Kate Doyle, notifications@github.com wrote:
[sitrep - chunk layout] consider breaking up code in to smaller chunks. Also consider not having code commented out (hard for overview) - instead have chunks called "*** ALTERNATIVE OPTION" (where can turn on and off eval = TRUE/FALSE). comment
Do we want to do this? Seems like it would be even more confusing than commenting...in some places, you uncomment, in others, you add TRUE/FALSE.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/R4EPI/sitrep/issues/141?email_source=notifications&email_token=AEL7IJ7J2VDJ5E6RDEFZ4F3QDG623A5CNFSM4H7NWNZ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3WBHQI#issuecomment-518788033, or mute the thread https://github.com/notifications/unsubscribe-auth/AEL7IJ7BY62A5XPXGBNLMGDQDG623ANCNFSM4H7NWNZQ .
[Sitrep - epitrix::clean_colnames] Need to add explanations of how to protect columns from being removed, using e.g. `protect = "#"´comment.
Leaving this out as I think this was an edge case (in that I raised the issue) and will just add more confusion.
[Sitrep - renaming vars] making non-DHIS datasets fit is very painful. Add explanation of sitrep::msf_dict_rename_helper() - also add to this function to say whether those variables are OPTIONAL or NECESSARY for the template to run. Consider also changing variables to alphabetic order. Consider integrating the codebook package
Moved to function issues.
[sitrep epicurves] add example of how to filter (drop) years (levels) using forcats::fct_drop() comment
To do this, you actually need to do when filtering your data, as you can't do it once you've got the epiweek as factors. Would say this is an edge case and we don't need to include.
[Sitrep - epitrix::clean_colnames] Need to add explanations of how to protect columns from being removed, using e.g. `protect = "#"´comment.
Leaving this out as I think this was an edge case (in that I raised the issue) and will just add more confusion.
agree - pretty niche. Also as moving towards dictionaries this shouldnt be an issue (he said naiively, comentator: it was going to be an issue). Maybe @nsbatra a thing for training? or common errors/faqs? Or we just ignore it entirely and deal with it when it comes up ...
[sitrep epicurves] add example of how to filter (drop) years (levels) using forcats::fct_drop() comment
To do this, you actually need to do when filtering your data, as you can't do it once you've got the epiweek as factors. Would say this is an edge case and we don't need to include.
Maybe another thing for training? @nsbatra - how to filter out previous years, or drop factor levels?
[updating sitrep] Add to FAQ that you may need to restart R when installing or updating (issue of removing prior installations and removing previously present general outbreak template). You may also need to install each of the packages one-by-one if it gets stuck. comment comment 2 comment 3
@nsbatra not sure if this is already on the r4epis site in the intro/getting started bit - if not could you chuck it in there plase?
[sitrep::gen_data] sometimes genertes explicit missing variable for gender and other times not (potentially depending on disease??) comment
@zkamvar cant seem to replicate this - maybe its an issue from linelist::clean_variable_spelling?
[Sitrep::gen_population] fix the example props used in templates using the excel file with standardised agegroupings / props that annick sent. Generally discuss with annick what age groupings should be used in templates. comments. Cross reference with gen_population point from functions heading.
@kdoyle514 - added an extra bullet point here. We need to fix the age groups in the templates. Will change the defaults in gen_data too. Forwarding you the excel sheet now
[updating sitrep] Add to FAQ that you may need to restart R when installing or updating (issue of removing prior installations and removing previously present general outbreak template). You may also need to install each of the packages one-by-one if it gets stuck. comment comment 2 comment 3
@nsbatra not sure if this is already on the r4epis site in the intro/getting started bit - if not could you chuck it in there plase?
Sorry, closed this before I meant to. @aspina7 I added this content to the FAQ page.
[sitrep epicurves] add example of how to filter (drop) years (levels) using forcats::fct_drop() comment
To do this, you actually need to do when filtering your data, as you can't do it once you've got the epiweek as factors. Would say this is an edge case and we don't need to include.
Maybe another thing for training? @nsbatra - how to filter out previous years, or drop factor levels?
@aspina7 I added a section on factors and fct_drop into the Advanced Functions subpage of R Basics. I am also going to talk through the epicurve years issue in the walk-through.
[Sitrep - epitrix::clean_colnames] Need to add explanations of how to protect columns from being removed, using e.g. `protect = "#"´comment.
Leaving this out as I think this was an edge case (in that I raised the issue) and will just add more confusion.
agree - pretty niche. Also as moving towards dictionaries this shouldnt be an issue (he said naiively, comentator: it was going to be an issue). Maybe @nsbatra a thing for training? or common errors/faqs? Or we just ignore it entirely and deal with it when it comes up ...
@aspina7 I have added a note about this in the walk-through and will add it to FAQ page as well.
closing this as we have finished everything within the scope - leftovers are in nice-to-haves
I have attempted here to structure all the comments from the hackthon in to tasks. It is broken down in to categories which include: Training Function Issues Template structure Installing / updating R, R studio, Sitrep & packages Data Cleaning Data Analysis
Will try and have disease and chunk in square brackets (or function/action where appropriate) at beginning of bullet point and then link to the comment at end. If it generally applies to all templates then instead of disease will write sitrep.
Will mark the hackthon issues list with googly-eyes emoji once have included comment here.
Training / wiki
[x] [general] Need to explain what commented lines are (and ctrl+shift+c command for commenting/uncommenting) comment
[x] [general] add instructions for setting up Rstudio project to link data and project (include folder structure suggestion) comment
[x] [general] working directories, file paths and the here package comment comment 2
[x] [objects / subsetting] how to assign dataframes and subsetting (when you should subset and overwrite vs creating a new object) comment
[x] [variable names & data dictionary] Introduction to how to look at names of a data frame, and how to view names from a dictionary (ie. data_element_shortname). Also an introduction to the dictionaries, and a graphic overview (maybe just a print out of the dictionary itself) - consider looking in to the codebook package. comment
[x] [data cleaning - common errors] for example spaces and commas. "" versus " " - the dplyr error message not super intuitive "Error returns zero length variable". (explain why) comment
[x] [data cleaning - dates] explanation of different date formats and origins from excel. comment
[x] [data cleaning - dates] walk-through of combining a date and time column (already covered in the Wiki) comment
[x] [population data] Need explanation of how to have population data formated if want to read in. Also need to have training section on gen_population function (counts vs props, strata etc). comment
[x] [rmarkdown] introduciton on how to use - e.g. text vs code, echo and eval, the ability to push the little downward arrow on top right of chunks to re-run code up to that point, that way if make a mistake can start over quickly. comment comment 2
[x] [flowchart] good to have a visual guide to walk through where users need to go in templates depending on their data starting point. (part of this should be the wiki disease specific pages). comment
Function issues
[x] [sitrep::gen_data] sometimes genertes explicit missing variable for gender and other times not (potentially depending on disease?? Cant be replicated, must be fixed...) comment
[x] [sitrep::gen_data] add an ID field to each of the data dictionaries comment
[x] [sitrep::gen_data] remove date_of_admission from all outbreak data dictionaries. Only date_of_consultation_admission should be there (we got sent an older version of data dictionaries). When filtering for inpatients this will then be the date of admission, otherwise consultation. comment
[x] ~~ [sitrep::msf_dict] consider a simpler way to have an overview of vriable value options - clicking in to each one defeats the purpose of reading in a dictionary (should be able to have dictionary open and look things up while recoding). Also double check that the value options that are produced are correct (though this seems to be have been a one-time issue. comment ~~ [Moved to nice-to-haves issue post
[x] [sitrep::gen_population] Add option to type in counts rather than props. (still keep props option though). Also need to fix default props. comments Cross reference with gen_population point from read_data heading below.
[x] [sitrep::fmt_counts] if ther are zero counts then returns character(0) - double check that this comes out correctly in word docs - may need to add argument to fmt_count so that if zero returns 0 (0%). comment
[x] [sitrep::mortality_rate] if there are zero deaths then consider add 0 rather than NA to the output comment
[x]
[sitrep::case_fatality_rate] add option to turn off CIs. (for situations where convinced capturing all deaths accurately).No point doing because can just drop the columns comment[x] [sitrep:: table functions] zhian has some minor edits left to do on tab_* functions - comments in kd house keeping pr. Originally discussed in issue
[x] [sitrep::table functions] consider dropping non-existant variables and return a warning message these have been ignored comment comment 2
[x]
[Sitrep - renaming vars] making non-DHIS datasets fit is very painful. Add explanation of sitrep::msf_dict_rename_helper() - also add to this function to say whether those variables are OPTIONAL or NECESSARY for the template to run. Consider also changing variables to alphabetic order. ~~Consider integrating the codebook packagecomment comment 2 comment 3 This should also address comment 4 and clarify what class variables need to be, e.g. sex as a factor (also add to notes?) Cross reference with the [variable names & data dictionary] point under training heading above. ~~ [moved to nice-to-haves issue post]Template structure
[x] [sitrep - labelling code] people get lost in lines of commented code. Label things more clearly. e.g. " INSTRUCTIONS" or " CODE OPTION". (Or think of a better way to do this) comment
[x] [sitrep - chunk layout] consider breaking up code in to smaller chunks.
Also consider not having code commented out (hard for overview) - instead have chunks called "*** ALTERNATIVE OPTION" (where can turn on and off eval = TRUE/FALSE). comment[x] [sitrep - gen_data chunk] consider putting the chunk for creating fake data before the code when read in your own data. Ideally put it in a seperate chunk. I personally wouldnt use knitr::params comment
[x] [sitrep - reporting week] move reporting-week chunk to the beginning of the script, so that people just need to update that each week then knit. (Also remove the example line which takes the maximum week of the dataset). comment
Installing / updating R, R studio, Sitrep & packages
[x] [updating sitrep] Add to FAQ that you may need to restart R when installing or updating (issue of removing prior installations and removing previously present general outbreak template). You may also need to install each of the packages one-by-one if it gets stuck. comment comment 2 comment 3
[x] [Sitrep - setup chunk] put commented code for installing and loading optional packages (e.g. excel.link, askpass, write.xl, here, etc...) comment comment 2 comment 3
Reading data
[x] [Sitrep] xl.read.file function for reading in password protected sheets does not allow reading in protected sheets - add a note on what to do. Also explain askpass::askpass better (it is not so intuitive) comment
[x] [Sitrep - filepaths] Add a comment explaining that need to change slashes from \ to / if copy pasting file paths. comment.
[x] [Sitrep - read_nonDHIS_data] need to update exaplanation of data dictionary columns to reflect change in data_dictionary outputs, e.g. change "Code" to "option_code" and "Name" to "option_name". (e.g. cholera l245) comment
[x] [Sitrep - read_csv]
readr::read_csv()
doesnt convert files to UTF8,rio::import()
does, which is important for cleaning functions used later. Consider deleting read_csv in favour of rio::import everywhere. comment[x] [Sitrep - rio::import] consider adding optional code for an excel file where the dataset doesnt start in excel cell A1 (or if you want to drop rows with sub headings) comment
[x] [Sitrep::gen_population] fix the example props used in templates using the excel file with standardised agegroupings / props that annick sent. Generally discuss with annick what age groupings should be used in templates. comments. Cross reference with gen_population point from functions heading.
[x] [sitrep readsf] double check whether or not need to specify .shp_ in file path or if sf::read_sf recognises automatically. comment
Data Cleaning
[x]
[Sitrep - epitrix::clean_colnames] Need to add explanations of how to protect columns from being removed, using e.g. `protect = "#"´comment.[x] [sitrep - var names] tripple check that examples of renaming vars are consistent with var names used throughout the script (e.g. sex and gender in non_dhis chunks). comment comment2
[x] [sitrep - select vars example]
linelist_cleaned <- select(linelist_cleaned, c(1:3, "age_years", "sex")
needs an extra bracket at the end (in all templates). Also add a comment that this can be used to create temporary datasets.[x] [sitrep - dates] Add origin to as.dates and explain that for excel this origin is 1899-12-30. Also mention different date formates i.e. %d-%b%y comment
[x] [sitrep - dates] when fixing unrealistic dates need to change
~NA
toas.Date(NA)
. Double check all templates. comment[x] [sitrep - factors] case_when creates character variables; comments say it creates factors. Fix the comments to say character variables. Double check that this is still necessary after switching to forcats package. comment
[x] [sitrep - multiple vars] add example coding of how to change multiple variables with the same case_when. e.g. below from comment
[x] [sitrep - filter data] show how to drop rows where everything is NA except the ID variable. comment
[x] [sitrep - save_cleaned_data] swap writexl::write_xlsx for rio::export (Alex: not sure why we are using writexl anyway?)
Data Analysis
[x] [cholera obs_days] days of observation need a
na.rm = TRUE
argument added. Maybe double check other templates? comment[x] [sitrep attack_rate] make sure that all multipliers are 10,000. E.g. in cholera the overall is by 10k and the age group by 100k. comment
[x] [sitrep epicurves] add an example of how to change x-axis labels - if have many years then label is far too dense. comment
[x]
[sitrep epicurves] add example of how to filter (drop) years (levels) using forcats::fct_drop() comment[x] [sitrep choropleth_maps] change colour palettes so that darker is for higher AR comment see nice post and interactive [palette browser] package (http://colorspace.r-forge.r-project.org/articles/hcl_palettes.html)
[x] [sitrep everywhere] update templates once all table functions have been fixed, eg. descriptive, multidescriptive,
tabulate_survey and tabulate_binary_survey- see post under functions here as well as this issue