Open Nic-Chr opened 3 years ago
A lot of really great work here and a lot that can be used in phsmethods
, thanks for getting in touch @Nic-Chr
Some general comments before I take each function in turn:
add_thousands_sep
format_perc
grand_total_row
missing_dates
nlr
npv
nth_elements
plr
ppv
prevalence_threshold
replace_nan
👵 add_worksheet_with_table
- not sure how often something like this would be required across PHS. Maybe the total row could be made optional or other functionality added around creating tables in Excel?detach_all_packages
- if using R (projects) properly, is there a need for using this?df_count
- this could be useful, just think the name could be more meaningful?extract_files
- I can see this being really useful but just want to review to ensure it can be generalised across PHS.gcd
- could be useful to have some limits/error handling around this? As it's recursive, memory issues could occur as the size of the vectors increase.lcm
- uses gcd
so same comment as abovestr_extract_ca
- any scope to be more robust, e.g. casting to lower-case for matches?str_extract_country
- any scope to be more robust, e.g. casting to lower-case for matches?str_extract_date
- any scope to be more robust, e.g. dates with non-numeric elements? str_extract_hb
- any scope to be more robust, e.g. casting to lower-case for matches?sub_lab_code_to_name
- I'm being quite petty here, but could the function name be clearer?test_characteristics
- very specific output, would need to review usability across PHS.ts_df
- again, more around the usability across PHS.These are all related to the match_area
function and, while the functionality for these functions doesn't exist, some work to integrate may be required (e.g. changing the name of the functions):
ca_name_to_code
hb_name_to_code
pc_to_ca
- uses lookup on Freddy, could this be stored on the package?pc_to_hb
- uses lookup on Freddy, could this be stored on the package?pc_to_intzone
- uses lookup on Freddy, could this be stored on the package?age_to_band
- functionality already exists in age_group
functionca_code_to_name
- functionality already exists in match_area
functionhb_code_to_name
- functionality already exists in match_area
functionnth_elements
or even nlr
. However, this is a way we can standardise the approach to these problems and where they are common in our R programs, improve readability. phsmethods
training app will need to be updated.I'll let @davidc92 come in on this now and we can make a plan of action! 🥳
Before I wade in, Nick, can you please open a separate issue for each function with a short description of what it is intended to do? It is not immediately clear from a lot of the names :) With separate issues we can manage their implementation properly. Let me know once they have all been opened and I will review them.
D
Hi Russell and David, thanks for the feedback, I've begun opening issues for each function separately.
I agree with you, the functionality does already exist for both ca_code_to_name
and hb_code_to_name
so may not be needed for the package but I do think there are a few additional features of age_to_band
that build upon phsmethods::age_group
.
One of the main added functionalities is that you can specify irregularly spaced age breaks. Another is the ability to pass arguments to factor
, controlling things like factor labels, order and excluded levels (see the below example).
mylabels <- c("0 to 10", "11 to 19", "20 to 39", "40 to 59", "60 to 79", "80 and older")
age_to_band(1:100, age_breaks = c(0, 11, 20, 40, 60, 80), stringsAsFactors = TRUE, labels = mylabels, ordered = TRUE)
Maybe not a critical improvement but happy to hear your thoughts!
I think non-even age breaks would be a great addition. This is something I and others on my team do regularly. I think the challenge will be to implement it without changing the existing functionality and keeping the API simple... lots of tests needed!
Very happy for age_group
to be amended to allow irregular bandings :)
I created an issue for allowing custom bin lengths for the create_age_groups
function here. As the discussion didn't continue on this thread, it would be great if we could continue it there – https://github.com/Public-Health-Scotland/phsmethods/issues/93#issue-1673433892.
Thanks @davidc92 and agree with @r6lm in that updating age_group
to accept custom breaks would be very useful.
I have a method here age_band.R which I have tried to make user friendly by creating a list of common age groupings and using one of these as the default breaks argument.
For example, the default breaks are set to age_breaks$by_20_to_80
which corresponds to the breakpoint vector: 0 20 40 60 80
The age breaks vector could look something like this and would be exported and visible to the user:
#' @export
age_breaks <- list("by_5_to_80" = seq.int(from = 0L, to = 80L, by = 5L),
"by_10_to_80" = seq.int(from = 0L, to = 80L, by = 10L),
"by_20_to_80" = seq.int(from = 0L, to = 80L, by = 20L),
"by_5_to_90" = seq.int(from = 0L, to = 90L, by = 5L),
"by_10_to_90" = seq.int(from = 0L, to = 90L, by = 10L),
"by_20_to_90" = seq.int(from = 0L, to = 90L, by = 20L))
> age_breaks
$by_5_to_80
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80
$by_10_to_80
[1] 0 10 20 30 40 50 60 70 80
$by_20_to_80
[1] 0 20 40 60 80
$by_5_to_90
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
$by_10_to_90
[1] 0 10 20 30 40 50 60 70 80 90
$by_20_to_90
[1] 0 20 40 60 80
Is your feature request related to a problem? Please describe. No problems!
Describe the solution you'd like I have quite a few helper functions that I think might be useful to phsmethods, happy to discuss in more detail.