Public-Health-Scotland / phsmethods

An R package to standardise methods used in Public Health Scotland (https://public-health-scotland.github.io/phsmethods/)

Adding new functions #54

Open Nic-Chr opened 3 years ago

Nic-Chr commented 3 years ago

Is your feature request related to a problem? Please describe.
No problems!

Describe the solution you'd like
I have quite a few helper functions that I think might be useful to phsmethods. Happy to discuss in more detail.

rmccreath commented 3 years ago

A lot of really great work here, and a lot that can be used in phsmethods. Thanks for getting in touch, @Nic-Chr. Some general comments before I take each function in turn:

No real comment required - good to go:

Query or small change required - hold:

These are all related to the match_area function and, while the functionality for these functions doesn't exist, some work to integrate may be required (e.g. changing the name of the functions):

Not needed/right for this package - stop:

Wider impact on phsmethods and future thinking

I'll let @davidc92 come in on this now and we can make a plan of action! 🥳

davidc92 commented 3 years ago

Before I wade in, Nick, can you please open a separate issue for each function with a short description of what it is intended to do? It is not immediately clear from a lot of the names :) With separate issues we can manage their implementation properly. Let me know once they have all been opened and I will review them.

D

Nic-Chr commented 3 years ago

Hi Russell and David, thanks for the feedback. I've begun opening issues for each function separately. I agree with you that the functionality already exists for both ca_code_to_name and hb_code_to_name, so they may not be needed for the package, but I do think there are a few additional features of age_to_band that build upon phsmethods::age_group. One of the main additions is that you can specify irregularly spaced age breaks. Another is the ability to pass arguments to factor, controlling things like factor labels, order and excluded levels (see the example below).

mylabels <- c("0 to 10", "11 to 19", "20 to 39", "40 to 59", "60 to 79", "80 and older")
age_to_band(1:100, age_breaks = c(0, 11, 20, 40, 60, 80), stringsAsFactors = TRUE, labels = mylabels, ordered = TRUE)

Maybe not a critical improvement but happy to hear your thoughts!
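For readers who do not have age_to_band to hand, the idea of irregular breaks with custom labels can be sketched in base R with cut(). This is only an illustration under assumptions: band_ages is a hypothetical helper, not the age_to_band implementation discussed in this thread, and it assumes each band includes its lower break and the last break starts an open-ended band.

```r
# Hypothetical helper for illustration only; NOT the age_to_band() code.
band_ages <- function(age, breaks, labels = NULL, ordered = FALSE) {
  # Append Inf so the last break opens a final, unbounded band
  cut_points <- c(breaks, Inf)
  # right = FALSE makes bands include their lower bound: [0, 11), [11, 20), ...
  cut(age, breaks = cut_points, labels = labels, right = FALSE,
      ordered_result = ordered)
}

my_labels <- c("0 to 10", "11 to 19", "20 to 39", "40 to 59",
               "60 to 79", "80 and older")
bands <- band_ages(c(0, 10, 11, 19, 20, 80, 100),
                   breaks = c(0, 11, 20, 40, 60, 80),
                   labels = my_labels, ordered = TRUE)
print(bands)
```

Ages below the first break fall outside every band and come back as NA in this sketch.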

Moohan commented 3 years ago

I think non-even age breaks would be a great addition. This is something I and others on my team do regularly. I think the challenge will be to implement it without changing the existing functionality and keeping the API simple... lots of tests needed!
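One way to picture the "keep the existing behaviour" constraint is an optional breaks argument that defaults to NULL, so calls that never mention it still get evenly spaced bands. The sketch below is an assumption-laden illustration: group_ages, its arguments and its label format are hypothetical and are not taken from the phsmethods source.

```r
# Hypothetical function; names, arguments and labels are assumptions.
group_ages <- function(age, from = 0, to = 90, by = 5, breaks = NULL) {
  # Fall back to evenly spaced breaks only when `breaks` is not supplied,
  # so existing calls keep the same cut points
  if (is.null(breaks)) {
    breaks <- seq(from, to, by = by)
  }
  stopifnot(is.numeric(breaks), !is.unsorted(breaks, strictly = TRUE))
  # findInterval() gives 0 for ages below the first break; treat those as NA
  idx <- findInterval(age, breaks)
  idx[which(idx == 0)] <- NA
  lower <- breaks[idx]
  # The last band has no upper limit, hence the trailing NA
  upper <- c(breaks[-1] - 1, NA)[idx]
  out <- ifelse(is.na(upper), paste0(lower, "+"), paste0(lower, "-", upper))
  out[is.na(idx)] <- NA_character_
  out
}

# Regression-style checks: the default path, then the new irregular path
stopifnot(identical(group_ages(c(4, 5, 89, 90, 95)),
                    c("0-4", "5-9", "85-89", "90+", "90+")))
stopifnot(identical(group_ages(c(10, 11, 45), breaks = c(0, 11, 20, 40)),
                    c("0-10", "11-19", "40+")))
```

Checks like the two at the end are the kind of tests that would need to pass both before and after any change to the real function.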

davidc92 commented 3 years ago

Very happy for age_group to be amended to allow irregular bandings :)

r6lm commented 1 year ago

I created an issue for allowing custom bin lengths for the create_age_groups function here. As the discussion didn't continue on this thread, it would be great if we could continue it there – https://github.com/Public-Health-Scotland/phsmethods/issues/93#issue-1673433892.

Nic-Chr commented 3 months ago

Thanks @davidc92, and I agree with @r6lm that updating age_group to accept custom breaks would be very useful.

I have a method here, age_band.R, which I have tried to make user-friendly by creating a list of common age groupings and using one of these as the default breaks argument.

For example, the default breaks are set to age_breaks$by_20_to_80, which corresponds to the breakpoint vector 0 20 40 60 80. The age_breaks list could look something like this, and would be exported and visible to the user:

#' @export
age_breaks <- list("by_5_to_80" = seq.int(from = 0L, to = 80L, by = 5L),
                   "by_10_to_80" = seq.int(from = 0L, to = 80L, by = 10L),
                   "by_20_to_80" = seq.int(from = 0L, to = 80L, by = 20L),
                   "by_5_to_90" = seq.int(from = 0L, to = 90L, by = 5L),
                   "by_10_to_90" = seq.int(from = 0L, to = 90L, by = 10L),
                   "by_20_to_90" = seq.int(from = 0L, to = 90L, by = 20L))
> age_breaks
$by_5_to_80
 [1]  0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80

$by_10_to_80
[1]  0 10 20 30 40 50 60 70 80

$by_20_to_80
[1]  0 20 40 60 80

$by_5_to_90
 [1]  0  5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90

$by_10_to_90
 [1]  0 10 20 30 40 50 60 70 80 90

$by_20_to_90
[1]  0 20 40 60 80
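As a rough sketch of how a named list element could serve as a default argument, the snippet below rebuilds two of the entries above and passes one to a hypothetical band_index function (an illustrative name, not part of age_band.R). It also shows why the by_20_to_90 output above ends at 80: seq.int() stops at the last value that does not exceed `to`, so the two 20-year entries are identical.

```r
# Two entries copied from the list above
age_breaks <- list(
  "by_20_to_80" = seq.int(from = 0L, to = 80L, by = 20L),
  "by_20_to_90" = seq.int(from = 0L, to = 90L, by = 20L)
)

# seq.int() never overshoots `to`, so both vectors are 0 20 40 60 80
same_breaks <- identical(age_breaks$by_20_to_80, age_breaks$by_20_to_90)

# Hypothetical function: a list element is used as the default `breaks`
band_index <- function(age, breaks = age_breaks$by_20_to_80) {
  # Returns the position of the band each age falls into (1 = first band)
  findInterval(age, breaks)
}

idx <- band_index(c(5, 20, 79, 85))
```

If a distinct 90-capped entry is wanted, the breaks would need to be written out explicitly rather than generated with a step of 20.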