GerkeLab / fcds

Process data from the Florida Cancer Data System
https://gerkelab.github.io/fcds/

Add count_filter_fcds() that does counting and filtering together #84

Open gadenbuie opened 4 years ago

gadenbuie commented 4 years ago

It's cumbersome to remember (and repeat) the grouping conditions required to complete zero-count groups when using count_fcds(). The complete_age_groups() function is correctly named, but it may be overly specific for the task at hand.

fcds %>% 
  filter(cancer_site_group == "Cervix Uteri") %>% 
  filter(sex == "Female") %>% 
  filter_age_groups(age_gt = 20) %>% 
  filter(county_name %in% fcds_const("moffitt_catchment")) %>% 
  count_fcds(race = TRUE, county_name, cancer_site_group) %>% 
  complete_age_groups(
    # Required to know which age groups need to be completed
    age_gt = 20, 
    # Need to know the structure of the columns that need to be completed
    sex, race, county_name, cancer_site_group, 
    # Here's the tricky part: year_group and year vary together
    nesting(year_group, year)
  )

This manual workflow is very flexible and ensures that any request can be completed. But the pattern is also common enough that it can be abstracted into a single, one-shot function, filter_count_fcds().

fcds %>% 
  filter_count_fcds(
    # Filters ....
    cancer_site_group == "Cervix Uteri",
    sex == "Female",
    county_name %in% fcds_const("moffitt_catchment"),
    # Arguments ....
    age_gt = 20,
    groups = c(race)
  )

  1. Arguments included in the filters are automatically included in groups.
  2. The groups argument allows counts to be broken down by additional columns, across all values in each of those columns.
  3. c("year_group", "year", "age_group") are still the default groups.
  4. The default column structure when using just the FCDS data can be inferred from the columns present in filters and groups. If unknown columns are found, the function can bail early and recommend the manual workflow.
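
Points 1, 3 and 4 above could be sketched roughly as follows. This is only an illustration, not the fcds implementation: infer_groups() and fcds_cols are hypothetical names, and the list of known columns is a stand-in taken from the columns mentioned in this issue. It uses base R's all.vars() to pull column names out of the filter expressions.

```r
library(rlang)

# Hypothetical stand-in for the columns whose structure is known in FCDS data
fcds_cols <- c("year_group", "year", "age_group", "sex", "race",
               "county_name", "cancer_site_group")

# Hypothetical helper: derive the grouping columns from filters and groups
infer_groups <- function(filters, groups = character(),
                         default_groups = c("year_group", "year", "age_group"),
                         known_cols) {
  # all.vars() returns the variable names used in an expression,
  # e.g. `sex == "Female"` gives "sex"; function names are not included
  filter_vars <- unique(unlist(lapply(filters, all.vars)))

  # Bail early if anything falls outside the known column structure.
  # Limitation: a filter that refers to a local variable is flagged too.
  unknown <- setdiff(c(filter_vars, groups), known_cols)
  if (length(unknown) > 0) {
    stop(
      "Unknown column(s): ", paste(unknown, collapse = ", "),
      ". Use the manual filter/count/complete workflow instead."
    )
  }

  # Defaults first, then filter columns, then any extra groups
  union(default_groups, union(filter_vars, groups))
}

filters <- exprs(
  cancer_site_group == "Cervix Uteri",
  sex == "Female"
)

infer_groups(filters, groups = "race", known_cols = fcds_cols)
# "year_group" "year" "age_group" "cancer_site_group" "sex" "race"
```

The resulting vector is what a one-shot function would need both for counting and for completing the zero-count groups afterwards.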

gadenbuie commented 4 years ago

The filters need to be in ... so that I can capture them with rlang::enexprs(), but we can use a helper function for the age bounds that combines filter_age_groups() and recode_age_groups().
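
A minimal sketch of that capture-and-splice pattern, on toy data rather than the FCDS data. The function name filter_count_sketch(), the age_low column, and the character-vector form of groups are all assumptions made for illustration. The age filter here is a plain stand-in for the proposed helper, and the zero-count completion step is left out.

```r
library(dplyr)
library(rlang)

# Hypothetical sketch, not the fcds implementation. Arguments that come
# after `...` must be passed by name, which keeps them apart from the filters.
filter_count_sketch <- function(data, ..., age_gt = NULL, groups = character()) {
  # Capture the filter expressions without evaluating them
  filters <- enexprs(...)

  # Splice the captured expressions back into dplyr::filter()
  data <- filter(data, !!!filters)

  if (!is.null(age_gt)) {
    # Stand-in for an age-bounds helper; assumes a numeric `age_low` column
    data <- filter(data, age_low > .env$age_gt)
  }

  # Count by the requested grouping columns
  count(data, !!!syms(groups))
}

toy <- tibble(
  sex     = c("Female", "Female", "Male", "Female"),
  race    = c("White", "Black", "White", "White"),
  age_low = c(25, 40, 30, 15)
)

result <- filter_count_sketch(toy, sex == "Female", age_gt = 20, groups = "race")
result
```

With this toy data, two female records pass the age filter, so the result has one row per race with a count of 1 each.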