GerkeLab / fcds

Process data from the Florida Cancer Data System
https://gerkelab.github.io/fcds/

Proposal: add n_years argument to age_adjust() #60

Open · gadenbuie opened this issue 5 years ago

gadenbuie commented 5 years ago

Proposal: add an `n_years` argument that scales the rate to a yearly average, defaulting to 5 because FCDS year groups span five years by default.

This would eliminate the need for the final `mutate()` step:

```r
library(fcds)
library(dplyr)

fcds <- fcds_load()

fcds_moffitt_prostate <- 
  fcds %>% 
  filter(cancer_site_group == "Prostate Gland", year > 1985) %>% 
  filter_age_groups(age_gt = 20) %>% 
  count_fcds(sex = "Male", moffitt_catchment = TRUE)

# Divide by 5 to convert 5-year group totals into yearly averages
fcds_moffitt_prostate %>% 
  complete_age_groups(age_gt = 20) %>% 
  age_adjust() %>% 
  mutate(n = n / 5, rate = rate / 5)
```

```
# A tibble: 90 x 7
# Groups:   sex, county_name, year_group, year [90]
   sex   county_name year_group year      n population  rate
   <chr> <fct>       <fct>      <chr> <dbl>      <dbl> <dbl>
 1 Male  Charlotte   1986-1990  1988  107        38144 137. 
 2 Male  Charlotte   1991-1995  1993  161.       48412 161. 
 3 Male  Charlotte   1996-2000  1998  212.       53639 187. 
 4 Male  Charlotte   2001-2005  2003  214.       60336 174. 
 5 Male  Charlotte   2006-2010  2008  226.       64059 176. 
 6 Male  Charlotte   2011-2015  2013  137.       67701  99.5
```

Instead, `age_adjust()` would return:

```r
fcds_moffitt_prostate %>% 
  complete_age_groups(age_gt = 20) %>% 
  age_adjust()
```

```
# A tibble: 90 x 8
# Groups:   sex, county_name, year_group, year [90]
   sex   county_name year_group year      n population  rate rate_year
   <chr> <fct>       <fct>      <chr> <dbl>      <dbl> <dbl>     <dbl>
 1 Male  Charlotte   1986-1990  1988    535      38144  687.     137. 
 2 Male  Charlotte   1991-1995  1993    803      48412  804.     161. 
 3 Male  Charlotte   1996-2000  1998   1062      53639  937.     187. 
 4 Male  Charlotte   2001-2005  2003   1072      60336  872.     174. 
 5 Male  Charlotte   2006-2010  2008   1132      64059  881.     176. 
 6 Male  Charlotte   2011-2015  2013    686      67701  497.      99.5
```

We could either return `rate_year` as `rate`, or include both `rate` (the raw calculated rate) and `rate_year`.
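A minimal sketch of the second option (keeping `rate` and adding `rate_year`), assuming the output of `age_adjust()` is a data frame with numeric `n` and `rate` columns. `add_rate_year()` and its `n_years`/`scale_n` arguments are hypothetical illustrations, not part of fcds; the example input uses the rounded Charlotte County figures from the output above.

```r
# Hypothetical helper (not part of fcds): convert rates computed over a
# multi-year group into average yearly rates.
add_rate_year <- function(data, n_years = 5, scale_n = FALSE) {
  stopifnot(is.numeric(n_years), length(n_years) == 1, n_years > 0)
  # Average yearly rate: the group's rate divided by the years it spans
  data$rate_year <- data$rate / n_years
  # Optionally express counts as mean cases per year as well
  if (scale_n) {
    data$n <- data$n / n_years
  }
  data
}

# Rounded values from the 1986-1990 and 1991-1995 Charlotte rows above
example <- data.frame(n = c(535, 803), rate = c(687, 804))
add_rate_year(example)
add_rate_year(example, scale_n = TRUE)
```

The `scale_n` flag makes the open question about `n` an explicit choice for the caller rather than something `age_adjust()` decides silently.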

@tgerke My question is what to do with `n`. If `age_adjust()` reports rates in different units than `n`, should we update `n` as well? That ambiguity is why I originally went with a unit-agnostic function.

I'm considering doing a little more work on `age_adjust()` to make it completely unit agnostic (for example, by generalizing the argument names), and then also providing an `age_adjust_fcds()` function that makes strong assumptions about all of the above and returns both `rate` and `n` as means per year.
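The split between a unit-agnostic core and an FCDS-specific wrapper could look roughly like this. `age_adjust_fcds()` and its `adjust` argument are hypothetical; the sketch assumes `age_adjust()` takes the data as its first argument and returns `n` and `rate` columns. A stand-in adjuster returning rounded totals from the table above is used so the example runs without fcds installed.

```r
# Hypothetical FCDS-specific wrapper (sketch): keep a unit-agnostic core
# adjuster and layer the FCDS assumption of 5-year groups on top.
age_adjust_fcds <- function(data, ..., n_years = 5, adjust = age_adjust) {
  stopifnot(is.numeric(n_years), length(n_years) == 1, n_years > 0)
  # `adjust` defaults to age_adjust(); R only evaluates this default when it
  # is first used, so a different function can be supplied instead.
  out <- adjust(data, ...)
  # Report both counts and rates as means per year
  out$n <- out$n / n_years
  out$rate <- out$rate / n_years
  out
}

# Stand-in for illustration only: returns fixed 5-year totals
fake_adjust <- function(data) data.frame(n = 535, rate = 687)
age_adjust_fcds(NULL, adjust = fake_adjust)
```

Injecting `adjust` mainly keeps the wrapper easy to test; inside the package it could simply call `age_adjust()` directly.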