GerkeLab / fcds

Process data from the Florida Cancer Data System
https://gerkelab.github.io/fcds/

Incompatible types under 'age_group'? #91

Closed: vickyliao92 closed this issue 4 years ago

vickyliao92 commented 4 years ago

I tried to calculate age-adjusted rates for a select cancer site (same code I've been using), but received a new error message:

Error: Can't combine `age_group` <factor<e7a20>> and `age_group` <ordered<64369>>.
Run `rlang::last_error()` to see where the error occurred.

I ran the script line by line with no issue until the age_adjust() line.

Code I ran:

select_site <- c("Lung and Bronchus")
behavior_group <- c("Invasive")
year_groups <- c("2012-2016")

selectcancer <- fcds %>% 
  filter(
    year_group %in% year_groups, 
    county_name %in% moffitt_county, 
    cancer_site_group %in% select_site, 
    cancer_ICDO3_behavior %in% behavior_group
  ) %>%
  count_fcds(county_name) %>%
  complete_age_groups(county_name, tidyr::nesting(year_group, year)) %>%
  filter_age_groups(age_gt = 20) %>%
  group_drop(county_name) %>%
  age_adjust()

gadenbuie commented 4 years ago

I can see that this is a problem, but it may take me a while to get to a fix.

In the meantime, can you try adding standardize_age_groups() to the pipeline?

selectcancer <- fcds %>% 
  filter(
    year_group %in% year_groups, 
    county_name %in% moffitt_county, 
    cancer_site_group %in% select_site, 
    cancer_ICDO3_behavior %in% behavior_group
  ) %>%
  count_fcds(county_name) %>%
  complete_age_groups(county_name, tidyr::nesting(year_group, year)) %>%
  filter_age_groups(age_gt = 20) %>%
  standardize_age_groups() %>%  #<< added this line here ------------------- <<!!
  group_drop(county_name) %>%
  age_adjust()

vickyliao92 commented 4 years ago

I still receive a similar but slightly different error message.

Error: Can't combine `age_group` <factor<e7a20>> and `age_group` <ordered<e7a20>>.
Run `rlang::last_error()` to see where the error occurred.

vickyliao92 commented 4 years ago

Not sure if this helps, but I got another new error message, which seems related to the rows with age_group = "Unknown".

I then took a subset of the dataset with age_group != "Unknown" and was able to run the code on the subset with no error messages.

gadenbuie commented 4 years ago

Sorry @vickyliao92. What's happening here is that one of the packages I use in many of the functions was updated and is now much stricter about factor levels. I'll try to sort this out today, likely this afternoon.
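
As a rough illustration of the stricter behaviour described above, the sketch below uses toy data (not FCDS data) to show dplyr refusing to combine a plain factor with an ordered factor, plus one possible way to align the two columns first. The tables `counts` and `std_pop`, the `lvls` vector, and the use of `bind_rows()` are assumptions made for illustration; the thread does not say which call inside fcds raised the error. It also assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data (hypothetical, not from FCDS): the same age groups stored as a
# plain factor in one table and as an ordered factor in the other.
lvls <- c("20 - 24", "25 - 29")
counts  <- tibble(age_group = factor(lvls, levels = lvls), n = c(3, 5))
std_pop <- tibble(age_group = factor(lvls, levels = lvls, ordered = TRUE), n = c(10, 12))

# Combining the two as-is fails; capture the error instead of stopping.
err <- tryCatch(bind_rows(counts, std_pop), error = function(e) e)
inherits(err, "error")

# Converting the ordered column to a plain factor with the same levels
# lets the rows combine.
std_pop_fixed <- std_pop %>%
  mutate(age_group = factor(as.character(age_group), levels = lvls))
combined <- bind_rows(counts, std_pop_fixed)
```

Whether to settle on a plain or an ordered factor is a design choice; the point is only that both sides must agree before they are combined.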

gadenbuie commented 4 years ago

@vickyliao92 I think I've fixed all the errors caused by incompatibility with the latest version of dplyr. Can you upgrade fcds and let me know if it works for you?

remotes::install_github("gerkelab/fcds")

vickyliao92 commented 4 years ago

@gadenbuie I tested using "Urinary Bladder" as cancer_site_group, "Invasive" for cancer_ICDO3_behavior, and filtered age > 20. Everything ran without any error. However, when I used "Insitu" for cancer_ICDO3_behavior, I got this error message.

Error: Problem with `mutate()` input `age_max`.
x `false` must be a double vector, not a logical vector.
i Input `age_max` is `if_else(is.na(age_max) & !is.na(age_min), Inf, age_max)`.
i The error occurred in group 19: county_name = "Alachua", year_group = "2012-2016", year = "2014", age_group = "Unknown".

gadenbuie commented 4 years ago

The problem here is the "Unknown" in the age_group column, which, in essence, means we don't know how or where to allocate that person in the age adjustment. You might need to apply a filter first to remove these rows:

fcds %>% 
  filter(
    age_group != "Unknown",
    year_group %in% year_groups, 
    county_name %in% moffitt_county, 
    cancer_site_group %in% select_site, 
    cancer_ICDO3_behavior %in% behavior_group
  ) %>% 
  # ...
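
Before dropping the "Unknown" rows, it may be worth checking how many cases they represent. The sketch below does this on toy data; the `toy` table and its counts are made up for illustration, and only the `age_group != "Unknown"` filter comes from the snippet above.

```r
library(dplyr)

# Toy counts (hypothetical, not from FCDS) with one "Unknown" age group.
toy <- tibble(
  county_name = c("Alachua", "Alachua", "Pasco"),
  age_group   = factor(c("60 - 64", "Unknown", "65 - 69")),
  n           = c(12, 1, 9)
)

# Number of cases that the filter would exclude from the age adjustment.
n_unknown <- toy %>%
  filter(age_group == "Unknown") %>%
  summarise(n = sum(n)) %>%
  pull(n)

# Remove the "Unknown" rows and drop the now-unused factor level.
known <- toy %>%
  filter(age_group != "Unknown") %>%
  mutate(age_group = droplevels(age_group))
```

If `n_unknown` is large relative to the total, the adjusted rate will understate the true burden, so the number is worth reporting alongside the result.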

Are you working with the new FCDS data now?

vickyliao92 commented 4 years ago

@gadenbuie Thanks, Garrick. That worked and I will filter out rows with age_group == "Unknown". I am currently still working with the 2019 (old) dataset.