Workflows - Error with add_formula() when factor column on RHS #3

Closed mdancho84 closed 3 years ago

mdancho84 commented 3 years ago

@DavisVaughan - I'm getting a weird issue with workflows with a new package we are testing out called gamsnip. I'm trying to add a formula and I'm running into an error with factors in the RHS.


Error: Functions involving factors or characters have been detected on the RHS of `formula`. These are not allowed when `indicators = "none"`. Functions involving factors were detected for the following columns: 'id'.

Reproducible Example

m4_monthly_extended <- m4_monthly %>%
    group_by(id) %>%
    future_frame(.length_out = 24, .bind_data = TRUE) %>%
    mutate(lag_24 = lag(value, 24)) %>%
    ungroup() %>%
    mutate(date_num = as.numeric(date)) %>%
    mutate(date_month = month(date))
m4_monthly_train  <- m4_monthly_extended %>% drop_na()
m4_monthly_future <- m4_monthly_extended %>% filter(

splits <- time_series_split(m4_monthly_train, assess = 24, cumulative = TRUE)
wflw_fit_gam <- workflow() %>%
    add_model(
        gam_mod(mode = "regression") %>%
            set_engine("gam", method = "REML")
    ) %>%
    add_formula(
        value ~ s(date_month, by = id) +
            s(date_num, by = id) +
            s(date_num, date_month, by = id) +
            id
    ) %>%
    fit(training(splits))
#> Error: Functions involving factors or characters have been detected on the RHS of `formula`. These are not allowed when `indicators = "none"`. Functions involving factors were detected for the following columns: 'id'.

DavisVaughan commented 3 years ago

add_formula() is primarily used to specify the terms / variables in the model, although it also does some light pre-processing using the standard model.matrix() infrastructure. Notably, it is not aware of "special" functions like s() that are model-specific.
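As a rough illustration of that model.matrix() step, here is a base R sketch. The toy data frame `df` and its columns are made up for this example and are not taken from the issue.

```r
# Base R only: how an ordinary preprocessing formula is expanded.
# `df` is a hypothetical toy data set, not the m4_monthly data.
df <- data.frame(
  y  = c(1, 2, 3, 4),
  x  = c(1, 10, 100, 1000),
  id = factor(c("a", "b", "a", "b"))
)

# Inline functions are evaluated, and the factor `id` is turned into
# an indicator column for its second level.
mm <- model.matrix(y ~ log10(x) + id, data = df)

colnames(mm)
# columns: (Intercept), log10(x), idb
```

This expansion only understands ordinary terms like these; a model-specific term such as s() carries no special meaning at this stage.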

What you want is to supply a model formula through add_model(formula = ). This is different from the variable selection / preprocessing formula that you supply in add_formula(). A model formula is passed all the way through to the mgcv call (or whatever package is used); no tidymodels package does anything with it.

So I would do something like this, specifying variables that are going to be used in the model with add_variables() (or you could use add_recipe()), and then specifying exactly how the model should be fit with add_model(formula = ).


m4_monthly_extended <- m4_monthly %>%
  group_by(id) %>%
  future_frame(.length_out = 24, .bind_data = TRUE) %>%
  mutate(lag_24 = lag(value, 24)) %>%
  ungroup() %>%
  mutate(date_num = as.numeric(date)) %>%
  mutate(date_month = month(date))

m4_monthly_train  <- m4_monthly_extended %>% drop_na()
m4_monthly_future <- m4_monthly_extended %>% filter(

splits <- time_series_split(m4_monthly_train, assess = 24, cumulative = TRUE)

spec <- gam_mod(mode = "regression") %>%
  set_engine("gam", method = "REML")

wflw_fit_gam <- workflow() %>%
  add_variables(value, c(date_month, date_num, id)) %>%
  add_model(
    spec,
    formula = value ~ s(date_month, by = id) +
      s(date_num, by = id) +
      s(date_num, date_month, by = id) +
      id
  ) %>%
  fit(training(splits))
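
For anyone who wants to try the same pattern without gamsnip or the m4_monthly data, here is a minimal sketch. It swaps in parsnip's own gen_additive_mod() with the "mgcv" engine and the built-in mtcars data, so the model spec, data, and column names are assumptions for illustration and not the setup from this issue. It assumes parsnip, workflows, and mgcv are installed.

```r
library(parsnip)    # gen_additive_mod(), set_engine(), set_mode()
library(workflows)  # workflow(), add_variables(), add_model()

# Stand-in model spec (assumption: parsnip's GAM spec, not gam_mod()).
spec <- gen_additive_mod() %>%
  set_engine("mgcv") %>%
  set_mode("regression")

wflw_fit <- workflow() %>%
  # Which columns the model may use: outcome first, then predictors.
  add_variables(outcomes = mpg, predictors = c(cyl, disp)) %>%
  # How the model is fit: this formula goes to the engine as-is,
  # so s() is interpreted by mgcv.
  add_model(spec, formula = mpg ~ cyl + s(disp)) %>%
  fit(data = mtcars)

# Pull out the underlying engine fit to inspect it.
extract_fit_engine(wflw_fit)
```

The split of responsibilities is the same as above: add_variables() only selects columns, while the formula given to add_model() carries the smooth terms.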

mdancho84 commented 3 years ago

This is exactly what I needed. 👍