Closed mdancho84 closed 3 years ago
`add_formula()` is primarily used to specify terms / variables in the model (although it also does some light pre-processing using the standard `model.matrix()` infrastructure). Notably, it is not aware of any "special" functions like `s()` that are model specific.
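To make that concrete, here is a minimal base R sketch of what the `model.matrix()` machinery does with a factor and with `s()`. The toy data frame `df` is invented for illustration, and the sketch calls `model.matrix()` directly rather than going through workflows, so treat it as an approximation of what happens inside `add_formula()`, not a trace of it.

```r
# Toy data, invented for illustration only.
df <- data.frame(
  y  = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.4),
  x  = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  id = factor(c("a", "a", "b", "b", "c", "c"))
)

# A plain formula: model.matrix() expands the factor into indicator columns,
# so the original `id` column no longer exists afterwards.
mm <- model.matrix(y ~ x + id, data = df)
colnames(mm)

# A formula with s(): model.matrix() evaluates s(x) as ordinary R code.
# It has no notion of smooth terms, so this call errors
# (whether or not mgcv happens to be attached).
res <- tryCatch(
  model.matrix(y ~ s(x) + id, data = df),
  error = function(e) e
)
inherits(res, "error")
```

The factor expansion in the first call is likely why a factor on the RHS causes trouble: a term such as `s(x, by = id)` needs `id` to still be a factor when the model is fit.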
What you want is to supply a model formula through `add_model(formula = )`. This is different from the variable selection / preprocessing formula that you supply in `add_formula()`. A model formula is passed all the way through to the mgcv call (or whatever package is used); no tidymodels package will do anything with that model formula.
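For reference, this is roughly the kind of call a model formula ends up in once it reaches mgcv. It is a sketch on simulated data (the `sim` data frame and its columns are made up here), and it calls `mgcv::gam()` directly; how gamsnip assembles the call internally is not shown in this thread.

```r
library(mgcv)

# Simulated data, invented for illustration: two groups with different shapes.
set.seed(123)
n <- 200
sim <- data.frame(
  x  = runif(n),
  id = factor(sample(c("a", "b"), n, replace = TRUE))
)
sim$y <- ifelse(sim$id == "a", sin(2 * pi * sim$x), sim$x^2) +
  rnorm(n, sd = 0.2)

# s(x, by = id) fits one smooth of x per level of the factor `id`;
# the separate `id` term gives each level its own intercept shift.
fit <- gam(y ~ s(x, by = id) + id, data = sim, method = "REML")

# One row per smooth, i.e. one per factor level.
summary(fit)$s.table
```

Because mgcv interprets `s()` and `by =` itself, the formula has to arrive untouched and `id` has to arrive as a factor.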
So I would do something like this, specifying variables that are going to be used in the model with `add_variables()` (or you could use `add_recipe()`), and then specifying exactly how the model should be fit with `add_model(formula = )`.
```r
library(modeltime)
library(tidymodels)
library(gamsnip)
library(tidyverse)
library(timetk)
library(lubridate)

# Extend each series 24 months into the future and add features
m4_monthly_extended <- m4_monthly %>%
  group_by(id) %>%
  future_frame(.length_out = 24, .bind_data = TRUE) %>%
  mutate(lag_24 = lag(value, 24)) %>%
  ungroup() %>%
  mutate(date_num = as.numeric(date)) %>%
  mutate(date_month = month(date))

# Rows with complete data are used for training; rows with NA `value` are the future
m4_monthly_train <- m4_monthly_extended %>% drop_na()
m4_monthly_future <- m4_monthly_extended %>% filter(is.na(value))

splits <- time_series_split(m4_monthly_train, assess = 24, cumulative = TRUE)

spec <- gam_mod(mode = "regression") %>%
  set_engine("gam", method = "REML")

# add_variables() selects the columns; the model formula goes in add_model()
wflw_fit_gam <- workflow() %>%
  add_variables(value, c(date_month, date_num, id)) %>%
  add_model(
    spec,
    formula = value ~ s(date_month, by = id) +
      s(date_num, by = id) +
      s(date_num, date_month, by = id) +
      id
  ) %>%
  fit(training(splits))

wflw_fit_gam
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Variables
#> Model: gam_mod()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> Outcomes: value
#> Predictors: c(date_month, date_num, id)
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> value ~ s(date_month, by = id) + s(date_num, by = id) + s(date_num,
#>     date_month, by = id) + id
#>
#> Estimated degrees of freedom:
#>  8.59  6.35  7.71  4.75  1.39  1.01  1.00
#>  1.00 24.05 18.59  9.93 16.91  total = 105.27
#>
#> REML score: 10465.4
```
This is exactly what I needed. 👍
@DavisVaughan - I'm getting a weird issue with `workflows` with a new package we are testing out called `gamsnip`. I'm trying to add a formula and I'm running into an error with factors in the RHS.
Problem
Reproducible Example
Created on 2021-03-26 by the reprex package (v1.0.0)