business-science / gammodels

The parsnip backend for GAM Models.
https://business-science.github.io/gammodels/

Core `gen_additive_mod()` algorithm #2

Open mdancho84 opened 3 years ago

mdancho84 commented 3 years ago

Develop gen_additive_mod() algorithm with modes "regression" and "classification"

See business-science/modeltime#71 for Discussion and Basic Example

mdancho84 commented 3 years ago

The tidymodels team is working on a new R package, multilevelmod, which covers linear mixed-effects models from lmer and related R packages. Its formula structure is similar to mgcv::gam in that modifiers are passed through a formula interface.

References

This is a good one to check out.

AlbertoAlmuinha commented 3 years ago

Hi @mdancho84

I am not sure we need this. We use ordinary formulas with functions inside the formula, so users can write the formula exactly as they would for mgcv::gam(). In principle I do not see it as necessary, unless we adopt an engine that uses the formula style of those packages.

AlbertoAlmuinha commented 3 years ago

Hi @BenWynne-Morris ,

I need you to run some checks on the predict function for GAMs. Please compare its output with what a linear regression gives, to see whether GAMs can produce confidence intervals. With the code below, the linear regression returns confidence intervals while the GAM does not. We need to know if there is any way to get them.

predict(model_fit_gam, newdata = training(splits), type = 'response', interval = "confidence")
predict(model_fit_lm, newdata = training(splits), type = 'response', interval = "confidence")

We also need to compare the following calls and see whether there is an equivalent for GAMs (or an explanation of why the results differ so much between gam and lm):

predict(model_fit_gam, newdata = training(splits), type = 'response', interval = "prediction")
predict(model_fit_lm, newdata = training(splits), type = 'response', interval = "prediction")

Regards,
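
One possible answer to the confidence-interval question, as a minimal sketch rather than a settled approach: predict.gam() in mgcv does not document an `interval` argument (it appears to be absorbed silently by `...`), but it does accept `se.fit = TRUE`. Assuming model_fit_gam is a raw mgcv gam object, an approximate 95% interval for the mean can be built on the link scale and back-transformed. The simulated data and the names df, fit_gam and ci are illustrative assumptions, not objects from this thread. This gives a confidence interval for the fitted mean, not a prediction interval.

```r
library(mgcv)

set.seed(123)
n  <- 200
df <- data.frame(x = runif(n, 0, 10))
# Positive response with mean exp(1 + sin(x)), so Gamma(link = "log") is valid
df$y <- rgamma(n, shape = 20, rate = 20 / exp(1 + sin(df$x)))

fit_gam <- gam(y ~ s(x), family = Gamma(link = "log"), method = "REML", data = df)

# Ask for standard errors on the link scale instead of `interval = "confidence"`
pred_link <- predict(fit_gam, newdata = df, type = "link", se.fit = TRUE)

linkinv <- fit_gam$family$linkinv   # inverse link (exp for a log link)
crit    <- qnorm(0.975)             # normal approximation for a 95% interval

# Build the interval on the link scale, then map back to the response scale
ci <- data.frame(
  fit = as.numeric(linkinv(pred_link$fit)),
  lwr = as.numeric(linkinv(pred_link$fit - crit * pred_link$se.fit)),
  upr = as.numeric(linkinv(pred_link$fit + crit * pred_link$se.fit))
)

head(ci)
```

Building the interval on the link scale keeps the bounds inside the valid range of the response (positive here), which a symmetric interval on the response scale would not guarantee.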

mdancho84 commented 3 years ago

Thinking through this:

mgcv

A gam model for regression looks like this:

model_fit_gam <- gam(
  formula = value ~ s(date_month, k = 12) + s(date_num) + s(lag_24) + s(date_num, date_month),
  family  = Gamma(link="log"),
  method  = "REML",
  data    = training(splits)
)

Parsnip

Phase 1 - Get this working.

model_fit_gam <- gam_mod(mode = "regression") %>%
    set_engine("gam", family=Gamma(link="log"), method = "REML") %>%
    fit(value ~ s(date_month, k = 12) + s(date_num) + s(lag_24) + s(date_num, date_month), data = training(splits))

Workflows

There are 2 interfaces: formula and recipes.

  1. (Phase 1) Formula - Is similar to fit.model_spec(formula)
  2. (Phase 2) Recipes - Interesting because it enables tuning parameters, which can help improve model performance.

Idea - Potential Recipes Interface

Definitely a phase 2 item, but this might be useful. It would take some serious thinking about how we'd want to implement this for GAMs.

library(tidymodels)
library(gamsnip)
library(modeltime)
library(tidyverse)
library(timetk)

m750_extended <- m750 %>%
    group_by(id) %>%
    future_frame(.length_out = 24, .bind_data = TRUE) %>%
    mutate(lag_24 = lag(value, 24)) %>%
    ungroup()

m750_train <- m750_extended %>% drop_na()

recipe_spec <- recipe(value ~ date + lag_24, m750_train) %>%
    step_mutate(date_num = as.numeric(date)) %>%
    step_mutate(date_mon = lubridate::month(date)) %>%
    step_rm(date) %>%
    step_interact(terms = ~ date_num * date_mon)

recipe_spec %>% prep() %>% juice()
# A tibble: 282 x 5
#> lag_24 value date_num date_mon date_num_x_date_mon
#> <dbl> <dbl>    <dbl>    <dbl>               <dbl>
#> 1   6370  7030     8035        1                8035
#> 2   6430  7170     8066        2               16132
#> 3   6520  7150     8095        3               24285
#> 4   6580  7180     8126        4               32504
#> 5   6620  7140     8156        5               40780
#> 6   6690  7100     8187        6               49122
#> 7   6000  6490     8217        7               57519
#> 8   5450  6060     8248        8               65984
#> 9   6480  6870     8279        9               74511
#> 10   6820  6880     8309       10               83090
#> # … with 272 more rows

# Possible new step_gam_* functions
recipe_spec_gam <- recipe_spec %>%
    step_gam_smooth(date_mon, k = 12) %>%
    step_gam_smooth(lag_24, date_num, date_num_x_date_mon, method = "REML")
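
To make the idea more concrete, here is a hedged sketch of what a step like step_gam_smooth() might precompute: the basis matrix of a smooth term. step_gam_smooth() itself is hypothetical and does not exist. The sketch uses mgcv::smoothCon(), which constructs a smooth from an s() specification, and the data frame and the names sm and basis are illustrative assumptions. Note that the REML smoothing-parameter selection happens at fit time, so a recipe step could only supply the unpenalized basis columns.

```r
library(mgcv)

set.seed(42)
df <- data.frame(lag_24 = runif(200, 5000, 8000))

# Construct the smooth described by s(lag_24, k = 12) without fitting a model
sm <- smoothCon(s(lag_24, k = 12), data = df)[[1]]

# Basis (design) matrix: one row per observation, one column per basis function
basis <- sm$X
dim(basis)

# A recipe step could bind these columns to the data as new predictors
basis_df <- as.data.frame(basis)
names(basis_df) <- paste0("lag_24_s", seq_len(ncol(basis)))
head(basis_df[, 1:3])
```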