cimentadaj / tidyflow

A simplified and fresh workflow for doing machine learning with tidymodels
https://cimentadaj.github.io/tidyflow/

Correctly support quoted argument names for all plug_* functions #30

Status: open. gutama opened this issue 3 years ago

gutama commented 3 years ago

Hi,

The simple lm with initial_time_split works just fine. Then I tried a regularized model with sliding_period resampling:

regularized_mod <- linear_reg(penalty = tune(), mixture = tune()) %>% set_engine("glmnet")
datats <- datats %>% 
  mutate(period = as.Date(period)) %>% 
  select(all_of(row_names))

tflow1 <- tflow %>%
  replace_formula(gdprl ~ .) %>% 
  replace_data(data = datats) %>% 
  plug_resample(sliding_period,index=period,periode="month",lookback=72,assess_stop=12) %>%   
  plug_grid(grid_regular, levels = 5) %>% 
  replace_model(regularized_mod)
tflow1

The output seems OK:

== Tidyflow ==========================================================
Data: 105 rows x 22 columns
Split: initial_time_split w/ prop = ~0.9
Formula: gdprl ~ .
Resample: sliding_period w/ index = ~period, periode = ~"month", lookback = ~72, assess_stop = ~12
Grid: grid_regular w/ levels = ~5
Model:
Linear Regression Model Specification (regression)

Main Arguments:
  penalty = tune()
  mixture = tune()

Computational engine: glmnet

But if I try to fit it, I get a strange error:

fit_m <- fit(tflow1)

fit_m

Error in FUN(X[[i]], ...) : object 'period' not found

  1. FUN(X[[i]], ...)
  2. lapply(object[[2]], eval_tidy)
  3. fit.action_resample(action, x)
  4. fit(action, x)
  5. .fit_pre(x)
  6. fit.tidyflow(tflow1)
  7. fit(tflow1)

The column period is already in my datats.

Am I doing something wrong? Thank you.

gutama commented 3 years ago

I changed my CV method to rolling_origin and now it works just fine. I still don't know why the sliding_* functions don't work:

gdp_rec <-
  ~ .x %>%
    recipe(gdprl ~ .) %>%
    step_rm(contains("period")) %>%
    step_center(all_predictors()) %>% 
    step_scale(all_predictors())

regularized_mod <- linear_reg(penalty = tune(), mixture = tune()) %>% set_engine("glmnet")

tflow2 <- tidyflow(seed = 5321) %>% 
  plug_data(datats) %>% 
  plug_split(initial_time_split, prop=0.9) %>% 
  plug_recipe(gdp_rec) %>% 
  plug_resample(rolling_origin, initial=72, assess=12, cumulative=FALSE, skip = 3) %>%   
  plug_grid(grid_latin_hypercube) %>% 
  plug_model(regularized_mod) 

tflow2

Thank you,

Kind regards

cimentadaj commented 3 years ago

Hi! Thank you very much for opening this. This is a bug related to the quoting of arguments in plug_resample. Until I can get around to fixing it, wrapping the arguments in character strings should work:

library(tidymodels)
#> ── Attaching packages ───────────────────────────────── tidymodels 0.1.1.9000 ──
#> ✔ broom     0.7.1          ✔ recipes   0.1.13    
#> ✔ dials     0.0.9.9000     ✔ rsample   0.0.8.9000
#> ✔ dplyr     1.0.2          ✔ tibble    3.0.4.9000
#> ✔ ggplot2   3.3.2          ✔ tidyr     1.1.2     
#> ✔ infer     0.5.3          ✔ tune      0.1.1.9000
#> ✔ modeldata 0.0.2          ✔ workflows 0.2.0.9000
#> ✔ parsnip   0.1.4          ✔ yardstick 0.0.7     
#> ✔ purrr     0.3.4
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
library(tidyflow)
#> 
#> Attaching package: 'tidyflow'
#> The following object is masked from 'package:tune':
#> 
#>     parameters
#> The following object is masked from 'package:dials':
#> 
#>     parameters

data(drinks, package = "modeldata")

drinks$y <- rnorm(nrow(drinks), 10, 20) * drinks$S4248SM144NCEN

tflow <-
  drinks %>%
  tidyflow(seed = 231512) %>%
  plug_formula(y ~ S4248SM144NCEN) %>%
# Notice how date is NOT wrapped as a string
  plug_resample(sliding_period,
                index = date,
                period = "year") %>%
  plug_model(linear_reg() %>% set_engine("lm"))

fit(tflow)
#> Error: unused argument (.x[[i]])

### When wrapping date as a string it works:

tflow %>%
  replace_resample(sliding_period, index = "date", period = "year") %>%
  fit()
#> ══ Tidyflow [tuned] ════════════════════════════════════════════════════════════
#> Data: 309 rows x 3 columns
#> Split: None
#> Formula: y ~ S4248SM144NCEN
#> Resample: sliding_period w/ index = ~"date", period = ~"year"
#> Grid: None
#> Model:
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm 
#> 
#> ══ Results ═════════════════════════════════════════════════════════════════════
#> 
#> Tuning results: 
#> 
#> # A tibble: 5 x 4
#>   splits          id      .metrics         .notes          
#>   <list>          <chr>   <list>           <list>          
#> 1 <split [12/12]> Slice01 <tibble [2 × 3]> <tibble [0 × 1]>
#> 2 <split [12/12]> Slice02 <tibble [2 × 3]> <tibble [0 × 1]>
#> 3 <split [12/12]> Slice03 <tibble [2 × 3]> <tibble [0 × 1]>
#> 4 <split [12/12]> Slice04 <tibble [2 × 3]> <tibble [0 × 1]>
#> 5 <split [12/12]> Slice05 <tibble [2 × 3]> <tibble [0 × 1]>
#> 
#> ... and 20 more lines.
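The traceback in the original report shows the resample arguments being evaluated with lapply(object[[2]], eval_tidy), and the printed tidyflow shows them with a leading ~, as rlang quosures are printed. The base-R sketch below is a hypothetical stand-in, not tidyflow's actual code: capture_args() and eval_args() are made-up helpers that capture arguments unevaluated together with the caller's environment and later evaluate them there, without the data. Under that assumption, a bare column name is looked up in the calling environment instead of the data frame, which reproduces "object 'period' not found", while a quoted name is just a string that evaluates to itself.

```r
# Hypothetical helpers (NOT part of tidyflow) that mimic storing
# plug_resample() arguments as expressions plus the caller's environment.
capture_args <- function(...) {
  env <- parent.frame()
  # substitute() gives the unevaluated call list(...); drop the `list` head
  exprs <- as.list(substitute(list(...)))[-1]
  lapply(exprs, function(e) list(expr = e, env = env))
}

# Evaluate each captured argument in its original environment only,
# roughly what lapply(object[[2]], eval_tidy) does with no data mask.
eval_args <- function(args) {
  lapply(args, function(a) eval(a$expr, a$env))
}

# Toy data: the column `period` exists, but only inside the data frame.
datats <- data.frame(period = as.Date("2020-01-01") + 0:2, gdprl = 1:3)

# Bare name: looked up in the calling environment, where it does not exist.
bare <- capture_args(index = period, period = "month")
bare_result <- tryCatch(eval_args(bare), error = function(e) conditionMessage(e))
print(bare_result)

# Quoted name: a string evaluates to itself, so the resampling function
# receives "period" and can select the column from the data itself.
quoted_result <- eval_args(capture_args(index = "period", period = "month"))
print(quoted_result$index)

# The same bare expression does find the column when the data frame
# is used as the evaluation environment.
in_data <- eval(bare$index$expr, datats)
print(identical(in_data, datats$period))

# A bare `date` does not fail at this step: it resolves to base R's
# date() function instead of a column.
date_result <- eval_args(capture_args(index = date))
print(is.function(date_result$index))
```

The last case may explain why the reprex above fails with "unused argument (.x[[i]])" rather than "object not found": base R's date() takes no arguments, so calling it with one raises exactly that kind of error. This is an inference from the error messages, not a confirmed reading of tidyflow's internals. Applied to the original report, the workaround means writing index = "period" in plug_resample(); note also that rsample's sliding_period() names its unit argument period, so periode = "month" would likely be rejected once the index problem is out of the way.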