AlbertoAlmuinha / bayesmodels

The Tidymodels Extension for Bayesian Models
https://albertoalmuinha.github.io/bayesmodels/

Predictions: Only one predicted value being returned when should be length(testing(split)) #9

Closed mdancho84 closed 2 years ago

mdancho84 commented 2 years ago

I'm pretty sure this is related to issue #8: something weird happens when using the formula interface. This is another reason I prefer the data.frame interface when developing parsnip functions.

Problem

Only 1 predicted value is returned; there should be 52, one per row of testing(splits). Again, I think this can be resolved by switching to the data.frame interface, as discussed in #8.

predict(modelo, testing(splits)) # will return only 1 predicted value
# A tibble: 1 x 1
  .pred
  <dbl>
1 0.220
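
A quick way to catch this kind of mismatch is to compare the number of predicted rows against the number of rows in the new data. The sketch below is only an illustration: `check_prediction_rows()` is a hypothetical helper (not part of bayesmodels, parsnip, or modeltime), and the data frames are stand-ins for `predict(modelo, testing(splits))` and `testing(splits)`.

```r
# Hypothetical helper: fail loudly when predictions and new data disagree in length.
check_prediction_rows <- function(preds, new_data) {
  n_pred <- nrow(preds)
  n_new  <- nrow(new_data)
  if (n_pred != n_new) {
    stop(sprintf("Expected %d predictions but got %d.", n_new, n_pred),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Stand-in data: 52 test rows, as in the split with assess = 52.
new_data <- data.frame(index = seq_len(52))
good     <- data.frame(.pred = rep(0.22, 52))  # one prediction per row
bad      <- data.frame(.pred = 0.220)          # the single-row result reported here

check_prediction_rows(good, new_data)          # passes silently

# Capture the error message for the single-row case instead of stopping.
result <- tryCatch(
  check_prediction_rows(bad, new_data),
  error = function(e) conditionMessage(e)
)
print(result)
```

With a fitted model, the same check would be called on the output of `predict()` and the test split directly.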

Reproducible Example

library(bayesmodels)
library(tidymodels)
library(timetk)
library(modeltime)
library(modeltime.resample)
library(modeltime.ensemble)

data(iclaims)
names(initial.claims)

df <- timetk::tk_tbl(initial.claims)

df %>% plot_time_series(.date_var = index,
                        .value = iclaimsNSA,
                        .smooth = FALSE)

# Issue 1

# split
splits <- time_series_split(
    data = df,
    # date_var = 'date',
    assess     = 52,
    cumulative = TRUE
)

# splits %>% tk_time_series_cv_plan() %>% plot_time_series_cv_plan(index, iclaimsNSA)

ss <- AddLocalLinearTrend(list(), training(splits)$iclaimsNSA)
ss <- AddSeasonal(ss, training(splits)$iclaimsNSA, nseasons = 52)

modelo <- bayesian_structural_reg() %>%
    set_engine("stan", state.specification = ss, niter = 1000) %>%
    fit(iclaimsNSA ~ index, data = training(splits))

modeltime_tbl <- modeltime_table(modelo)

calib_tbl <- modeltime_table(modelo) %>% modeltime_calibrate(testing(splits))

a <- calib_tbl %>%
    modeltime_forecast(
        new_data = testing(splits),
        actual_data = training(splits),
    )

# make the values NA in the test split
testing_tmp <- testing(splits)[c('index', 'iclaimsNSA')]
testing_tmp$iclaimsNSA <- NA

calib_tbl2 <- modeltime_tbl %>%
    modeltime_calibrate(new_data = testing_tmp,
                        actural_data = training(splits))

a2 <- calib_tbl %>%
    modeltime_forecast(
        new_data = testing_tmp,
        actual_data = training(splits),
    )

# Compare tables a and a2: a has a single predicted value repeated for every
# timestamp in the test split, while a2 has different predicted values for the
# rows of testing_tmp.

predict(modelo, testing(splits)) # will return only 1 predicted value

AlbertoAlmuinha commented 2 years ago

This should be fixed now, @mdancho84. I have not switched to the data.frame interface because I need the formula itself: it is one of the arguments that has to be built into the function call. I don't see how to do this through the data.frame interface (or at least not in a simple way).