business-science / modeltime.ensemble

Time Series Ensemble Forecasting
https://business-science.github.io/modeltime.ensemble/

Only a fraction of models show up in modeltime_resamples... #18

Closed Steviey closed 2 years ago

Steviey commented 2 years ago

Ubuntu 16.x LTS, R latest, modeltime.ensemble latest

A submodels_tbl has 15 correctly fitted models. When I try to use them with modeltime_fit_resamples(), only a fraction of them (4 of the 15) show up in the result of that function. Is there an explanation available?

resamples_tscv <- df_train %>%
    time_series_cv(
        assess      = test_len,
        initial     = train_len,
        # skip      = "2 years",
        slice_limit = dplyr::n()
    )

submodel_predictions <- submodels_tbl %>%
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(verbose = TRUE)
    )

debugAnalyse <- 1
if (debugAnalyse > 0) {
    # Visualize the resample sets
    myPlot <- resamples_tscv %>%
        tk_time_series_cv_plan() %>%
        plot_time_series_cv_plan(
            date, value,
            .facet_ncol  = 2,
            .interactive = TRUE
        )

    print(myPlot)

    myPlot <- submodel_predictions %>%
        plot_modeltime_resamples(
            .interactive = TRUE
        )

    print(myPlot)

    # View(submodel_predictions)

    predictions_tbl <- modeltime.resample::unnest_modeltime_resamples(submodel_predictions)

    View(predictions_tbl)

    predictions_tbl$editDate       <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
    predictions_tbl$pslMetaLearner <- metaLearner
    fastInOut('predictions_tbl.Rds', predictions_tbl)

    predictions_by_rowid_tbl <- predictions_tbl %>%
        dplyr::select(.row_id, .model_id, .pred) %>%
        dplyr::mutate(.model_id = stringr::str_c(".model_id_", .model_id)) %>%
        tidyr::pivot_wider(names_from = .model_id, values_from = .pred)

    View(predictions_by_rowid_tbl)
}
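
As a first diagnostic, it may help to count how many resample slices actually carry predictions per model. This is only a sketch, assuming the structure referenced later in this thread: .resample_results holds one fit_resamples()-style tibble per model, each with a .predictions list column.

# Per-model diagnostic (assumption: .resample_results is a list column of
# fit_resamples()-style tibbles, each carrying a .predictions list column).
submodel_predictions %>%
    dplyr::mutate(
        n_slices     = purrr::map_int(.resample_results, nrow),
        n_with_preds = purrr::map_int(
            .resample_results,
            ~ sum(!purrr::map_lgl(.x$.predictions, is.null))
        )
    ) %>%
    dplyr::select(.model_id, .model_desc, n_slices, n_with_preds)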

Side note: When I use fewer than the 15 models, the code breaks while fitting a glmnet meta-learner. It prompts:

 x Slice1: preprocessor 1/1, model 1/1: Error: For the glmnet engine, `penalty` 
 must be a single number (or a value of `tune()`).

... where the model is correctly tagged with penalty = tune::tune(). I noticed the same effect with lasso (mixture = 1).
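
For reference, here is a minimal sketch of the two forms that error message accepts, using the standard parsnip/tune API (not modeltime.ensemble internals):

# 1) A fixed, single numeric penalty:
spec_fixed <- parsnip::linear_reg(penalty = 0.01, mixture = 1) %>%
    parsnip::set_engine("glmnet")

# 2) A tune() placeholder, which must be finalized (e.g. via tune_grid() +
#    finalize_workflow()) before the model is actually fit:
spec_tuned <- parsnip::linear_reg(penalty = tune::tune(), mixture = tune::tune()) %>%
    parsnip::set_engine("glmnet")

If the placeholder were never finalized somewhere inside the ensemble fitting, the engine check would raise exactly this error, which would be consistent with the guess below.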

My guess is that the tuned penalty gets lost somewhere in modeltime.ensemble's internal code. Currently I'm testing different meta-learners. An xgboost meta-learner seems to work only without xgboost submodels. Others work fine so far.

It would be nice to have a fallback/try-catch option in modeltime.resample. Otherwise the code breaks in huge projects any time something fails at this point.
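
Until such an option exists, a user-side fallback can be sketched with purrr::safely(): fit each submodel's resamples separately and keep whatever succeeds. This is an assumption-heavy sketch; in particular, whether modeltime_fit_resamples() accepts single-row slices of a modeltime table, and whether bind_rows() preserves the table's class, would need checking.

# Hypothetical fallback: one failing model no longer aborts the whole run.
# safe_fit() is a helper defined here, not a package API.
safe_fit <- purrr::safely(function(one_model_tbl) {
    modeltime_fit_resamples(
        one_model_tbl,
        resamples = resamples_tscv,
        control   = control_resamples(verbose = TRUE)
    )
})

results <- purrr::map(
    seq_len(nrow(submodels_tbl)),
    ~ safe_fit(submodels_tbl[.x, ])
)

# Re-assemble the successful fits; keep the errors for inspection.
fitted_ok <- purrr::map(results, "result") %>% purrr::compact() %>% dplyr::bind_rows()
failures  <- purrr::map(results, "error")  %>% purrr::compact()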

Steviey commented 2 years ago

Could it be related to different lengths of .resample_results per model/workflow?

I noticed doing this:

predictions_tbl <- modeltime.resample::unnest_modeltime_resamples(m750_training_resamples_fitted)
View(predictions_tbl)

... results in a clickable, ready-to-drill-down view in RStudio.

... whereas my result, by comparison, seems to be corrupted and is not clickable. The difference lies in the differing lengths of .resample_results.

[screenshot: .resample_results lengths differ across models]

What if we could reduce the predictions to the minimum length of .resample_results?
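
A sketch of that idea under the same structural assumptions: trim every model's .resample_results to the slice count all models share. Note this trims by position rather than by slice id; matching on the slice id column would be safer.

# Trim each model's resample results to the common minimum number of slices.
common_n <- min(purrr::map_int(submodel_predictions$.resample_results, nrow))

trimmed_tbl <- submodel_predictions %>%
    dplyr::mutate(
        .resample_results = purrr::map(
            .resample_results,
            ~ dplyr::slice_head(.x, n = common_n)
        )
    )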

Update: There is more... .predictions = NULL in a glmnet model:

[screenshot: a glmnet model whose .predictions column is NULL]

Every failing model will be lost... Since we have no influence on how models are treated inside modeltime_fit_resamples(), there is no way to fix this other than by hardcoding a workaround.

Error in if (is.numeric(args$mixture) && (args$mixture < 0 | args$mixture > : missing value where TRUE/FALSE needed
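
As a last resort along these lines (same assumptions about the .resample_results structure as above), the failing models could at least be filtered out before unnesting, so the surviving predictions stay usable:

# Keep only the models whose every resample slice produced predictions.
ok_models <- submodel_predictions %>%
    dplyr::filter(purrr::map_lgl(
        .resample_results,
        ~ all(!purrr::map_lgl(.x$.predictions, is.null))
    ))

predictions_tbl <- modeltime.resample::unnest_modeltime_resamples(ok_models)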