business-science / modeltime.h2o

Forecasting with H2O AutoML. Use the H2O Automatic Machine Learning algorithm as a backend for Modeltime Time Series Forecasting.
https://business-science.github.io/modeltime.h2o/

`java.lang.NullPointerException` with predict on a model_fit object #25

Open spsanderson opened 1 year ago

spsanderson commented 1 year ago

Data: testing_data.csv, training_data.csv
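Setup assumed by the script below (the `read_csv()` paths are an assumption based on the attached files):

library(tidymodels)
library(modeltime.h2o)
library(tidyverse)
library(h2o)

h2o.init()

# Assumption: the attached CSVs are read into the names used in the script
training_data <- read_csv("training_data.csv")
testing_data  <- read_csv("testing_data.csv")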

Script:

# AutoML model specification
model_spec <- automl_reg(mode = 'regression') %>%
  set_engine(
    engine                     = 'h2o',
    max_runtime_secs           = 60 * 60,
    max_runtime_secs_per_model = 60 * 30,
    max_models                 = 15,
    nfolds                     = 5,
    #exclude_algos              = c("DeepLearning"),
    verbosity                  = NULL,
    seed                       = 786
  )

model_spec

# Fit AutoML on the training data
model_fitted <- model_spec %>%
  fit(posting_amount_positive ~ ., data = training_data)

# Inspect the leaderboard
automl_leaderboard(model_fitted)

# Switch to the top model from the leaderboard
automl_update_model(
  model_fitted,
  model_id = automl_leaderboard(model_fitted) %>%
    slice(1) %>%
    pull(model_id)
)

# Predict on the test set -- this is where the error occurs
predict(model_fitted, testing_data)

Running the script produces the following failure:

> model_spec <- automl_reg(mode = 'regression') %>%
+   set_engine(
+     engine                     = 'h2o',
+     max_runtime_secs           = 60 * 60, 
+     max_runtime_secs_per_model = 60 * 30,
+     max_models                 = 15,
+     nfolds                     = 5,
+     #exclude_algos              = c("DeepLearning"),
+     verbosity                  = NULL,
+     seed                       = 786
+   ) 
> model_spec
H2O AutoML Model Specification (regression)

Engine-Specific Arguments:
  max_runtime_secs = 60 * 60
  max_runtime_secs_per_model = 60 * 30
  max_models = 15
  nfolds = 5
  verbosity = NULL
  seed = 786

Computational engine: h2o 

> model_fitted <- model_spec %>%
+   fit(posting_amount_positive ~ ., data = training(daily_splits))
Converting to H2OFrame...
  |=======================================================================================| 100%

Training H2O AutoML...
  |=======================================================================================| 100%
  |=======================================================================================| 100%

Leaderboard: 
                                                 model_id      rmse       mse       mae rmsle
1 StackedEnsemble_BestOfFamily_1_AutoML_7_20221211_140030 0.9665065 0.9341348 0.3746044   NaN
2    StackedEnsemble_AllModels_1_AutoML_7_20221211_140030 0.9667923 0.9346873 0.3753276   NaN
3    DeepLearning_grid_3_AutoML_7_20221211_140030_model_1 0.9685012 0.9379946 0.3743257   NaN
4    DeepLearning_grid_2_AutoML_7_20221211_140030_model_1 0.9734572 0.9476189 0.3956395   NaN
5             GBM_grid_1_AutoML_7_20221211_140030_model_2 0.9818772 0.9640828 0.3952624   NaN
6                          GBM_1_AutoML_7_20221211_140030 0.9822901 0.9648938 0.3997966   NaN
  mean_residual_deviance
1              0.9341348
2              0.9346873
3              0.9379946
4              0.9476189
5              0.9640828
6              0.9648938

[17 rows x 6 columns] 

Using top model: StackedEnsemble_BestOfFamily_1_AutoML_7_20221211_140030
> automl_leaderboard(model_fitted)
# A tibble: 17 x 6
   model_id                                                 rmse   mse   mae rmsle mean_residua~1
   <chr>                                                   <dbl> <dbl> <dbl> <lgl>          <dbl>
 1 StackedEnsemble_BestOfFamily_1_AutoML_7_20221211_140030 0.967 0.934 0.375 NA             0.934
 2 StackedEnsemble_AllModels_1_AutoML_7_20221211_140030    0.967 0.935 0.375 NA             0.935
 3 DeepLearning_grid_3_AutoML_7_20221211_140030_model_1    0.969 0.938 0.374 NA             0.938
 4 DeepLearning_grid_2_AutoML_7_20221211_140030_model_1    0.973 0.948 0.396 NA             0.948
 5 GBM_grid_1_AutoML_7_20221211_140030_model_2             0.982 0.964 0.395 NA             0.964
 6 GBM_1_AutoML_7_20221211_140030                          0.982 0.965 0.400 NA             0.965
 7 GLM_1_AutoML_7_20221211_140030                          0.982 0.965 0.394 NA             0.965
 8 DeepLearning_1_AutoML_7_20221211_140030                 0.983 0.966 0.424 NA             0.966
 9 GBM_grid_1_AutoML_7_20221211_140030_model_1             0.994 0.988 0.411 NA             0.988
10 DeepLearning_grid_1_AutoML_7_20221211_140030_model_1    0.994 0.988 0.404 NA             0.988
11 GBM_grid_1_AutoML_7_20221211_140030_model_3             0.995 0.989 0.413 NA             0.989
12 GBM_2_AutoML_7_20221211_140030                          1.00  1.00  0.431 NA             1.00 
13 GBM_4_AutoML_7_20221211_140030                          1.00  1.01  0.431 NA             1.01 
14 GBM_3_AutoML_7_20221211_140030                          1.01  1.02  0.430 NA             1.02 
15 XRT_1_AutoML_7_20221211_140030                          1.03  1.07  0.457 NA             1.07 
16 GBM_5_AutoML_7_20221211_140030                          1.04  1.07  0.436 NA             1.07 
17 DRF_1_AutoML_7_20221211_140030                          1.04  1.08  0.463 NA             1.08 
# ... with abbreviated variable name 1: mean_residual_deviance
> automl_update_model(
+   model_fitted, 
+   model_id = automl_leaderboard(model_fitted) %>%
+     slice(1) %>%
+     pull(model_id)
+ )
parsnip model object

H2O AutoML - Stackedensemble
--------
Model: Model Details:
==============

H2ORegressionModel: stackedensemble
Model ID:  StackedEnsemble_BestOfFamily_1_AutoML_7_20221211_140030 
Number of Base Models: 5

Base Models (count by algorithm type):

deeplearning          drf          gbm          glm 
           1            2            1            1 

Metalearner:

Metalearner algorithm: glm
Metalearner cross-validation fold assignment:
  Fold assignment scheme: AUTO
  Number of folds: 5
  Fold column: NULL
Metalearner hyperparameters: 

H2ORegressionMetrics: stackedensemble
** Reported on training data. **

MSE:  0.933981
RMSE:  0.9664269
MAE:  0.3746983
RMSLE:  NaN
Mean Residual Deviance :  0.933981

H2ORegressionMetrics: stackedensemble
** Reported on cross-validation data. **
** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **

MSE:  0.9341348
RMSE:  0.9665065
MAE:  0.3746044
RMSLE:  NaN
Mean Residual Deviance :  0.9341348

Cross-Validation Metrics Summary: 
                             mean        sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid
mae                      0.374247  0.026903   0.392305   0.405298   0.350108   0.380611
mean_residual_deviance   0.932917  0.179598   1.096220   1.072399   0.729629   1.018487
mse                      0.932917  0.179598   1.096220   1.072399   0.729629   1.018487
null_deviance          394.018070 80.098915 460.412320 469.711000 316.659150 422.672200
r2                      -0.000552  0.000855  -0.000413  -0.000020  -0.000022  -0.000252
residual_deviance      394.018070 80.098915 460.412320 469.711000 316.659150 422.672200
rmse                     0.962148  0.094791   1.047005   1.035567   0.854183   1.009201
rmsle                          NA  0.000000         NA         NA         NA         NA
                       cv_5_valid
mae                      0.342915
mean_residual_deviance   0.747850
mse                      0.747850
null_deviance          300.635680
r2                      -0.002052
residual_deviance      300.635680
rmse                     0.864783
rmsle                          NA
> predict(model_fitted, testing(daily_splits))
Converting to H2OFrame...
  |=======================================================================================| 100%
  |                                                                                       |   0%

java.lang.NullPointerException

java.lang.NullPointerException
    at water.MRTask.dfork(MRTask.java:623)
    at water.MRTask.doAll(MRTask.java:529)
    at water.MRTask.doAll(MRTask.java:549)
    at hex.glm.GLMModel.predictScoreImpl(GLMModel.java:2045)
    at hex.Model.score(Model.java:1938)
    at hex.ensemble.StackedEnsembleModel.predictScoreImpl(StackedEnsembleModel.java:252)
    at hex.Model.score(Model.java:1938)
    at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:497)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1677)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

The error seems to stem from the non-deep-learning models; the traceback shows the failure inside the stacked ensemble's GLM scoring (hex.glm.GLMModel.predictScoreImpl).
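One way to test that hypothesis is to restrict AutoML to deep learning only and re-run the same predict call. This is a diagnostic sketch, not a confirmed fix; the exclude_algos values below assume the standard H2O AutoML algorithm names, and a shorter runtime is used just to keep the check quick.

# Diagnostic sketch: keep only DeepLearning models so predict() never
# routes through GLM/GBM/DRF/StackedEnsemble scoring.
model_spec_dl <- automl_reg(mode = 'regression') %>%
  set_engine(
    engine           = 'h2o',
    max_runtime_secs = 60 * 10,
    max_models       = 5,
    nfolds           = 5,
    exclude_algos    = c("GLM", "GBM", "DRF", "XGBoost", "StackedEnsemble"),
    seed             = 786
  )

model_fitted_dl <- model_spec_dl %>%
  fit(posting_amount_positive ~ ., data = training_data)

# If this succeeds while the original predict() fails, the NPE is
# specific to the non-deep-learning scorers.
predict(model_fitted_dl, testing_data)

Separately, scoring the leaderboard's top model directly through the h2o API can help tell whether the NullPointerException comes from H2O itself or from the modeltime.h2o prediction wrapper:

# Bypass modeltime.h2o and score the leader directly with h2o
leader_id <- automl_leaderboard(model_fitted) %>% slice(1) %>% pull(model_id)
h2o.predict(h2o.getModel(leader_id), as.h2o(testing_data))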