business-science / modeltime.h2o

Forecasting with H2O AutoML. Use the H2O Automatic Machine Learning algorithm as a backend for Modeltime Time Series Forecasting.
https://business-science.github.io/modeltime.h2o/

Provided column type POSIXct is unknown. Cannot proceed with parse due to invalid argument #17

Closed: spsanderson closed this issue 3 years ago

spsanderson commented 3 years ago

Hi @mdancho84 and @AlbertoAlmuinha

Session Info:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pacman_0.5.1        timetk_2.6.1        healthyR.data_1.0.1 modeltime.h2o_0.1.1 h2o_3.32.0.1       
 [6] modeltime_0.5.1     yardstick_0.0.8     workflows_0.2.2     tune_0.1.3          tidyr_1.1.3        
[11] tibble_3.1.0        rsample_0.0.9       recipes_0.1.15      purrr_0.3.4         parsnip_0.1.5      
[16] modeldata_0.1.0     infer_0.5.4         ggplot2_3.3.3       dplyr_1.0.5         dials_0.0.9        
[21] scales_1.1.1        broom_0.7.6         tidymodels_0.1.2   

loaded via a namespace (and not attached):
 [1] bitops_1.0-6         xts_0.12.1           bit64_4.0.5          lubridate_1.7.10     DiceDesign_1.9      
 [6] httr_1.4.2           tools_4.0.2          backports_1.2.1      utf8_1.2.1           R6_2.5.0            
[11] rpart_4.1-15         DBI_1.1.1            lazyeval_0.2.2       colorspace_2.0-0     nnet_7.3-14         
[16] withr_2.4.1          tidyselect_1.1.0     bit_4.0.4            compiler_4.0.2       cli_2.4.0           
[21] plotly_4.9.3         labeling_0.4.2       stringr_1.4.0        digest_0.6.27        StanHeaders_2.21.0-7
[26] pkgconfig_2.0.3      htmltools_0.5.1.1    parallelly_1.24.0    lhs_1.1.1            htmlwidgets_1.5.3   
[31] rlang_0.4.10         rstudioapi_0.13      generics_0.1.0       zoo_1.8-9            jsonlite_1.7.2      
[36] crosstalk_1.1.1      RCurl_1.98-1.3       magrittr_2.0.1       Matrix_1.2-18        Rcpp_1.0.6          
[41] munsell_0.5.0        fansi_0.4.2          GPfit_1.0-8          lifecycle_1.0.0      furrr_0.2.2         
[46] stringi_1.5.3        pROC_1.17.0.1        yaml_2.2.1           MASS_7.3-51.6        plyr_1.8.6          
[51] grid_4.0.2           parallel_4.0.2       listenv_0.8.0        forcats_0.5.1        crayon_1.4.1        
[56] lattice_0.20-41      splines_4.0.2        pillar_1.5.1         codetools_0.2-16     glue_1.4.2          
[61] data.table_1.14.0    RcppParallel_5.0.3   vctrs_0.3.7          foreach_1.5.1        gtable_0.3.0        
[66] future_1.21.0        assertthat_0.2.1     gower_0.2.2          prodlim_2019.11.13   class_7.3-17        
[71] survival_3.1-12      viridisLite_0.3.0    timeDate_3043.102    iterators_1.0.13     lava_1.6.9          
[76] globals_0.14.0       ellipsis_0.3.1       ipred_0.9-11   

I am running the following script and getting an error at the model-fitting step.

Here is my script:

if(!require(pacman)){install.packages("pacman")}
pacman::p_load(
    "tidymodels"
    , "modeltime"
    , "modeltime.h2o"
    , "healthyR.data"
    , "dplyr"
    , "timetk"
)

df_tbl <- healthyR_data
data_tbl <- df_tbl %>%
    select(service_line, visit_end_date_time) %>%
    filter_by_time(
        .date_var     = visit_end_date_time
        , .start_date = "2012"
        , .end_date   = "2020-10-31"
    ) %>%
    group_by(service_line) %>%
    summarise_by_time(
        .date_var = visit_end_date_time
        , .by     = "week"
        , value   = n()
    ) %>%
    ungroup() %>%
    filter(!service_line %in% c("Valve Procedure","Vaginal Delivery","Mastectomy")) %>%
    filter(ip_op_flag == "I") %>%
    rename(date = visit_end_date_time) %>%
    mutate(data = lubridate::ymd(date))

data_tbl %>%
    group_by(service_line) %>%
    plot_time_series(
        .legend_show  = FALSE
        , .date_var   = date
        , .value      = value
        , .color_var  = service_line
        , .facet_ncol = 4
        , .smooth     = FALSE
    )

splits <- time_series_split(data_tbl, assess = "12 weeks", cumulative = TRUE)

recipe_spec <- recipe(value ~ ., data = training(splits)) %>%
    step_timeseries_signature(date) 

train_tbl <- training(splits) %>% bake(prep(recipe_spec), .)
test_tbl  <- testing(splits) %>% bake(prep(recipe_spec), .)

h2o.init(
    nthreads = -1,
    ip       = 'localhost',
    port     = 54321
)

model_spec <- automl_reg(mode = 'regression') %>%
    set_engine(
        engine                     = 'h2o',
        max_runtime_secs           = 5, 
        max_runtime_secs_per_model = 3,
        max_models                 = 3,
        nfolds                     = 5,
        exclude_algos              = c("DeepLearning"),
        verbosity                  = NULL,
        seed                       = 786
    ) 

model_spec

model_fitted <- model_spec %>%
    fit(value ~ ., data = train_tbl)

This is the output with the error that I get:

> if(!require(pacman)){install.packages("pacman")}
Loading required package: pacman

> pacman::p_load(
+     "tidymodels"
+     , "modeltime"
+     , "modeltime.h2o"
+     , "healthyR.data"
+     , "dplyr"
+     , "timetk"
+ )

> df_tbl <- healthyR_data

> data_tbl <- df_tbl %>%
+     select(service_line, visit_end_date_time) %>%
+     filter_by_time(
+         .date_var     = visit_end_date_time
+     .... [TRUNCATED] 

> data_tbl %>%
+     group_by(service_line) %>%
+     plot_time_series(
+         .legend_show  = FALSE
+         , .date_var   = date
+         , .va .... [TRUNCATED] 

> splits <- time_series_split(data_tbl, assess = "12 weeks", cumulative = TRUE)
Using date_var: date
Data is not ordered by the 'date_var'. Resamples will be arranged by `date`.
Overlapping Timestamps Detected. Processing overlapping time series together using sliding windows.

> recipe_spec <- recipe(value ~ ., data = training(splits)) %>%
+     step_timeseries_signature(date) 

> train_tbl <- training(splits) %>% bake(prep(recipe_spec), .)

> test_tbl  <- testing(splits) %>% bake(prep(recipe_spec), .)

> h2o.init(
+     nthreads = -1,
+     ip       = 'localhost',
+     port     = 54321
+ )
 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         11 hours 21 minutes 
    H2O cluster timezone:       America/New_York 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.32.0.1 
    H2O cluster version age:    6 months and 1 day !!! 
    H2O cluster name:           H2O_started_from_R_Steve_epf668 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   1.74 GB 
    H2O cluster total cores:    8 
    H2O cluster allowed cores:  8 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         Amazon S3, Algos, AutoML, Core V3, TargetEncoder, Core V4 
    R Version:                  R version 4.0.2 (2020-06-22) 

> model_spec <- automl_reg(mode = 'regression') %>%
+     set_engine(
+         engine                     = 'h2o',
+         max_runtime_secs         .... [TRUNCATED] 

> model_spec
H2O AutoML Model Specification (regression)

Engine-Specific Arguments:
  max_runtime_secs = 5
  max_runtime_secs_per_model = 3
  max_models = 3
  nfolds = 5
  exclude_algos = c("DeepLearning")
  verbosity = NULL
  seed = 786

Computational engine: h2o 

> model_fitted <- model_spec %>%
+     fit(value ~ ., data = train_tbl)
Converting to H2OFrame...

ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/3/Parse)

water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Provided column type POSIXct is unknown.  Cannot proceed with parse due to invalid argument."
 [2] "    water.parser.ParseSetup.strToColumnTypes(ParseSetup.java:248)"                                                                         
 [3] "    water.api.ParseHandler.parse(ParseHandler.java:25)"                                                                                    
 [4] "    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"                                                                           
 [5] "    sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)"                                                                           
 [6] "    sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)"                                                                       
 [7] "    java.lang.reflect.Method.invoke(Unknown Source)"                                                                                       
 [8] "    water.api.Handler.handle(Handler.java:60)"                                                                                             
 [9] "    water.api.RequestServer.serve(RequestServer.java:470)"                                                                                 
[10] "    water.api.RequestServer.doGeneric(RequestServer.java:301)"                                                                             
[11] "    water.api.RequestServer.doPost(RequestServer.java:227)"                                                                                
[12] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:707)"                                                                          
[13] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:790)"                                                                          
[14] "    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)"                                                                
[15] "    org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)"                                                            
[16] "    org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)"                                                     
[17] "    org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)"                                                    
[18] "    org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)"                                                      
[19] "    org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)"                                                             
[20] "    org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)"                                                      
[21] "    org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)"                                                     
[22] "    org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)"                                                         
[23] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)"                                                 
[24] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)"                                                       
[25] "    water.webserver.jetty9.Jetty9ServerAdapter$LoginHandler.handle(Jetty9ServerAdapter.java:130)"                                          
[26] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)"                                                 
[27] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)"                                                       
[28] "    org.eclipse.jetty.server.Server.handle(Server.java:531)"                                                                               
[29] "    org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)"                                                                     
[30] "    org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)"                                                           
[31] "    org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)"                                           
[32] "    org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)"                                                                     
[33] "    org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)"                                                                  
[34] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)"                                                
[35] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)"                                              
[36] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)"                                             
[37] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)"                                                    
[38] "    org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)"                              
[39] "    org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)"                                                      
[40] "    org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)"                                                       
[41] "    java.lang.Thread.run(Unknown Source)"                                                                                                  

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 

ERROR MESSAGE:

Provided column type POSIXct is unknown.  Cannot proceed with parse due to invalid argument.

In addition: Warning messages:
1: package ‘pacman’ was built under R version 4.0.3 
2: In h2o.clusterInfo() : 
Your H2O cluster version is too old (6 months and 1 day)!
Please download and install the latest version from http://h2o.ai/download/
Timing stopped at: 0.4 0.01 0.77

spsanderson commented 3 years ago

This was a PICNIC (problem in chair, not in computer).
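
The thread does not say what the mistake was. One plausible reading, offered as an assumption rather than a confirmed diagnosis: the script's last `mutate()` stores the converted dates in a new column named `data`, so `date` stays POSIXct and is the column H2O's parser rejects. Below is a minimal base-R sketch of checking for and converting such a column before fitting. The two-row data frame is made up and stands in for the real `data_tbl`.

```r
# Hypothetical stand-in for the aggregated data; the real data_tbl comes
# from healthyR_data and is not reproduced here.
data_tbl <- data.frame(
  date  = as.POSIXct(c("2020-01-05", "2020-01-12"), tz = "UTC"),
  value = c(10L, 12L)
)

# Find columns that are still POSIXct, the type named in the H2O error.
is_posixct   <- vapply(data_tbl, inherits, logical(1), what = "POSIXct")
posixct_cols <- names(data_tbl)[is_posixct]
print(posixct_cols)

# Overwrite the column in place (note `date`, not `data`) with a Date.
data_tbl$date <- as.Date(data_tbl$date)

# Confirm no POSIXct columns remain before handing the data to fit().
print(vapply(data_tbl, function(col) class(col)[1], character(1)))
```

With the column stored as Date, the same `recipe()` and `fit()` calls can be retried. Whether that alone resolves the error in this exact setup is untested here.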