business-science / modeltime.h2o

Forecasting with H2O AutoML. Use the H2O Automatic Machine Learning algorithm as a backend for Modeltime Time Series Forecasting.
https://business-science.github.io/modeltime.h2o/

Run h2o models with POSIXct datetime objects #22

Open agila5 opened 3 years ago

agila5 commented 3 years ago

Dear @mdancho84 , first of all thank you very much for developing this amazing ecosystem for time series modelling.

I just started learning the basic ideas and packages, and I was wondering if it's possible to use h2o models with date-time (or POSIXct) objects. For example, I tried to replicate the introductory vignette changing the field Date from Date to POSIXct, but then the examples fail.

# packages
suppressPackageStartupMessages({
  library(tidymodels)
  library(modeltime.h2o)
  library(tidyverse)
  library(timetk)
})

# data
data_tbl <- walmart_sales_weekly %>%
  select(id, Date, Weekly_Sales) %>% 
  mutate(Date = as.POSIXct(Date)) %>% 
  arrange(Date)
splits <- time_series_split(data_tbl, assess = "3 month", cumulative = TRUE)
#> Using date_var: Date
#> Overlapping Timestamps Detected. Processing overlapping time series together using sliding windows.

# recipe steps
recipe_spec <- recipe(Weekly_Sales ~ ., data = training(splits)) %>%
  step_timeseries_signature(Date)
train_tbl <- training(splits) %>% bake(prep(recipe_spec), .)

h2o.init(
  nthreads = -1,
  ip       = 'localhost',
  port     = 54321
)
#> 
#> H2O is not running yet, starting it now...
#> 
#> Note:  In case of errors look at the following log files:
#>     C:\Users\Utente\AppData\Local\Temp\RtmpyaXrog\file12f45bc01ac1/h2o_Utente_started_from_r.out
#>     C:\Users\Utente\AppData\Local\Temp\RtmpyaXrog\file12f43b7a685a/h2o_Utente_started_from_r.err
#> 
#> 
#> Starting H2O JVM and connecting:  Connection successful!
#> 
#> R is connected to the H2O cluster: 
#>     H2O cluster uptime:         3 seconds 377 milliseconds 
#>     H2O cluster timezone:       Europe/Berlin 
#>     H2O data parsing timezone:  UTC 
#>     H2O cluster version:        3.32.1.2 
#>     H2O cluster version age:    23 days  
#>     H2O cluster name:           H2O_started_from_R_Utente_zyo605 
#>     H2O cluster total nodes:    1 
#>     H2O cluster total memory:   1.75 GB 
#>     H2O cluster total cores:    4 
#>     H2O cluster allowed cores:  4 
#>     H2O cluster healthy:        TRUE 
#>     H2O Connection ip:          localhost 
#>     H2O Connection port:        54321 
#>     H2O Connection proxy:       NA 
#>     H2O Internal Security:      FALSE 
#>     H2O API Extensions:         Amazon S3, Algos, AutoML, Core V3, TargetEncoder, Core V4 
#>     R Version:                  R version 4.0.5 (2021-03-31)

model_spec <- automl_reg(mode = 'regression') %>%
  set_engine(
    engine                     = 'h2o',
    max_runtime_secs           = 5, 
    max_runtime_secs_per_model = 3,
    max_models                 = 3,
    nfolds                     = 5,
    exclude_algos              = c("DeepLearning"),
    verbosity                  = NULL,
    seed                       = 786
  ) 

model_fitted <- model_spec %>%
  fit(Weekly_Sales ~ ., data = train_tbl)
#> Converting to H2OFrame...
#> 
#> ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/3/Parse)
#> 
#> water.exceptions.H2OIllegalArgumentException
#>  [1] "water.exceptions.H2OIllegalArgumentException: Provided column type POSIXct is unknown.  Cannot proceed with parse due to invalid argument."
#>  [2] "    water.parser.ParseSetup.strToColumnTypes(ParseSetup.java:257)"                                                                         
#>  [3] "    water.api.ParseHandler.parse(ParseHandler.java:25)"                                                                                    
#>  [4] "    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"                                                                           
#>  [5] "    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)"                                                         
#>  [6] "    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)"                                                 
#>  [7] "    java.lang.reflect.Method.invoke(Method.java:498)"                                                                                      
#>  [8] "    water.api.Handler.handle(Handler.java:60)"                                                                                             
#>  [9] "    water.api.RequestServer.serve(RequestServer.java:470)"                                                                                 
#> [10] "    water.api.RequestServer.doGeneric(RequestServer.java:301)"                                                                             
#> [11] "    water.api.RequestServer.doPost(RequestServer.java:227)"                                                                                
#> [12] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:707)"                                                                          
#> [13] "    javax.servlet.http.HttpServlet.service(HttpServlet.java:790)"                                                                          
#> [14] "    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)"                                                                
#> [15] "    org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)"                                                            
#> [16] "    org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)"                                                     
#> [17] "    org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)"                                                    
#> [18] "    org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)"                                                      
#> [19] "    org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)"                                                             
#> [20] "    org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)"                                                      
#> [21] "    org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)"                                                     
#> [22] "    org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)"                                                         
#> [23] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)"                                                 
#> [24] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)"                                                       
#> [25] "    water.webserver.jetty9.Jetty9ServerAdapter$LoginHandler.handle(Jetty9ServerAdapter.java:130)"                                          
#> [26] "    org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)"                                                 
#> [27] "    org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)"                                                       
#> [28] "    org.eclipse.jetty.server.Server.handle(Server.java:531)"                                                                               
#> [29] "    org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)"                                                                     
#> [30] "    org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)"                                                           
#> [31] "    org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)"                                           
#> [32] "    org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)"                                                                     
#> [33] "    org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)"                                                                  
#> [34] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)"                                                
#> [35] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)"                                              
#> [36] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)"                                             
#> [37] "    org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)"                                                    
#> [38] "    org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)"                              
#> [39] "    org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)"                                                      
#> [40] "    org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)"                                                       
#> [41] "    java.lang.Thread.run(Thread.java:748)"
#> Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, : 
#> 
#> ERROR MESSAGE:
#> 
#> Provided column type POSIXct is unknown.  Cannot proceed with parse due to invalid argument.
#> Timing stopped at: 0.62 0.15 1.44

h2o.shutdown(prompt = FALSE)

Created on 2021-05-22 by the reprex package (v2.0.0)

Can you help me?

mdancho84 commented 3 years ago

Will need to take a look this upcoming week. For some reason the date index column is getting passed to H2O AutoML, which is no good. We only want features derived from that date column to be passed.

agila5 commented 3 years ago

Ok, I will wait for your response. Thank you very much!

jcrodriguez1989 commented 3 years ago

Hi @mdancho84 , thanks for this package and all the documentation! I am also trying to use {modeltime.h2o} with datetime data (hourly-specific). Let me know if you would need a reprex :)

jcrodriguez1989 commented 3 years ago

Hi @mdancho84 , I did some research on this issue, and I think it is an H2O issue, not one specific to {modeltime.h2o} (I will submit a PR to H2O about this). This is the reprex I built, which fails with both {modeltime.h2o} and {h2o}:

library("dplyr")
library("h2o")
library("lubridate")
hourly_calls <- tibble(
  ds = seq.POSIXt(now() - days(7), now(), by = "hour"),
  calls = rpois(7 * 24 + 1, lambda = 4)
)
h2o.init()
as.h2o(hourly_calls)

ERROR: Unexpected HTTP Status code: 412 Precondition Failed (url = http://localhost:54321/3/Parse)

water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Provided column type POSIXct is unknown.  Cannot proceed with parse due to invalid argument."
...
[41] "    java.base/java.lang.Thread.run(Thread.java:829)"                                                                                       

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 

ERROR MESSAGE:

Provided column type POSIXct is unknown.  Cannot proceed with parse due to invalid argument.

However, I found something curious while checking this out. In the automl_fit_impl function definition, line 180 uses a variable named data, which is not an argument of the function and is not in ... by default either. The conditional therefore always evaluates to TRUE, because data resolves to the utils::data function. I guess this line should be: if (!inherits(x, "H2OFrame")) {

Would you like me to fix this in a PR?
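
The scoping problem described above can be reproduced without an H2O cluster. The following is a minimal sketch, not the package's actual code: `fit_buggy`, `fit_fixed` and `fake_h2o_frame` are hypothetical names made up for illustration, and the "H2OFrame" object is faked by setting a class attribute. It assumes the utils package is attached, as it is in a default R session, and that no object named `data` exists in the global environment.

```r
# Hypothetical stand-in for a fit function that takes `x` and `y`
# but has no argument or local variable called `data`.
fit_buggy <- function(x, y) {
  # `data` is not defined here, so R finds utils::data, which is a function.
  # A function never inherits from "H2OFrame", so this is always TRUE.
  !inherits(data, "H2OFrame")
}

fit_fixed <- function(x, y) {
  # Check the object that was actually passed in.
  !inherits(x, "H2OFrame")
}

# Fake an H2OFrame by setting the class attribute; no H2O cluster is needed.
fake_h2o_frame <- structure(list(), class = "H2OFrame")
plain_df <- data.frame(a = 1:3)

buggy_on_h2o <- fit_buggy(fake_h2o_frame, NULL)  # TRUE: would convert even an H2OFrame
fixed_on_h2o <- fit_fixed(fake_h2o_frame, NULL)  # FALSE: conversion is skipped
fixed_on_df  <- fit_fixed(plain_df, NULL)        # TRUE: a data frame still gets converted
```

With the check on `x`, an input that is already an H2OFrame is no longer flagged for conversion a second time.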

mdancho84 commented 3 years ago

Oh wow, that's funny about the data bug. Yes, please feel free to submit a PR. We can fix that.

Regarding the POSIXct format, that is interesting. I'm surprised we are passing it as a feature, but it's been a while since I reviewed the code.

One solution is to simply remove the name of the datetime feature from the x_nms variable. This would prevent the POSIXct is unknown error.
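
A rough sketch of that idea in base R follows. It is an illustration under assumptions, not the package's implementation: the toy table, `y_nm`, `is_datetime` and `upload_tbl` are hypothetical, and only the name `x_nms` comes from the comment above. The last step, subsetting the table before it is converted, is an added assumption: the reprex earlier in this thread fails inside `as.h2o()` itself, so the column probably has to be left out of the uploaded frame as well as out of the predictor names.

```r
# Toy training table: a POSIXct index, one derived feature, and the outcome.
# The values are made up for illustration.
train_tbl <- data.frame(
  Date         = as.POSIXct(c("2021-05-01", "2021-05-08", "2021-05-15"), tz = "UTC"),
  Date_week    = c(1L, 2L, 3L),
  Weekly_Sales = c(100, 120, 90)
)

y_nm  <- "Weekly_Sales"
x_nms <- setdiff(names(train_tbl), y_nm)

# Flag every column whose class is Date or POSIXct.
is_datetime <- vapply(
  train_tbl,
  function(col) inherits(col, c("Date", "POSIXct")),
  logical(1)
)

# Keep only the predictors that are not datetime columns.
x_nms <- setdiff(x_nms, names(train_tbl)[is_datetime])

# Assumption: also drop the datetime column from the table that would be
# handed to as.h2o(), so the parser never sees a POSIXct column.
upload_tbl <- train_tbl[, c(x_nms, y_nm), drop = FALSE]
```

This only applies when the datetime column is an index and not the response, since a response column cannot be dropped.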

jcrodriguez1989 commented 3 years ago

Well, the POSIXct column is my response variable, so I need it to be passed to H2O. I have submitted a PR to H2O so that this issue gets solved. Meanwhile, @agila5 , you could try installing https://github.com/jcrodriguez1989/h2o-3/tree/ash2o_posixct/h2o-r/h2o-package , but it is not as easy as remotes::install_github(...). Let me know if you need further help installing it. With this H2O fix, I am now able to run modeltime.h2o with a datetime response variable 💃💃💃

agila5 commented 3 years ago

Hi @jcrodriguez1989 and thank you very much for your message, I will test your solution/PR as soon as possible.