ellisp / forecastxgb-r-package

An R package for time series models and forecasts with xgboost compatible with {forecast} S3 classes
GNU General Public License v3.0

Error in x[, maxlag + 1] <- time(y2) & Error in x[, maxlag + 2:f] <- seasons #20

Closed mhnierhoff closed 7 years ago

mhnierhoff commented 7 years ago

Hi, thanks for this great package and the new approach to forecasting time series. However, I've run into two problems with two different time series, while others work without any problems. 1:

Error in x[, maxlag + 1] <- time(y2) : 
  number of items to replace is not a multiple of replacement length

2:

Error in x[, maxlag + 2:f] <- seasons : 
  number of items to replace is not a multiple of replacement length
In addition: Warning message:
In xgbts(y = ...) :
  y is too short for cross-validation.  Will validate on the most recent 20 per cent instead.

Using the stlf function from the forecast package works without any errors. Can you explain what causes these errors and how to avoid them so I can use xgb forecasting?

Thanks in advance! 👍
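For context on the error text itself: in base R, "number of items to replace is not a multiple of replacement length" is raised when a vector is assigned into matrix columns and its length does not divide the number of rows. The minimal base-R sketch below reproduces the message; the matrix `x` is a hypothetical stand-in, not the package's actual feature matrix.

```r
# Base R only: a hypothetical 5-row matrix, not forecastxgb's internal one.
x <- matrix(0, nrow = 5, ncol = 3)

# Accepted: the replacement length equals nrow(x) ...
x[, 1] <- 1:5
# ... or divides it (a single value is recycled down the column).
x[, 2] <- 7

# Rejected: 3 values for 5 rows. For matrix sub-assignment this is an
# error (not just a warning), with the same message reported in this issue.
msg <- tryCatch({
  x[, 3] <- 1:3
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

Read this way, the two failures suggest (an assumption, not something confirmed in the thread) that for these particular inputs the `time(y2)` or `seasons` values built inside `xgbts()` had a different length from the rows of the lagged feature matrix.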

ellisp commented 7 years ago

They might be related to #15, which was fixed quite recently. Were you working with a time series with frequency = 1? If so, reinstalling the latest version might fix it.

mhnierhoff commented 7 years ago

Thanks for the quick reply. The time series for problem 1 is frequency = 365.25 (daily data) and for the second one it is 12 (monthly data).

ellisp commented 7 years ago

Hmm, OK. I'm not surprised there are bugs. Could you give me a reproducible example, or at least the code that triggers it? Could you also do me a favour and devtools::install_github() the latest version (unless you've updated in the past four days)? It's just possible that will fix the frequency = 12 problem.

This looks like two separate problems. I haven't thought through what to do when the frequency is not an integer, so that one doesn't surprise me, but the monthly one should work.

What's a good daily data source for testing?

mhnierhoff commented 7 years ago

I've just updated the package and it made no difference. But I think I've found the problem: there still seems to be an issue with the frequency settings.

Works:

library(forecastxgb)
library(lubridate)  # assumed source of yday(); data.table also provides one
bla_1 <- ts(runif(35, min = 5000, max = 10000))
(or: bla_1 <- ts(runif(35, min = 5000, max = 10000), start = c(2013,12)))

bla_1_XGB_model <- xgbts(y = bla_1)

Stopping. Best iteration: 25

bla_2 <- ts(runif(1076, min = 5000, max = 10000), start = c(2013, yday("2013-12-03")))

bla_2_XGB_model <- xgbts(y = bla_2)

Stopping. Best iteration: 13


Doesn't work:

bla_1 <- ts(runif(35, min = 5000, max = 10000), start = c(2013,12), frequency = 12)

Error in x[, maxlag + 2:f] <- seasons : number of items to replace is not a multiple of replacement length
In addition: Warning message:
In xgbts(y = bla1) : y is too short for cross-validation. Will validate on the most recent 20 per cent instead.

bla_2 <- ts(runif(1076, min = 5000, max = 10000), start = c(2013, yday("2013-12-03")), 
            frequency = 365.25)

bla_2_XGB_model <- xgbts(y = bla_2)

Error in x[, maxlag + 1] <- time(y2) : number of items to replace is not a multiple of replacement length

ellisp commented 7 years ago

Thanks for the reproducible examples. Earthquake permitting, I should be able to fix them over the next week. I may need a different approach to seasonality for high-frequency data, though, as I don't think 366 dummy variables are likely to be useful (although that could remain one of several options).

ellisp commented 7 years ago

I'm going to close this. Thanks for bringing it up, it's led to a fruitful line of work.

The problem with the monthly series should be fixed: the short series should at least run now. There is also a better option for short series like this (and maybe better, full stop): setting seas_method = 'decompose', which should give better results (I think).

The problem with the daily series now duplicates what is covered in #22 and #26.