Closed Steviey closed 2 years ago
Thanks for the bug report. This is not an intended behaviour, and I will fix this. In the meantime, please use "h=1" in forecast to resolve this.
Hey,
it works with higher 'h' but not with newdata...
Yes. This is because the newdata has 1 observation, while 10 are needed, and it tries using the last available values as future ones, but there is an issue with the bloody data.frame(). A fix is coming.
Fixed in 097674d9d0526c1f020bb80b8b0cd2142d916bd4
The first test with tidymodels fails....
Error in eval(predvars, data, env) : object 'n2cat' not found
In addition: Warning message:
The newdata has 1 observations, while 10 are needed. Using the last available values as future ones.
x <fn>()
Noticed 1: Works with dummy data.
Noticed 2: Heavy payload of 367 columns... seems to be too much... reducing cols. ... wrong data format, my fault.
Result: native functional
In general, it is recommended to provide at least as much data (explanatory variables) as the length of the forecast horizon, i.e. nrow(newdata)>=h in forecast(..., h=h, newdata=newdata)
Looks a bit weird with other data and I probably get a tidymodels problem now. But hey it's fun.
Yes, looks ridiculous (plus it overfits the data) :). But I'm glad you enjoy it.
(20-40 features) ... Looks better with...
myPlot<-plot(generics::forecast(myModel,newdata=testOneStep,h=nrow(testOneStep),interval="prediction", level=c(0.9,0.95)))
re overfitting: This was without filtering recipes. I have created x>100 k low impact features (with PCA several thousand). :-)
model.frame.default(testFormula, data = xreg) ... hm hm :-)
If there is an error, please provide a short reproducible example, so that I can debug and fix.
Do you cast xRegs to lower case?
Error in eval(predvars, data, env) : object 'n2cat' not found
... it's actually called n2Cat.
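A minimal base R sketch of how this kind of error can arise, assuming toy data and a formula that spells the variable differently from the data frame (the names `xreg` and `testFormula` are borrowed from the back trace in this thread, everything else is made up for illustration). It is not a diagnosis of what smooth does internally; it only shows that model.frame() is case-sensitive and how a near-miss name can be spotted.

```r
# Toy data: the column is spelled n2Cat, the formula asks for n2cat.
xreg <- data.frame(value = c(1, 2, 3), n2Cat = c(0, 1, 0))
testFormula <- value ~ n2cat

# model.frame() looks variables up by exact name, so this call should fail.
msg <- tryCatch({
  model.frame(testFormula, data = xreg)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Comparing names case-insensitively points at the likely culprit.
wanted <- all.vars(testFormula)
missing_vars <- setdiff(wanted, names(xreg))
near_miss <- names(xreg)[tolower(names(xreg)) %in% tolower(missing_vars)]
print(near_miss)
```

If `near_miss` is not empty, some step between the data and the formula has probably changed the case of a column name.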
tidymodels reproducible example is harder to prepare... will take a while...
I don't use n2cat in my functions.
From my point of view, there is nothing native wrong. But the back trace looks suspicious.
library("Hmisc")
suppressPackageStartupMessages(library(digest))
suppressPackageStartupMessages(library(lightgbm))
suppressPackageStartupMessages(library(digest))
suppressPackageStartupMessages(library(RSQLite))
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(dbplyr))
suppressPackageStartupMessages(library(rlang))
suppressPackageStartupMessages(library(freqdist))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(modeltime.ensemble))
suppressPackageStartupMessages(library(modeltime.resample))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(rsample))
suppressPackageStartupMessages(library(tidyquant))
suppressPackageStartupMessages(library(tibbletime))
suppressPackageStartupMessages(library(anomalize))
suppressPackageStartupMessages(library(smooth))
suppressPackageStartupMessages(library(lmtest))
suppressPackageStartupMessages(library(mgcv))
suppressPackageStartupMessages(library(fable))
suppressPackageStartupMessages(library(fabletools))
suppressPackageStartupMessages(library(tsibble))
suppressPackageStartupMessages(library(tsibbledata))
suppressPackageStartupMessages(library(tsfeatures))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(ggrepel))
suppressPackageStartupMessages(library(runner))
suppressPackageStartupMessages(library(ggformula))
suppressPackageStartupMessages(library(fANCOVA))
suppressPackageStartupMessages(library(stats))
suppressPackageStartupMessages(library(TTR))
suppressPackageStartupMessages(library(xts))
suppressPackageStartupMessages(library(vip))
suppressPackageStartupMessages(library(yardstick))
suppressPackageStartupMessages(library(plotly))
suppressPackageStartupMessages(library(catboost))
suppressPackageStartupMessages(library(treesnip))
suppressPackageStartupMessages(library(broom))
suppressPackageStartupMessages(library(finetune))
suppressPackageStartupMessages(library(tabnet))
suppressPackageStartupMessages(library(lobstr))
#suppressPackageStartupMessages(library(forecast))
######### - reproducible example start - ##########
prepMtData <- function(df){
    ret <- list()
    rownames(df) <- NULL
    df <- df %>% dplyr::mutate(id=row_number()) %>% relocate(id)
    ret$df <- df
    df_tsbl <- df %>%
        as_tsibble(index=id)
    fab_ts_split <- initial_time_split(df_tsbl, prop = 3/4)
    ret$fab_df_train <- training(fab_ts_split)
    ret$fab_df_test <- testing(fab_ts_split)
    ret$fab_ts_split <- fab_ts_split
    ret$fab_df_split <- NA
    myH <- 1
    myLimit <- as.numeric(nrow(df_tsbl)-myH)
    ret$fab_trainOneStep <- df_tsbl %>% filter(id <= myLimit)
    ret$fab_testOneStep <- df_tsbl %>% filter(id > myLimit)
    df_split <- rsample::initial_time_split(df_tsbl, prop = 0.8)
    train_len <- length(df_split$in_id)
    test_len <- length(df_split$out_id)
    df_tsbl <- as.data.frame(df_tsbl) # modeltime input format
    splits <- df_tsbl %>%
        timetk::time_series_split(assess=test_len,cumulative=TRUE) # an integer also works here
    ret$df_train <- rsample::training(splits)
    ret$df_test <- rsample::testing(splits)
    #View(ret$df_test)
    ret$df_split <- df_split
    ret$ts_split <- splits
    myH <- 10
    myLimit <- as.numeric(nrow(df_tsbl)-myH)
    ret$trainOneStep <- df %>% filter(id <= myLimit)
    ret$testOneStep <- df %>% filter(id > myLimit)
    return(ret)
}
getTheFuckTheFormula <- function(xRegCols){
    xRegCols <- xRegCols[xRegCols!='value']
    xRegCols <- xRegCols[xRegCols!='id']
    xRegCols <- xRegCols[xRegCols!='date']
    myFormula <- xRegCols
    myFormula <- paste0(myFormula, collapse= "+")
    myFormula <- paste0('value ~ date + ',myFormula)
    myFormula <- as.formula(myFormula)
    return(myFormula)
}
liveData <- 0
if(liveData<1){
    idx <- 120
    y <- rnorm(idx, mean=15, sd=5)
    x <- cbind(
        x1 = rnorm(idx, mean=15, sd=5)
        ,x2 = rnorm(idx, mean=15, sd=5)
        ,x3 = rnorm(idx, mean=15, sd=5)
        ,x4 = rnorm(idx, mean=15, sd=5)
        ,x5 = rnorm(idx, mean=15, sd=5)
    )
    df <- data.frame(x=x,value=y,stringsAsFactors=FALSE)
    df <- df %>% dplyr::relocate(value)
    dataLength <- nrow(df)
    xRegCols <- colnames(df[,2:6])
    myTime <- tk_make_timeseries("2011", length_out=dataLength, include_endpoints = FALSE)
    df$date <- myTime
}
dataObj <-prepMtData(df)
df_train <-dataObj$df_train
df_test <-dataObj$df_test
trainOneStep <-dataObj$trainOneStep
testOneStep <-dataObj$testOneStep
model_spec <- adam_reg(
    #seasonal_period = 5
    #,ets_model ='ANA'
    # error = param1
    # ,trend = param2
    # ,season = param3
    # ,damping = param4
    # ,smooth_level = param5
    # ,smooth_trend = param6
    # ,smooth_seasonal = param7
) %>%
    set_engine("auto_adam",holdout=FALSE,silent=FALSE,h=1)
model_spec<-parsnip::eval_args(model_spec)
myFormula <- getTheFuckTheFormula(xRegCols)
recipe_spec <- recipe(myFormula,trainOneStep)
print(recipe_spec)
set.seed(123)
wflw <- workflow() %>%
    add_model(model_spec) %>%
    add_recipe(recipe_spec)
wflw_fit <- wflw %>% fit(trainOneStep)
myModel<-wflw_fit[['fit']][['fit']][['fit']][['models']][['model_1']]
myFc <- generics::forecast(myModel,newdata=testOneStep, interval="prediction", level=c(0.50))
stop()
######### - reproducible example end - ##########
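As a side note on the formula helper in the example above: a possible shorter variant, sketched here with base R's reformulate(). The name `build_formula` is hypothetical and not part of the original script. Unlike the original helper, setdiff() also drops duplicated column names.

```r
# Hypothetical replacement for the formula helper: drop the response and index
# columns, then build value ~ date + <regressors> with stats::reformulate().
build_formula <- function(xRegCols) {
  xRegCols <- setdiff(xRegCols, c("value", "id", "date"))
  reformulate(c("date", xRegCols), response = "value")
}

f <- build_formula(c("value", "x.x1", "x.x2", "date"))
print(f)
```

Building the formula from term labels avoids pasting strings by hand, and all.vars(f) gives the exact, case-sensitive names that must exist in both the training data and newdata.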
Maybe I should investigate it a little deeper. Is there any direction you can see ad hoc?
+-global pslPlotModel(...) at R/PslTools/dummy.R:939:12
| +-generics::forecast(...) at R/PslTools/dummy.R:361:12
| -smooth:::forecast.adam(...)
| +-stats::model.frame(testFormula, data = xreg)
| -stats::model.frame.default(testFormula, data = xreg)
| -base::eval(predvars, data, env)
| -base::eval(predvars, data, env)
-global <fn>()
-lobstr::cst() at R/PslTools/dummy.R:53:25
No traceback available
So, I don't know what happens here or why. model.frame() is used to expand the data.frame into a matrix. If you cannot reproduce this on a small example, then just make sure that you follow this:
1. The number of rows in the newdata that you provide to the forecast function should be equal to the forecast horizon. Otherwise the function will substitute it with something else.
2. The newdata has exactly the same set of variables as the data used in adam(), with exactly the same names of variables.
3. If you do not provide a formula in adam, then the function will use the formula y~., substituting y with the name of your variable. This typically works fine, but you can also try writing the formula explicitly.
Hope this helps.
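The points above can be turned into a quick pre-flight check. This is a minimal sketch in base R; `check_newdata` is a hypothetical helper, not a function from smooth or greybox, and the data frames are made up.

```r
# Hypothetical helper: report mismatches between newdata and the training data.
check_newdata <- function(train, newdata, response, h) {
  problems <- character(0)
  # The number of rows should match the forecast horizon.
  if (nrow(newdata) != h) {
    problems <- c(problems,
                  sprintf("nrow(newdata) is %d, h is %d", nrow(newdata), h))
  }
  # Every explanatory variable must be present under exactly the same name.
  needed <- setdiff(names(train), response)
  missing_vars <- setdiff(needed, names(newdata))
  if (length(missing_vars) > 0) {
    problems <- c(problems,
                  paste("missing in newdata:", paste(missing_vars, collapse = ", ")))
  }
  problems
}

train <- data.frame(value = rnorm(20), x1 = rnorm(20), n2Cat = rnorm(20))
good  <- data.frame(x1 = rnorm(10), n2Cat = rnorm(10))
bad   <- data.frame(x1 = rnorm(1), n2cat = rnorm(1))

res_good <- check_newdata(train, good, response = "value", h = 10)
res_bad  <- check_newdata(train, bad,  response = "value", h = 10)
print(res_bad)
```

An empty character vector means neither check found a problem. The name comparison is deliberately case-sensitive, because model.frame() is too.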
Is there a significant qualitative difference to simply say 'h=10' or say 'newdata=test_df' where test_df has nrow()=10? Otherwise it might be no top prio for me. In my understanding, with newdata, I only would have the chance to introduce future xregs. Am I right?
The most efficient way is h=nrow(data). Otherwise the function will try to fix the length itself, and the result won't necessarily be correct. You have explanatory variables for a reason: you can either control them (prices, promotions) or predict them to some extent (weather). It's better to provide future values than to let the function deal with it on its own.
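A small sketch of that advice, assuming the smooth package is installed and that adam() accepts a data frame together with a formula, as the discussion in this thread suggests. The data, the column names and the "ANN" model choice are made up for illustration; the forecast() arguments mirror the call used earlier in the thread.

```r
library(smooth)

set.seed(1)
n <- 120
h <- 10
df <- data.frame(value = rnorm(n, mean = 15, sd = 5),
                 x1 = rnorm(n, mean = 15, sd = 5),
                 x2 = rnorm(n, mean = 15, sd = 5))

# Hold out the last h rows; their x1 and x2 play the role of known future values.
train <- df[seq_len(n - h), ]
test  <- df[(n - h + 1):n, ]

# Write the formula explicitly instead of relying on the default y ~ .
fit <- adam(train, model = "ANN", formula = value ~ x1 + x2)

# Tie the horizon to the number of rows in newdata.
fc <- forecast(fit, h = nrow(test), newdata = test,
               interval = "prediction", level = 0.95)
print(fc$mean)
```

Setting h = nrow(test) keeps the horizon and the future explanatory variables in sync, so the function has no reason to substitute values of its own.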
Thank you, I will do it a little later. There is much hidden stuff in there (modeltime.ensemble).
Windows 7, greybox 1.0.5.41001, smooth 3.1.6.41004, R version 4.0.5 Using this code I get the following error....