config-i1 / smooth

The set of functions used for time series analysis and in forecasting.

Error in match.names(clabs, names(xi)) #196

Closed Steviey closed 2 years ago

Steviey commented 2 years ago

Windows 7, greybox 1.0.5.41001, smooth 3.1.6.41004, R version 4.0.5. Using this code I get the following error:

Sys.setenv(LANG = "en")
options(scipen = 999)
options(dplyr.summarise.inform=F)
options(max.print=2000) 
library("Hmisc")
suppressPackageStartupMessages(library(digest))
suppressPackageStartupMessages(library(lightgbm))
suppressPackageStartupMessages(library(RSQLite))
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(dbplyr))
suppressPackageStartupMessages(library(rlang))
suppressPackageStartupMessages(library(freqdist))
suppressPackageStartupMessages(library(tidymodels))
options(tidymodels.dark = TRUE)
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(modeltime.ensemble))
suppressPackageStartupMessages(library(modeltime.resample))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(tidyverse)) 
suppressPackageStartupMessages(library(rsample)) 
suppressPackageStartupMessages(library(tidyquant))
suppressPackageStartupMessages(library(tibbletime))
suppressPackageStartupMessages(library(anomalize))
suppressPackageStartupMessages(library(smooth))
suppressPackageStartupMessages(library(lmtest))
suppressPackageStartupMessages(library(mgcv))
suppressPackageStartupMessages(library(fable))
suppressPackageStartupMessages(library(fabletools))
suppressPackageStartupMessages(library(tsibble))
suppressPackageStartupMessages(library(tsibbledata))
suppressPackageStartupMessages(library(tsfeatures))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(ggrepel))
suppressPackageStartupMessages(library(runner))
suppressPackageStartupMessages(library(ggformula))
suppressPackageStartupMessages(library(fANCOVA))
suppressPackageStartupMessages(library(stats))
suppressPackageStartupMessages(library(TTR))
suppressPackageStartupMessages(library(xts))
suppressPackageStartupMessages(library(vip))
suppressPackageStartupMessages(library(yardstick))
suppressPackageStartupMessages(library(plotly))
suppressPackageStartupMessages(library(catboost))
suppressPackageStartupMessages(library(treesnip))
suppressPackageStartupMessages(library(broom))
suppressPackageStartupMessages(library(finetune))
suppressPackageStartupMessages(library(tabnet))
suppressPackageStartupMessages(library(lobstr))
suppressPackageStartupMessages(library(forecast))
options(error=function(){lobstr::cst();traceback(max.lines=1)})

liveData<-0
if(liveData<1){

    idx=120
    y<-rnorm(idx, mean=15, sd=5)
    x<-cbind(
        x1 =  rnorm(idx, mean=15, sd=5)
        ,x2 =  rnorm(idx, mean=15, sd=5)
        ,x3 =  rnorm(idx, mean=15, sd=5)
        ,x4 =  rnorm(idx, mean=15, sd=5)
        ,x5 =  rnorm(idx, mean=15, sd=5)
    )   
    df    <- data.frame(x=x,value=y,stringsAsFactors=F)
    df    <- df %>% dplyr::relocate(value)
}

rownames(df) <- NULL
df      <-  df %>% dplyr::mutate(id=row_number()) %>% relocate(id)
myH<-1
myLimit      <- as.numeric(nrow(df)-myH)
trainOneStep <- df %>% filter(id <= myLimit)
testOneStep  <- df %>% filter(id > myLimit)

View(trainOneStep)
View(testOneStep)

myModel <- adam(trainOneStep,"ANN",silent=TRUE,h=1,holdout=FALSE)

# #myPlot<-plot(forecast::forecast(myModel,h=10,interval="complete",nsim=100),main="Complete prediction interval")
#myPlot<-plot(generics::forecast(myModel,newdata=testOneStep,interval="complete",nsim=100),main="Complete prediction interval")
myPlot<-plot(generics::forecast(myModel,newdata=testOneStep, interval="prediction", level=c(0.9,0.95)))

print(myPlot)

Error in match.names(clabs, names(xi)) : 
  names do not match previous names
In addition: Warning message:
The newdata has 1 observations, while 10 are needed. Using the last available values as future ones. 
    x
 1. +-base::plot(...) at R/PslTools/dummy.R:824:0
 2. +-generics::forecast(...) at R/PslTools/dummy.R:824:0
 3. +-smooth:::forecast.adam(...)
 4. | \-base::rbind(...)
 5. |   \-base::rbind(deparse.level, ...)
 6. |     \-base::match.names(clabs, names(xi))
 7. |       \-base::stop("names do not match previous names")
 8. \-global `<fn>`()
 9.   \-lobstr::cst() at R/PslTools/dummy.R:55:25
13: stop("names do not match previous names")
12: match.names(clabs, names(xi))
11: rbind(deparse.level, ...)
10: rbind(newdata, matrix(rep(tail(newdata, 1), each = newnRows), 
     ...
9: reforecast.adam(object, h = h, newdata = newdata, occurrence = occurrence, 
    ...
8: reforecast(object, h = h, newdata = newdata, occurrence = occurrence, 
    ...
7: forecast.adam(myModel, newdata = testOneStep, interval = "complete", 
    ...
6: generics::forecast(myModel, newdata = testOneStep, interval = "complete", 
    ... at dummy.R#823
5: plot(generics::forecast(myModel, newdata = testOneStep, interval = "complete", 
    ... at dummy.R#823
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
config-i1 commented 2 years ago

Thanks for the bug report. This is not an intended behaviour, and I will fix this. In the meantime, please use "h=1" in forecast to resolve this.

Steviey commented 2 years ago

Hey,

it works with higher 'h' but not with newdata...

image

config-i1 commented 2 years ago

Yes. This is because newdata has 1 observation while 10 are needed, so the function tries to use the last available values as future ones, but there is an issue with the bloody data.frame(). A fix is coming.
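
The mechanism can be reproduced in base R without smooth. The sketch below is an assumption based on the rbind(newdata, matrix(...)) call visible in the traceback; it is not the code from the fix commit, and the names `extra_rows`, `filler`, `failed` and `padded` are illustrative only.

```r
# One-row newdata with named columns, as in the report.
newdata <- data.frame(x1 = 1.5, x2 = 2.5)
extra_rows <- 9  # rows missing to reach a horizon of 10

# Padding with an unnamed matrix: rbind() converts it to a data.frame
# with default column names (V1, V2), which do not match x1, x2.
filler <- matrix(rep(unlist(newdata), each = extra_rows), nrow = extra_rows)
failed <- tryCatch(rbind(newdata, filler), error = function(e) e)
print(conditionMessage(failed))

# One way to pad safely: repeat the last row by index, which keeps the names.
padded <- rbind(newdata, newdata[rep(nrow(newdata), extra_rows), , drop = FALSE])
rownames(padded) <- NULL
print(dim(padded))
```

Repeating rows by index keeps the column names and types of the original data.frame, so rbind() has nothing to reconcile.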

config-i1 commented 2 years ago

Fixed in 097674d9d0526c1f020bb80b8b0cd2142d916bd4

Steviey commented 2 years ago

The first test with tidymodels fails....

Error in eval(predvars, data, env) : object 'n2cat' not found
In addition: Warning message:
The newdata has 1 observations, while 10 are needed. Using the last available values as future ones.
    x
  1. +-global pslPlotModel(...) at R/PslTools/dummy.R:939:12
  2. | +-generics::forecast(...) at R/PslTools/dummy.R:361:12
  3. | \-smooth:::forecast.adam(...)
  4. |   +-stats::model.frame(testFormula, data = xreg)
  5. |   \-stats::model.frame.default(testFormula, data = xreg)
  6. |     \-base::eval(predvars, data, env)
  7. |       \-base::eval(predvars, data, env)
  8. \-global `<fn>`()
  9.   \-lobstr::cst() at R/PslTools/dummy.R:53:25
No traceback available

Now testing natively....

Note 1: Works with dummy data. Note 2: Heavy payload with 367 columns... seems to be too much... reducing columns... wrong data format, my fault.

Result: works natively.

config-i1 commented 2 years ago

In general, it is recommended to provide at least as many observations of the explanatory variables as the length of the forecast horizon, i.e. nrow(newdata)>=h in forecast(..., h=h, newdata=newdata).
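
A minimal sketch of enforcing this rule before calling forecast(). The helper `check_horizon()` is hypothetical and not part of smooth; it only stops early instead of relying on the automatic padding mentioned in the warning.

```r
# Hypothetical helper (not part of smooth): fail early when newdata is
# shorter than the forecast horizon.
check_horizon <- function(newdata, h) {
  if (nrow(newdata) < h) {
    stop(sprintf("newdata has %d row(s), but h = %d",
                 nrow(newdata), as.integer(h)))
  }
  invisible(TRUE)
}

test_df <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
ok <- check_horizon(test_df, h = 10)   # passes silently

# A single-row newdata with h = 10 is rejected with a readable message.
short <- tryCatch(check_horizon(test_df[1, , drop = FALSE], h = 10),
                  error = function(e) conditionMessage(e))
print(short)
```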

Steviey commented 2 years ago

Looks a bit weird with other data and I probably get a tidymodels problem now. But hey it's fun.

image

config-i1 commented 2 years ago

Yes, looks ridiculous (plus it overfits the data) :). But I'm glad you enjoy it.

Steviey commented 2 years ago

(20-40 features) ... Looks better with...

myPlot<-plot(generics::forecast(myModel,newdata=testOneStep,h=nrow(testOneStep),interval="prediction", level=c(0.9,0.95)))

image

image

Steviey commented 2 years ago

Re overfitting: this was without filtering recipes. I have created more than 100k low-impact features (with PCA, several thousand). :-)

Steviey commented 2 years ago

model.frame.default(testFormula, data = xreg) ... hm hm :-)

config-i1 commented 2 years ago

If there is an error, please provide a short reproducible example, so that I can debug and fix.

Steviey commented 2 years ago

Do you cast xRegs to lower case?

Error in eval(predvars, data, env) : object 'n2cat' not found

... it's actually called n2Cat.

tidymodels reproducible example is harder to prepare... will take a while...

config-i1 commented 2 years ago

I don't use n2cat in my functions.

Steviey commented 2 years ago

From my point of view, nothing is wrong when running natively. But the backtrace looks suspicious.

Steviey commented 2 years ago
library("Hmisc")
suppressPackageStartupMessages(library(digest))
suppressPackageStartupMessages(library(lightgbm))
suppressPackageStartupMessages(library(RSQLite))
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(dbplyr))
suppressPackageStartupMessages(library(rlang))
suppressPackageStartupMessages(library(freqdist))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(modeltime.ensemble))
suppressPackageStartupMessages(library(modeltime.resample))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(tidyverse)) 
suppressPackageStartupMessages(library(rsample)) 
suppressPackageStartupMessages(library(tidyquant))
suppressPackageStartupMessages(library(tibbletime))
suppressPackageStartupMessages(library(anomalize))
suppressPackageStartupMessages(library(smooth))
suppressPackageStartupMessages(library(lmtest))
suppressPackageStartupMessages(library(mgcv))
suppressPackageStartupMessages(library(fable))
suppressPackageStartupMessages(library(fabletools))
suppressPackageStartupMessages(library(tsibble))
suppressPackageStartupMessages(library(tsibbledata))
suppressPackageStartupMessages(library(tsfeatures))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(ggrepel))
suppressPackageStartupMessages(library(runner))
suppressPackageStartupMessages(library(ggformula))
suppressPackageStartupMessages(library(fANCOVA))
suppressPackageStartupMessages(library(stats))
suppressPackageStartupMessages(library(TTR))
suppressPackageStartupMessages(library(xts))
suppressPackageStartupMessages(library(vip))
suppressPackageStartupMessages(library(yardstick))
suppressPackageStartupMessages(library(plotly))
suppressPackageStartupMessages(library(catboost))
suppressPackageStartupMessages(library(treesnip))
suppressPackageStartupMessages(library(broom))
suppressPackageStartupMessages(library(finetune))
suppressPackageStartupMessages(library(tabnet))
suppressPackageStartupMessages(library(lobstr))
#suppressPackageStartupMessages(library(forecast))

######### - reproducible example start - ##########

prepMtData  <-function(df){
    ret<-list()
    rownames(df) <- NULL
    df      <-  df %>% dplyr::mutate(id=row_number()) %>% relocate(id)
    ret$df  <-  df
    df_tsbl <-  df %>% 
        as_tsibble(index=id)   

    fab_ts_split        <- initial_time_split(df_tsbl, prop = 3/4)
    ret$fab_df_train    <- training(fab_ts_split)
    ret$fab_df_test     <- testing(fab_ts_split)

    ret$fab_ts_split    <- fab_ts_split
    ret$fab_df_split    <- NA

    myH<-1
    myLimit          <- as.numeric(nrow(df_tsbl)-myH)
    ret$fab_trainOneStep <- df_tsbl %>% filter(id <= myLimit)
    ret$fab_testOneStep  <- df_tsbl %>% filter(id > myLimit)    

    df_split    <- rsample::initial_time_split(df_tsbl, prop = 0.8)
    train_len   <- length(df_split$in_id)
    test_len    <- length(df_split$out_id)

    df_tsbl     <- as.data.frame(df_tsbl) # modeltime input format

    splits <- df_tsbl %>%
        timetk::time_series_split(assess=test_len,cumulative=TRUE) # an integer also works here

    ret$df_train    <- rsample::training(splits)
    ret$df_test     <- rsample::testing(splits)
    #View(ret$df_test)

    ret$df_split<-df_split
    ret$ts_split<-splits

    myH<-10
    myLimit          <- as.numeric(nrow(df_tsbl)-myH)
    ret$trainOneStep <- df %>% filter(id <= myLimit)
    ret$testOneStep  <- df %>% filter(id > myLimit)

    return(ret)
}

getTheFuckTheFormula <-function(xRegCols){
    xRegCols  <- xRegCols[xRegCols!='value'] 
    xRegCols  <- xRegCols[xRegCols!='id'] 
    xRegCols  <- xRegCols[xRegCols!='date'] 

    myFormula <- xRegCols
    myFormula <- paste0(myFormula, collapse= "+")
    myFormula <- paste0('value ~ date + ',myFormula)
    myFormula <- as.formula(myFormula)
    return(myFormula)
}

liveData<-0
if(liveData<1){
    idx=120
    y<-rnorm(idx, mean=15, sd=5)
    x<-cbind(
         x1 =  rnorm(idx, mean=15, sd=5)
        ,x2 =  rnorm(idx, mean=15, sd=5)
        ,x3 =  rnorm(idx, mean=15, sd=5)
        ,x4 =  rnorm(idx, mean=15, sd=5)
        ,x5 =  rnorm(idx, mean=15, sd=5)
    )   
    df         <- data.frame(x=x,value=y,stringsAsFactors=F)
    df         <- df %>% dplyr::relocate(value)
    dataLength <- nrow(df)
    xRegCols   <- colnames(df[,2:6])
    myTime     <- tk_make_timeseries("2011", length_out=dataLength, include_endpoints = FALSE)
    df$date    <- myTime

}

dataObj         <-prepMtData(df)
df_train        <-dataObj$df_train
df_test         <-dataObj$df_test
trainOneStep    <-dataObj$trainOneStep
testOneStep     <-dataObj$testOneStep

model_spec  <- adam_reg(
    #seasonal_period   = 5
    #,ets_model ='ANA'
    #  error           = param1
    # ,trend           = param2
    # ,season          = param3    
    # ,damping         = param4   
    # ,smooth_level    = param5
    # ,smooth_trend    = param6
    # ,smooth_seasonal = param7
) %>%
set_engine("auto_adam",holdout=FALSE,silent=FALSE,h=1)
model_spec<-parsnip::eval_args(model_spec)

myFormula   <- getTheFuckTheFormula(xRegCols)
recipe_spec <- recipe(myFormula,trainOneStep)

print(recipe_spec)

set.seed(123)
wflw<- workflow() %>%
    add_model(model_spec) %>%
    add_recipe(recipe_spec)

wflw_fit <- wflw %>% fit(trainOneStep)
myModel<-wflw_fit[['fit']][['fit']][['fit']][['models']][['model_1']]

myFc        <- generics::forecast(myModel,newdata=testOneStep, interval="prediction", level=c(0.50))
stop()  

######### - reproducible example end - ##########
Steviey commented 2 years ago

Maybe I should investigate it a little deeper. Is there any direction you can see ad hoc?

+-global pslPlotModel(...) at R/PslTools/dummy.R:939:12
| +-generics::forecast(...) at R/PslTools/dummy.R:361:12
| \-smooth:::forecast.adam(...)
|   +-stats::model.frame(testFormula, data = xreg)
|   \-stats::model.frame.default(testFormula, data = xreg)
|     \-base::eval(predvars, data, env)
|       \-base::eval(predvars, data, env)
\-global `<fn>`()
  \-lobstr::cst() at R/PslTools/dummy.R:53:25
No traceback available
config-i1 commented 2 years ago

So, I don't know what happens here and why. model.frame() is used to expand the data.frame into a matrix. If you cannot reproduce this on a small example, then just make sure that you follow this:

  1. The number of rows in the newdata that you provide to the forecast function should be equal to the forecast horizon. Otherwise the function will fill in the missing rows itself.
  2. Make sure that the newdata has exactly the same set of variables as the data used in adam(), with exactly the same variable names.
  3. If you do not use a formula in adam(), the function will use the formula y~., substituting y with the name of your variable. This typically works fine, but you can also try writing the formula explicitly.

Hope this helps.
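
Point 2 can be checked in base R before forecasting. The helper `compare_names()` below is hypothetical (not part of smooth), and the data are made up to mirror the n2cat / n2Cat mismatch discussed above; whether that mismatch arose this way in the tidymodels pipeline is not established in this thread.

```r
# Hypothetical helper (not part of smooth): list variables present in the
# training data but absent from newdata, flagging case-only differences.
compare_names <- function(train, newdata) {
  missing <- setdiff(names(train), names(newdata))
  case_only <- missing[tolower(missing) %in% tolower(names(newdata))]
  list(missing = missing, case_only = case_only)
}

train   <- data.frame(value = 1:3, n2cat = c(0.1, 0.2, 0.3))
newdata <- data.frame(value = 4:5, n2Cat = c(0.4, 0.5))
diffs <- compare_names(train, newdata)
print(diffs)

# model.frame() looks variables up by exact name, so a case mismatch fails
# in the same way as in the traceback.
msg <- tryCatch(model.frame(value ~ n2cat, data = newdata),
                error = function(e) conditionMessage(e))
print(msg)
```

A non-empty `case_only` entry suggests that a step between fitting and forecasting renamed a column rather than dropped it.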

Steviey commented 2 years ago

Is there a significant qualitative difference between simply saying 'h=10' and saying 'newdata=test_df', where test_df has nrow()=10? Otherwise it might not be a top priority for me. In my understanding, newdata only gives me the chance to introduce future xregs. Am I right?

config-i1 commented 2 years ago

The most efficient way is h=nrow(newdata), because otherwise the function will try fixing the length, and the result won't necessarily be correct. You have explanatory variables for a reason. You can either control them (prices, promotions) or predict them to some extent (weather). It's better to provide future values than to let the function deal with it on its own.
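
A short sketch of this pattern, assuming smooth is installed; the block is skipped otherwise. The simulated data, the column names and the explicit formula are illustrative assumptions, not taken from the reporter's project.

```r
set.seed(42)
n <- 120
h <- 10
dat <- data.frame(
  value = rnorm(n, mean = 15, sd = 5),
  x1    = rnorm(n, mean = 15, sd = 5),
  x2    = rnorm(n, mean = 15, sd = 5)
)

# Hold out the last h rows; they carry the future explanatory variables.
train <- dat[seq_len(n - h), ]
test  <- dat[(n - h + 1):n, ]

if (requireNamespace("smooth", quietly = TRUE)) {
  # Explicit formula, so the response does not depend on column order.
  fit <- smooth::adam(train, "ANN", formula = value ~ x1 + x2, silent = TRUE)
  # The horizon is tied to the number of rows supplied in newdata.
  fc <- generics::forecast(fit, newdata = test, h = nrow(test),
                           interval = "prediction", level = 0.95)
  print(fc$mean)
}
```

Tying h to nrow(test) means the function never has to pad or truncate newdata.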

Steviey commented 2 years ago

Thank you, I will do it a little later. There is much hidden stuff in there (modeltime.ensemble).