gaurbans / ecm

R package 'ecm'
3 stars 1 forks source link

Wilshire 5000 example #3

Closed crossxwill closed 3 years ago

crossxwill commented 3 years ago

Thank you for the package! The ecm() and ecmpredict() are big time savers.

In the help file for ecm(), you list the following warning:

This means the user may need to consider stationarity and/or cointegration before using the model.

The example uses ecm() to predict the Wilshire 5000 Index. However, the example is spurious because the residuals in the long-run equilibrium regression are non-stationary. There is no co-integration between the Index and the regressors (i.e., CorpProfits, FedFundsRate, UnempRate).

library(tidyverse)
library(ecm)
library(urca)

#Use ecm to predict Wilshire 5000 index based on corporate profits, 
#Federal Reserve funds rate, and unemployment rate

data(Wilshire)

#Use 2015-12-01 and earlier data to build models

trn <- Wilshire[Wilshire$date<='2015-12-01',]

#Assume all predictors are needed in the equilibrium and transient terms of ecm

xeq <- xtr <- trn[c('CorpProfits', 'FedFundsRate', 'UnempRate')]
model1 <- ecm(trn$Wilshire5000, xeq, xtr, includeIntercept=TRUE)

### long-run equil model

lm_long_run <- lm(Wilshire5000 ~ CorpProfits + FedFundsRate + UnempRate, data=trn)

fitted_resids <- ur.df(lm_long_run$residuals, type="none", lags=10, selectlags="BIC")

summary(fitted_resids)

## null: non-stationary residuals
## fail to reject null
## there is no equilibrium, the model is spurious

tseries::po.test(x=trn[c('Wilshire5000', 'CorpProfits', 'FedFundsRate', 'UnempRate')])

# null: not co-integrated
# fail to reject null
## there is no equilibrium, the model is spurious
gaurbans commented 3 years ago

Thank you for this comment, William. I think technically error correction models should be run on stationary or co-integrated data, which the Wilshire5000 isn't. However, I'm not sure of the exact importance of this. In my previous finance job where we used ECMs quite a lot, we often built them on non-stationary or non-cointegrated data, and these models passed muster. A big reason for this was that the coefficients would be more interpretable without differenced data. For the purposes of the Wilshire5000 dataset example in the package, I'll modify the example to test for stationarity/cointegration, as you have shown, and difference the data prior building an ECM.

crossxwill commented 3 years ago

I think you're okay with the example if you show the Newey-West p-values. Since you're modeling the first difference of the Wilshire5000 in ecm(), your residuals are stationary (regardless of whether the equilibrium regression is spurious). However, the residuals suffer from heteroskedasticity and serial correlation, which render the standard errors from lm to be unreliable.

library(ecm)
library(sandwich)
library(lmtest)
library(urca)

#Use ecm to predict Wilshire 5000 index based on corporate profits, 
#Federal Reserve funds rate, and unemployment rate
data(Wilshire)

#Use 2015-12-01 and earlier data to build models
trn <- Wilshire[Wilshire$date<='2015-12-01',]

#Assume all predictors are needed in the equilibrium and transient terms of ecm
xeq <- xtr <- trn[c('CorpProfits', 'FedFundsRate', 'UnempRate')]
model1 <- ecm(trn$Wilshire5000, xeq, xtr, includeIntercept=TRUE)

#Assume CorpProfits and FedFundsRate are in the equilibrium term, 
#UnempRate has only transient impacts
xeq <- trn[c('CorpProfits', 'FedFundsRate')]
xtr <- trn['UnempRate']
model2 <- ecm(trn$Wilshire5000, xeq, xtr, includeIntercept=TRUE)

summary(model1)
summary(model2)

mod1_unitroottest <- ur.df(model1$residuals, lags=10, type="none", selectlags="BIC")
mod2_unitroottest <- ur.df(model2$residuals, lags=10, type="none", selectlags="BIC")

summary(mod1_unitroottest) #reject unit root
summary(mod2_unitroottest) #reject unit root

#homoskedasticity of residuals
bptest(model1) #reject homoskedasticity
bptest(model2) #reject homoskedasticity

#no serial correlation of residuals
bgtest(model1) #reject no serial correlation 
bgtest(model2) #reject no serial correlation 

#NeweyWest p-values (to adjust for heteroskedasticity and serial correlation)
coeftest(model1, vcov=NeweyWest)
coeftest(model2, vcov=NeweyWest)

#yLag1 is not sig at the 10% alpha.
#this implies no co-integration
gaurbans commented 3 years ago

I've updated the examples to incorporate your suggestions. Thanks!