bashtage / arch

ARCH models in Python

estimating a GARCH with independent variables #471

Closed randomgambit closed 3 years ago

randomgambit commented 3 years ago

Hello @bashtage,

Thanks for this really amazing package! This is extremely useful.

I was trying to fit a simple model where I specify an ad-hoc linear model for the mean with the idea of using a GARCH for the variance process. However, I was unable to make it work correctly.

In the example below, I regress inflation on lagged inflation and its square. I thought I would use the function LS from mean.models but I get an error message.

What is the issue there? Thanks!


import pandas as pd
import numpy as np
import arch.data.core_cpi
from arch.univariate import LS

core_cpi = arch.data.core_cpi.load()

#creating the dependent variable
Y_inflation = 100 * core_cpi.CPILFESL.pct_change(12).dropna()

#creating the two simple independent variables
X_inflation_lag = Y_inflation.shift(1)
X_inflation_lag.name = 'lag_inflation'

X_inflation_square = X_inflation_lag**2
X_inflation_square.name = 'lag_inflation_square'

#combining them into a dataframe, as specified in the docs
indep = pd.concat([X_inflation_lag, X_inflation_square], axis = 1)

ar = LS(Y_inflation, indep)

print(ar.fit().summary())

LinAlgError: SVD did not converge

bashtage commented 3 years ago

You probably have missing values. You need to dropna and make sure the y and x have the same size.

randomgambit commented 3 years ago

Thank you so much! I have put all the data into a DataFrame and removed the missing values, and the model now fits. Unfortunately, I am now getting another problem: my forecast contains only NaN values. Is this expected?

import pandas as pd
import numpy as np
import arch.data.core_cpi
from arch.univariate import LS
from arch import arch_model

core_cpi = arch.data.core_cpi.load()

Y_inflation = 100 * core_cpi.CPILFESL.pct_change(12).dropna()

df = pd.DataFrame(Y_inflation)

df['lag_inflation'] = df.CPILFESL.shift(1)
df['lag_inflation_square'] = df.lag_inflation ** 2

df.dropna(inplace = True)

ar = LS(df.CPILFESL, df[['lag_inflation', 'lag_inflation_square']])
print(ar.fit().summary())

which gives

               Least Squares - Constant Variance Model Results                
==============================================================================
Dep. Variable:               CPILFESL   R-squared:                       0.990
Mean Model:             Least Squares   Adj. R-squared:                  0.990
Vol Model:          Constant Variance   Log-Likelihood:               -40.8812
Distribution:                  Normal   AIC:                           89.7624
Method:            Maximum Likelihood   BIC:                           108.135
                                        No. Observations:                  730
Date:                Thu, Mar 11 2021   Df Residuals:                      727
Time:                        17:24:21   Df Model:                            3
                                       Mean Model                                       
========================================================================================
                            coef    std err          t      P>|t|       95.0% Conf. Int.
----------------------------------------------------------------------------------------
Const                 2.6414e-03  3.410e-02  7.746e-02      0.938 [-6.419e-02,6.948e-02]
lag_inflation             1.0027  1.836e-02     54.618      0.000      [  0.967,  1.039]
lag_inflation_square -6.5760e-04  1.827e-03     -0.360      0.719 [-4.238e-03,2.922e-03]
                              Volatility Model                              
============================================================================
                 coef    std err          t      P>|t|      95.0% Conf. Int.
----------------------------------------------------------------------------
sigma2         0.0655  6.742e-03      9.714  2.621e-22 [5.228e-02,7.870e-02]
============================================================================

Covariance estimator: White's Heteroskedasticity Consistent Estimator
C:\Users\hedi\anaconda3\lib\site-packages\arch\univariate\base.py:311: DataScaleWarning: y is poorly scaled, which may affect convergence of the optimizer when
estimating the model parameters. The scale of y is 0.06549. Parameter
estimation work better when this value is between 1 and 1000. The recommended
rescaling is 10 * y.

This warning can be disabled by either rescaling y before initializing the
model or by setting rescale=False.

Running the simple forecast then returns a vector containing only NaN values.

myfit = ar.fit()
forecasts = myfit.forecast()
forecasts.mean

forecasts.mean
Out[63]: 
            h.1
Date           
1958-02-01  NaN
1958-03-01  NaN
1958-04-01  NaN
1958-05-01  NaN
1958-06-01  NaN
        ...
2018-07-01  NaN
2018-08-01  NaN
2018-09-01  NaN
2018-10-01  NaN
2018-11-01  NaN

[730 rows x 1 columns]

bashtage commented 3 years ago

You can't forecast from models that have exogenous regressors in the current version of arch. You can forecast from AR, HAR, Constant mean and zero mean models.

randomgambit commented 3 years ago

> You can't forecast from models that have exogenous regressors in the current version of arch. You can forecast from AR, HAR, Constant mean and zero mean models.

Thank you. Out of curiosity, is this due to the lack of a closed-form formula for the forecast, or is there a more technical reason?

bashtage commented 3 years ago

The reason is that E_t[Y_{t+1}] depends on E_t[X_{t+1}] when the model has cross-sectional regressors, and I don't have a good way to pass in the required E_t[X_{t+1}] yet.
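
In the model from this thread, both regressors are functions of the lagged dependent variable, so E_t[X_{t+1}] is already known at time t and a one-step forecast can be computed by hand. The following is a minimal sketch under that assumption, not part of arch: the helper `one_step_forecast` is hypothetical, the coefficients are copied from the summary table above, and the last observed value `y_last` is an illustrative number rather than real data.

```python
import numpy as np

def one_step_forecast(params, y_last):
    """Return E_t[Y_{t+1}] for y_{t+1} = c + b1*y_t + b2*y_t**2 + e_{t+1}.

    Hypothetical helper: the regressors at t+1 are (1, y_t, y_t**2),
    all of which are known at time t.
    """
    x_next = np.array([1.0, y_last, y_last ** 2])
    return float(x_next @ np.asarray(params, dtype=float))

# Coefficients (Const, lag_inflation, lag_inflation_square) from the summary above.
params = (2.6414e-03, 1.0027, -6.5760e-04)

# Illustrative last observation of the dependent variable (an assumption).
y_last = 2.0

forecast = one_step_forecast(params, y_last)
print(forecast)
```

This works only for horizon 1. For longer horizons the squared term makes E_t[X_{t+h}] depend on more than the point forecast of y, which is one reason a general interface for supplying E_t[X_{t+1}] is needed.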

bashtage commented 3 years ago

Duplicates #425 and #435.

randomgambit commented 3 years ago

Got it, thanks @bashtage! I hope this is an easy fix...

randomgambit commented 3 years ago

Relatedly, does arch include tools for forecast evaluation (RMSE, MAPE, Theil's U, etc.)? Or is this left to the reader ;-)

bashtage commented 3 years ago

There is nothing for forecast evaluation. These metrics are mostly so simple that it isn't worth the effort to document functions for them.
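
To illustrate how little code these metrics need, here is a minimal NumPy sketch. The function names are hypothetical and not part of arch. Theil's U has several definitions in the literature; this sketch assumes the bounded U1 variant, which is 0 for a perfect forecast.

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Undefined if any actual is 0."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(100.0 * np.mean(np.abs((a - f) / a)))

def theil_u1(actual, forecast):
    """Theil's U1: RMSE scaled by the root mean squares of both series."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    denom = np.sqrt(np.mean(a ** 2)) + np.sqrt(np.mean(f ** 2))
    return rmse(a, f) / float(denom)

# Toy values for illustration only.
actual = [1.0, 2.0, 4.0]
predicted = [1.0, 2.0, 2.0]
print(rmse(actual, predicted), mape(actual, predicted), theil_u1(actual, predicted))
```

When evaluating forecasts from a fitted model, drop or mask any NaN rows in the forecast series first, since `np.mean` propagates NaN.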

The forecasting with exogenous variables has now been fixed and will be in the next release.