We should include a constant since the variables in `y_t` are not demeaned.
Harvey-EstimatingRegressionModels-1976.pdf shows that the intercept (the constant term) in the heteroskedasticity regression is estimated inconsistently, but that it is easy to correct by adding a known constant. We should add that correction once we add the intercept term to the heteroskedasticity regression.
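For reference, the known constant is 1.2704: Harvey shows the intercept of the regression of log squared residuals on the het covariates is biased by E[log(chi^2_1)] ≈ -1.2704. A minimal sketch of the correction (variable names illustrative; `e` are first-stage residuals, `z` the het covariates):

```r
# Second-stage heteroskedasticity regression (sketch, illustrative names)
step2 <- lm(log(e^2) ~ z)
alpha <- coef(step2)
# Intercept is biased by E[log(chi^2_1)] ~= -1.2704; add that back:
alpha[1] <- alpha[1] + 1.2704
```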
The paper also shows how to test for the null hypothesis that all coefficients in the heteroskedasticity regression, except for the intercept, are jointly zero (i.e., the null of homoskedasticity).
PS. It looks like we are currently estimating the heteroskedasticity regression with the `lm()` command in:
https://github.com/VFCI/vfciBusinessCycles/blob/main/code/vfciBCHelpers/R/fit_het_reg_from_var.r
In the original VFCI paper, instead of `lm()` we were using the `gls()` command, which I implemented through the `hetreg()` function in:
code/clean-raw-data/vfci-data-helpers/hetreg.R
`hetreg(data, x, y)` replicates the Stata command:
`hetregress x y, het(y)`
but I am not sure whether `hetregress` in Stata and `gls()` in R make the correction for the intercept term (though given that the results from the two agree, they both do the same thing, whatever that is).
What I have found so far:
Our R function `hetreg(data, y, x, het = x2, method = "ML")`, which uses `gls()` with weights, replicates the Stata command `hetregress y x, het(x2)`.
The `twostep` option has Stata use the two-step GLS procedure from Harvey (1976) rather than the Maximum Likelihood method that is also outlined in that paper. Stata defaults to the ML method.
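Since the step numbers from that procedure come up below, here is my rough sketch of Harvey's two-step procedure (illustrative only, not our exact implementation; a data frame `df` with columns `y`, `x`, `x2` is assumed):

```r
# (1) OLS of y on x; keep the residuals
fit1 <- lm(y ~ x, data = df)
e    <- resid(fit1)
# (2) regress log(e^2) on the het covariate(s)
fit2 <- lm(log(e^2) ~ x2, data = df)
# (3) correct the inconsistent intercept by adding 1.2704 = -E[log(chi^2_1)]
alpha    <- coef(fit2)
alpha[1] <- alpha[1] + 1.2704
# (4) form the fitted variances
sig2 <- exp(alpha[1] + alpha[2] * df$x2)
# (5) refit the original regression by WLS with those weights
fit3 <- lm(y ~ x, data = df, weights = 1 / sig2)
```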
We have to add `method = "ML"` to the R call because by default `gls()` uses `"REML"` (Restricted Maximum Likelihood). Using REML gives slightly different estimates.
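For the ML route, I believe the underlying call looks something like this (a sketch; I am assuming `hetreg()` passes `nlme::varExp` as the variance function, and the variable names are illustrative):

```r
library(nlme)
# Multiplicative heteroskedasticity: Var(e) = sigma^2 * exp(2 * delta * x2).
# method = "ML" matches Stata's hetregress default; gls() itself defaults to "REML".
fit <- gls(y ~ x, data = df, weights = varExp(form = ~ x2), method = "ML")
summary(fit)
```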
So Stata does adjust the constant in step (3). Since our R `hetreg()` function matches it, I assume `gls()` also adjusts correctly.
Notice that in step (5) they refit the original regression model as well, so the estimates of `beta_hat` differ from the OLS estimates from step (1). This means that the VAR coefficients differ from the `hetreg()` first-stage coefficients. I am not sure whether this is a problem for our instrument method.
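A quick way to see that difference in practice (a sketch, same illustrative names as above):

```r
library(nlme)
ols_fit <- lm(y ~ x, data = df)  # the VAR-style OLS fit from step (1)
gls_fit <- gls(y ~ x, data = df, weights = varExp(form = ~ x2), method = "ML")
cbind(OLS = coef(ols_fit), GLS = coef(gls_fit))  # the two sets of coefficients differ
```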
In testing these functions, I loaded our data into Stata and ran the `hetregress` options there for the unemployment regression. Stata immediately reports the test we want:

```
LR test of lnsigma2=0: chi2(10) = 79.61    Prob > chi2 = 0.0000
```
So there is no question that this estimation passes this basic test. It will be more interesting to compare with the other variables (like output), where it isn't as visually obvious that multiplicative heteroskedasticity exists.
I will try to implement this same test in R (or find the right function); a first sketch follows the task below.
- [ ] Implement heteroskedasticity null hypothesis test in R
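One route (a sketch, not yet verified against Stata's numbers): fit the heteroskedastic and homoskedastic models by ML with `gls()` and compare them with `anova()`, which reports a likelihood-ratio statistic:

```r
library(nlme)
fit_hom <- gls(y ~ x, data = df, method = "ML")
fit_het <- gls(y ~ x, data = df, weights = varExp(form = ~ x2), method = "ML")
anova(fit_hom, fit_het)  # LR test of the null that the het coefficients are zero
```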
Here is a quick look at the impact of changing the heteroskedasticity estimation:
Both are internal estimates of the VFCI using heteroskedasticity of unemployment. `hetreg_vfci` in the top panel uses `hetreg(data, "unemployment", lagged_y, het = y, method = "ML")`. `lm_vfci` takes the residuals from the VAR and then runs the second-stage estimation on those using `lm()`. The panel on the right shows a scatter between the two series.
From our conversation:

- `method = "REML"` stands for Restricted Maximum Likelihood when running the `hetreg()` command in R; it corrects for degrees of freedom (or something similar).
- Implemented with the functions `hetreg()` for the ML method and `hetreg_twostep()` for the two-step method.
- `id_linear_het_reg()` can be used to identify a VAR with either the ML or the two-step method. It defaults to two-step, but PDF reports are created for both.
Should we use a constant when estimating the heteroskedasticity with a linear regression?
I have implemented it so I can switch between using one or not. It does affect our identification of the VAR.
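For reference, the switch amounts to something like this in the second-stage regression (a sketch; the actual argument name in our code may differ):

```r
# e = first-stage residuals, z = het covariate (illustrative names)
step2 <- if (include_constant) {
  lm(log(e^2) ~ z)      # with intercept (then correct it per Harvey 1976)
} else {
  lm(log(e^2) ~ z - 1)  # without an intercept
}
```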