We should include a constant since the variables in `y_t` are not demeaned.
Harvey-EstimatingRegressionModels-1976.pdf shows that the intercept (the constant term) in the heteroskedasticity regression is estimated inconsistently, but that it is easy to correct by adding a known constant. We should add that correction once we add the intercept term to the heteroskedasticity regression.
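For reference, the known constant is 1.2704: Harvey shows the intercept of the regression of log squared residuals on the het covariates is biased by E[log(chi^2_1)] ≈ -1.2704. A minimal sketch of the correction (variable names illustrative; `e` are first-stage residuals, `z` the het covariates):

```r
# Second-stage heteroskedasticity regression (sketch, illustrative names)
step2 <- lm(log(e^2) ~ z)
alpha <- coef(step2)
# Intercept is biased by E[log(chi^2_1)] ~= -1.2704; add that back:
alpha[1] <- alpha[1] + 1.2704
```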
The paper also shows how to test for the null hypothesis that all coefficients in the heteroskedasticity regression, except for the intercept, are jointly zero (i.e., the null of homoskedasticity).
PS. It looks like we are currently estimating the heteroskedasticity regression with the `lm()` command in:
https://github.com/VFCI/vfciBusinessCycles/blob/main/code/vfciBCHelpers/R/fit_het_reg_from_var.r
In the original VFCI paper, instead of `lm()` we were using the `gls()` command, which I implemented through the `hetreg()` function in:
code/clean-raw-data/vfci-data-helpers/hetreg.R
`hetreg(data, x, y)` replicates the Stata command:
`hetregress x y, het(y)`
but I am not sure whether `hetregress` in Stata and `gls()` in R make the correction for the intercept term (though given that the results from the two agree, they both do the same thing, whatever that is).
What I have found so far:
Our R function `hetreg(data, y, x, het = x2, method = "ML")`, which uses `gls()` with weights, replicates the Stata command `hetregress y x, het(x2)`.
The `twostep` option has Stata use the two-step GLS procedure from Harvey (1976) rather than the Maximum Likelihood method that is also outlined in that paper. Stata defaults to the ML method.
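Since the step numbers from that procedure come up below, here is my rough sketch of Harvey's two-step procedure (illustrative only, not our exact implementation; a data frame `df` with columns `y`, `x`, `x2` is assumed):

```r
# (1) OLS of y on x; keep the residuals
fit1 <- lm(y ~ x, data = df)
e    <- resid(fit1)
# (2) regress log(e^2) on the het covariate(s)
fit2 <- lm(log(e^2) ~ x2, data = df)
# (3) correct the inconsistent intercept by adding 1.2704 = -E[log(chi^2_1)]
alpha    <- coef(fit2)
alpha[1] <- alpha[1] + 1.2704
# (4) form the fitted variances
sig2 <- exp(alpha[1] + alpha[2] * df$x2)
# (5) refit the original regression by WLS with those weights
fit3 <- lm(y ~ x, data = df, weights = 1 / sig2)
```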
We have to add `method = "ML"` to the R call because by default `gls()` uses `"REML"` (Restricted Maximum Likelihood). Using REML gives slightly different estimates.
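For the ML route, I believe the underlying call looks something like this (a sketch; I am assuming `hetreg()` passes `nlme::varExp` as the variance function, and the variable names are illustrative):

```r
library(nlme)
# Multiplicative heteroskedasticity: Var(e) = sigma^2 * exp(2 * delta * x2).
# method = "ML" matches Stata's hetregress default; gls() itself defaults to "REML".
fit <- gls(y ~ x, data = df, weights = varExp(form = ~ x2), method = "ML")
summary(fit)
```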
So Stata does adjust the constant in step (3). Since our R `hetreg()` function matches it, I assume `gls()` also adjusts correctly.
Notice that in step (5) they refit the original regression model as well, so the estimates of `beta_hat` differ from the OLS estimates from step (1). This means that the VAR coefficients differ from the `hetreg()` first-stage coefficients. I am not sure whether this is a problem for our instrument method.
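A quick way to see that difference in practice (a sketch, same illustrative names as above):

```r
library(nlme)
ols_fit <- lm(y ~ x, data = df)  # the VAR-style OLS fit from step (1)
gls_fit <- gls(y ~ x, data = df, weights = varExp(form = ~ x2), method = "ML")
cbind(OLS = coef(ols_fit), GLS = coef(gls_fit))  # the two sets of coefficients differ
```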
In testing these functions, I loaded our data into Stata and ran the `hetregress` options there for the unemployment regression. Stata immediately reports the test we want:

```
LR test of lnsigma2=0: chi2(10) = 79.61    Prob > chi2 = 0.0000
```
So there is no question that this estimation passes this basic test. It will be more interesting to compare with the other variables (like output), where it isn't as visually obvious that multiplicative heteroskedasticity exists.
I will try to implement this same test in R (or find the right function); a first sketch follows the task below.
- [ ] Implement heteroskedasticity null hypothesis test in R
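One route (a sketch, not yet verified against Stata's numbers): fit the heteroskedastic and homoskedastic models by ML with `gls()` and compare them with `anova()`, which reports a likelihood-ratio statistic:

```r
library(nlme)
fit_hom <- gls(y ~ x, data = df, method = "ML")
fit_het <- gls(y ~ x, data = df, weights = varExp(form = ~ x2), method = "ML")
anova(fit_hom, fit_het)  # LR test of the null that the het coefficients are zero
```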
Here is a quick look at the impact of changing the heteroskedasticity estimation:
Both are internal estimates of the VFCI using heteroskedasticity of unemployment. `hetreg_vfci` in the top panel uses `hetreg(data, "unemployment", lagged_y, het = y, method = "ML")`. `lm_vfci` takes the residuals from the VAR and then runs the second-stage estimation on those using `lm()`. The panel on the right shows a scatter between the two series.
From our conversation:

- `method = "REML"` stands for Restricted Maximum Likelihood when running the `hetreg()` command in R; it corrects for degrees of freedom (or something similar).
- Implemented with the functions `hetreg()` for the ML method and `hetreg_twostep()` for the two-step method.
- `id_linear_het_reg()` can be used to identify a VAR with either the ML or the two-step method. It defaults to two-step, but PDF reports are created for both.
Should we use a constant when estimating the heteroskedasticity with a linear regression?
I have implemented it so I can switch between using one or not. It does affect our identification of the VAR.
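For reference, the switch amounts to something like this in the second-stage regression (a sketch; the actual argument name in our code may differ):

```r
# e = first-stage residuals, z = het covariate (illustrative names)
step2 <- if (include_constant) {
  lm(log(e^2) ~ z)      # with intercept (then correct it per Harvey 1976)
} else {
  lm(log(e^2) ~ z - 1)  # without an intercept
}
```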