bashtage / linearmodels

Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
https://bashtage.github.io/linearmodels/
University of Illinois/NCSA Open Source License

IV2SLS `first_stage` reports "wrong" first stage F-statistic #622

Open mlondschien opened 1 month ago

mlondschien commented 1 month ago

If there are multiple endogenous variables, IV2SLS.first_stage reports one F-statistic per endogenous variable, each obtained by regressing that variable on the instruments (and controls). This is misleading. If the endogenous variables are correlated, the individual F-statistics can all be large while the causal parameter is still not well identified.

See the following example:

In [1]: from linearmodels.iv import IV2SLS
   ...: import numpy as np
   ...: 
   ...: rng = np.random.default_rng(0)
   ...: 
   ...: n = 1000
   ...: 
   ...: Z = rng.normal(size=(n, 3))
   ...: 
   ...: H = rng.normal(size=(n, 3))  # confounder
   ...: X = Z @ np.ones((3, 2)) + H @ np.array([[1, 0], [0, -1], [0, 0]])
   ...: y = H @ np.array([1, 1, 0.1])  # beta = 0
   ...: 
   ...: tsls = IV2SLS(y, None, X, Z).fit(cov_type="unadjusted")
   ...: print(tsls.first_stage)
         First Stage Estimation Results         
================================================
                              endog.0    endog.1
------------------------------------------------
R-squared                      0.7256     0.7364
Partial R-squared              0.7256     0.7364
Shea's R-squared               0.0010     0.0010
Partial F-statistic            881.59     931.31
P-value (Partial F-stat)     1.11e-16   1.11e-16
Partial F-stat Distn         F(3,997)   F(3,997)
========================== ========== ==========
instruments.0                  1.0079     0.9792
                             (31.124)   (31.102)
instruments.1                  0.9454     0.9773
                             (28.305)   (30.095)
instruments.2                  0.9673     0.9655
                             (30.639)   (31.453)
------------------------------------------------

T-stats reported in parentheses
T-stats use same covariance type as original model

The individual F-statistics are large, suggesting that Wald-based confidence sets can be trusted. They cannot.

In [2]: tsls
Out[2]: 
                          IV-2SLS Estimation Summary                          
==============================================================================
Dep. Variable:              dependent   R-squared:                      0.9775
Estimator:                    IV-2SLS   Adj. R-squared:                 0.9775
No. Observations:                1000   F-statistic:                    29.778
Date:                Fri, Oct 04 2024   P-value (F-stat)                0.0000
Time:                        16:48:29   Distribution:                  chi2(2)
Cov. Estimator:            unadjusted                                         

                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
endog.0        0.8687     0.1592     5.4558     0.0000      0.5566      1.1807
endog.1       -0.8694     0.1593    -5.4568     0.0000     -1.1817     -0.5571
==============================================================================

Endogenous: endog.0, endog.1
Instruments: instruments.0, instruments.1, instruments.2
Unadjusted Covariance (Homoskedastic)
Debiased: False
IVResults, id: 0x15b26a510

Even though the true parameter is zero, the F-statistic is highly significant at ~30. So are the t-statistics.

In Testing for Weak Instruments in Linear IV Regression (2005), Stock and Yogo suggest using the Cragg and Donald statistic for reduced rank to test for identification. If $P_Z$ is the projection onto the column span of $Z$, and $M_Z = I - P_Z$ the projection onto its orthogonal complement, the statistic is $n \cdot \lambda_\mathrm{min}\left( (X^T M_Z X)^{-1} X^T P_Z X \right).$ In the case of a single endogenous variable, this is, up to scaling, the first-stage F-statistic. Otherwise, it takes the correlation of the columns of $\Pi$ in $X = Z \Pi + V$ into account. In Table 1, they report thresholds for the statistic, analogous to the first-stage F-test heuristic based on Staiger and Stock (1997).

In the example above, the Cragg and Donald test statistic is very small, correctly suggesting that Wald-based inference cannot be trusted.

In [3]: from ivmodels.tests import rank_test
   ...:
   ...: statistic, p_value = rank_test(Z, X, fit_intercept=False)
   ...: print(f"{statistic=}, {p_value=}")
statistic=np.float64(0.8939161043879634), p_value=np.float64(0.6395707363012899)
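
As a cross-check that does not depend on ivmodels, the formula above can be sketched directly in NumPy. This is only an illustration under the assumptions of the example (no intercept, no controls); `cragg_donald` is a hypothetical helper, not part of linearmodels or ivmodels, and it uses the plain factor $n$ from the formula, so it may differ from `rank_test` by a small degrees-of-freedom factor.

```python
import numpy as np


def cragg_donald(Z, X):
    """Hypothetical helper: n * lambda_min((X' M_Z X)^{-1} X' P_Z X).

    Assumes no intercept and no exogenous controls, as in the example.
    """
    n = Z.shape[0]
    # P_Z X via least squares, avoiding the n x n projection matrix.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    resid = X - X_hat  # M_Z X
    # The matrix product is not symmetric, so use the general eigenvalue routine.
    # Its eigenvalues are real and non-negative; .real drops numerical noise.
    eigvals = np.linalg.eigvals(np.linalg.solve(resid.T @ resid, X_hat.T @ X_hat))
    return n * eigvals.real.min()


# Same data-generating process and seed as in the example above.
rng = np.random.default_rng(0)
n = 1000
Z = rng.normal(size=(n, 3))
H = rng.normal(size=(n, 3))  # confounder
X = Z @ np.ones((3, 2)) + H @ np.array([[1, 0], [0, -1], [0, 0]])

stat = cragg_donald(Z, X)
print(f"{stat=}")
```

With a single endogenous column the eigenvalue problem collapses to the ratio of explained to residual sum of squares, which is why the statistic is then proportional to the usual first-stage F-statistic.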

bashtage commented 1 month ago

I agree that these statistics are only sufficient for identification in the case of a single variable. They are still necessary when you have multiple variables, so not useless.

I suppose the case of collinear fitted values of endogenous variables falls closer to the weak IV area, something that I haven't tried to include in this package. The challenge with incorporating the Stock and Yogo test is that it is difficult to use, since there are some key tuning parameters that one has to choose when selecting the critical value.
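
The collinearity of the fitted values can be made visible with a short sketch. It reuses the data-generating process from the example in the report and is only an illustration, not a feature of linearmodels. The names `Pi_hat`, `X_hat`, `corr` and `sing` are introduced here for the sketch.

```python
import numpy as np

# Same data-generating process and seed as in the example from the report.
rng = np.random.default_rng(0)
n = 1000
Z = rng.normal(size=(n, 3))
H = rng.normal(size=(n, 3))  # confounder
X = Z @ np.ones((3, 2)) + H @ np.array([[1, 0], [0, -1], [0, 0]])

# First-stage coefficients and fitted values (no intercept, no controls).
Pi_hat = np.linalg.lstsq(Z, X, rcond=None)[0]
X_hat = Z @ Pi_hat

# Correlation between the two columns of fitted values.
corr = np.corrcoef(X_hat, rowvar=False)[0, 1]

# Singular values of the estimated first-stage coefficient matrix.
# A second singular value near zero indicates near reduced rank.
sing = np.linalg.svd(Pi_hat, compute_uv=False)

print(f"{corr=}, {sing=}")
```

Here the true coefficient matrix is `np.ones((3, 2))`, which has rank one, so both fitted columns are nearly the same linear combination of the instruments. Each column is strongly predicted on its own, yet the instruments carry almost no information that separates the two endogenous variables.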