Open mlondschien opened 1 month ago
I agree that these statistics are only sufficient for identification in the case of a single endogenous variable. They are still necessary when there are multiple endogenous variables, so they are not useless.
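In terms of the first-stage model with $m$ endogenous variables and $k$ instruments (assuming $Z$ has full column rank), the distinction can be written out as follows. The $j$-th first-stage F-statistic tests $H_0\colon \pi_j = 0$ for one column of $\Pi$, while identification needs the whole matrix to have full column rank:

$$
X = Z\Pi + V,\qquad \Pi = [\pi_1,\dots,\pi_m] \in \mathbb{R}^{k\times m},\qquad \text{identification} \iff \operatorname{rank}(\Pi) = m .
$$

Full rank requires every $\pi_j \neq 0$, so large individual F-statistics are necessary. For $m > 1$ they are not sufficient, because nonzero columns can still be collinear, e.g. with $k = m = 2$:

$$
\Pi = \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix},\qquad \pi_1 \neq 0,\quad \pi_2 \neq 0,\qquad \operatorname{rank}(\Pi) = 1 < 2 .
$$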
I suppose the case of collinear fitted values of the endogenous variables falls closer to the weak-IV area, something that I haven't tried to include in this package. The challenge with incorporating the Stock and Yogo test is that it is difficult to use, since its critical value depends on tuning choices the user has to make: whether weakness is defined by the maximal relative bias of 2SLS or by the maximal size distortion of the Wald test, and how much bias or distortion is tolerated.
If there are multiple endogenous variables, `IV2SLS.first_stage` reports the F-statistics from regressing each endogenous variable separately on the instruments (and controls). This is misleading. If the endogenous variables are correlated, or more precisely if their first-stage fitted values are nearly collinear, the individual F-statistics can be large while the causal parameter is not well identified. See the following example:
The individual F-statistics are large, suggesting that Wald-based confidence sets can be trusted. They cannot.
Even though the true parameter is zero, the F-statistic is highly significant at ~30. So are the t-statistics.
In Testing for Weak Instruments in Linear IV Regression (2005), Stock and Yogo suggest using the Cragg and Donald statistic for reduced rank to test whether the instruments are weak. If $P_Z$ is the projection onto the column span of $Z$ and $M_Z = I - P_Z$ the projection onto its orthogonal complement, the statistic is $n \cdot \lambda_{\min}\left( (X^\top M_Z X)^{-1} X^\top P_Z X \right)$. Stock and Yogo work with the normalized version $g_{\min} = \frac{n-k}{k}\,\lambda_{\min}\left( (X^\top M_Z X)^{-1} X^\top P_Z X \right)$, where $k$ is the number of instruments. In the case of a single endogenous variable, $g_{\min}$ is the first-stage F-statistic. Otherwise, it takes into account that the columns of $\Pi$ in the first stage $X = Z \Pi + V$ may be nearly collinear. In Table 1, they report critical values for the statistic, analogous to the rule of thumb based on Staiger and Stock (1997) that the first-stage F-statistic should exceed 10.
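For a single endogenous variable $x$ and no exogenous controls (with controls, they would be partialled out of $x$ and $Z$ and their count subtracted from the degrees of freedom), the reduction to the first-stage F-statistic can be checked directly with the normalization $g_{\min} = \frac{n-k}{k}\,\lambda_{\min}(\cdot)$. The restricted first-stage model has no regressors, so its residual sum of squares is $x^\top x$:

$$
F = \frac{(x^\top x - x^\top M_Z x)/k}{x^\top M_Z x/(n-k)}
  = \frac{n-k}{k}\cdot\frac{x^\top P_Z x}{x^\top M_Z x}
  = \frac{n-k}{k}\,\lambda_{\min}\!\left((x^\top M_Z x)^{-1} x^\top P_Z x\right)
  = g_{\min},
$$

using $x^\top x = x^\top P_Z x + x^\top M_Z x$ and the fact that the only eigenvalue of a $1\times 1$ matrix is its entry.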
In the example above, the Cragg and Donald test statistic is very small, correctly suggesting that Wald-based inference cannot be trusted.
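A minimal numpy sketch of the contrast between the per-variable F-statistics and the Cragg and Donald statistic, under stated assumptions: no intercept or other exogenous controls (they would need to be partialled out of both `x` and `z` first), and a hypothetical simulated design in which both endogenous variables load on the same instrument combination, so $\operatorname{rank}(\Pi) = 1$. The helper `first_stage_diagnostics` is hypothetical and not part of linearmodels; its F-statistics are in the spirit of what `IV2SLS.first_stage` reports, not a reproduction of its output.

```python
import numpy as np


def first_stage_diagnostics(x, z):
    """Hypothetical helper: per-variable first-stage F-statistics and g_min.

    x: (n, m) endogenous regressors, z: (n, k) excluded instruments.
    Assumes no intercept or other exogenous controls.
    """
    n, k = z.shape
    coef, *_ = np.linalg.lstsq(z, x, rcond=None)
    fitted = z @ coef                 # P_Z X
    resid = x - fitted                # M_Z X
    xpx = fitted.T @ fitted           # X' P_Z X (P_Z is symmetric and idempotent)
    xmx = resid.T @ resid             # X' M_Z X
    scale = (n - k) / k
    # Diagonal entries give one first-stage F-statistic per endogenous column
    f_stats = scale * np.diag(xpx) / np.diag(xmx)
    # Whiten with the Cholesky factor of X' M_Z X: L^{-1} (X' P_Z X) L^{-T}
    # has the same eigenvalues as (X' M_Z X)^{-1} X' P_Z X but is symmetric
    linv = np.linalg.inv(np.linalg.cholesky(xmx))
    g_min = scale * float(np.linalg.eigvalsh(linv @ xpx @ linv.T)[0])  # ascending
    return f_stats, g_min


# Hypothetical design: both endogenous variables load on the SAME instrument
# combination, so the two columns of Pi are identical and rank(Pi) = 1.
rng = np.random.default_rng(0)
n = 1_000
z = rng.standard_normal((n, 3))
pi = np.array([0.5, 0.3, 0.1])
x = np.column_stack([
    z @ pi + rng.standard_normal(n),
    z @ pi + rng.standard_normal(n),
])

f_stats, g_min = first_stage_diagnostics(x, z)
print("individual first-stage F:", f_stats.round(1))
print("Cragg and Donald g_min:", round(g_min, 2))
```

In this design the individual F-statistics should come out far above 10, while $g_{\min}$ stays around 1: the difference `x[:, 0] - x[:, 1]` contains no instrument variation at all, and $g_{\min}$ is bounded by the normalized ratio along that direction. Calling the helper on a single column, `first_stage_diagnostics(x[:, :1], z)`, returns an F-statistic and a $g_{\min}$ that agree up to floating-point error.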