MartinSpindler / hdm

multicollinearity #17

Open MartinSpindler opened 7 months ago

MartinSpindler commented 7 months ago
Multicollinearity in rlassoEffect: consider Y = a*X + 0*Z + e, where X and Z are collinear. In rlassoEffect, the residualized regressor \tilde{X} appears to come out very close to, but not exactly, zero, which makes the coefficient estimate for a numerically unstable (it becomes very large). Would it help if the package detected multicollinearity in the original model before running double lasso? Alternatively, we could remove the multicollinearities in [X,Z] first.
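
To make the instability concrete, here is a sketch of the final-stage estimate. It assumes the partialling-out form of double lasso with centered residuals, and it writes \tilde{Y} for the residualized outcome, a symbol that is not used in the post above:

$$
\hat{a} = \frac{\sum_i \tilde{X}_i \tilde{Y}_i}{\sum_i \tilde{X}_i^2}
$$

If X is numerically a linear combination of Z, then \tilde{X} consists of rounding noise and the denominator is close to zero, so \hat{a} is a ratio of two tiny noisy quantities and can take almost any magnitude.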

This came up in the Penn reemployment analysis: once we include three-way interactions, rlassoEffect fits T4 (treatment) perfectly because of multicollinearities. The residualized T4 is so close to zero that the double lasso estimate blows up.

If we remove the multicollinearities before double lasso, the estimate is reasonable again.
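
One possible way to do that removal is sketched below in base R. It is only an illustration, not something hdm provides: `drop_collinear` is a hypothetical helper, the simulated `z1`, `z2`, `z3` and `d` stand in for the real controls and treatment, and the tolerance is an arbitrary choice. It uses the pivoting done by `qr()` to find control columns that are numerically linear combinations of the intercept, the treatment, and earlier controls.

```r
# Hypothetical helper (not part of hdm): drop control columns that are
# numerically linear combinations of the intercept, the treatment d, and
# earlier control columns.
drop_collinear <- function(x, d, tol = 1e-7) {
  x <- as.matrix(x)
  # Intercept and treatment go first, so the pivoting in qr() retains them
  # and pushes redundant control columns to the end.
  M <- cbind(1, d, x)
  q <- qr(M, tol = tol)
  kept <- sort(q$pivot[seq_len(q$rank)])
  if (!(2L %in% kept)) stop("treatment is collinear with the intercept")
  # Map positions in M back to column indices of x (offset by 2).
  keep_x <- kept[kept > 2L] - 2L
  list(x = x[, keep_x, drop = FALSE],
       dropped = setdiff(seq_len(ncol(x)), keep_x))
}

# Simulated stand-in for the situation in this issue: the treatment is an
# exact linear combination of two controls.
set.seed(1)
n  <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
d  <- z1 + z2
x  <- cbind(z1 = z1, z2 = z2, z3 = rnorm(n))

res <- drop_collinear(x, d)
print(colnames(res$x))  # controls that remain
print(res$dropped)      # indices of the removed columns of x

# The filtered controls could then be passed on, assuming hdm is installed:
# fit <- hdm::rlassoEffect(x = res$x, y = y, d = d)
```

Which of several collinear columns gets dropped depends on column order, so in a real analysis it may be worth ordering the controls so that the less important ones (for example higher-order interactions) come last.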

Thank you very much -- this is an awesome observation. We need to think about how to “auto-detect” and remove this problem.