Closed DeFilippis closed 5 years ago
IIRC, estimatr will drop the left-most duplicates in the design matrix, which is different from lm, which uses a non-standard implementation to drop the right-most. I think the dropped coefficients should still show up as NA, though.
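A toy sketch of the lm() side of that behavior (made-up data, not from this issue): with an exactly duplicated regressor, lm() keeps the earlier column and reports the later one as NA:

```r
# Made-up data: x2 is an exact copy of x1, so the design matrix is rank deficient
set.seed(1)
x1 <- rnorm(20)
x2 <- x1                      # perfectly collinear with x1
y  <- 1 + 2 * x1 + rnorm(20)

fit <- lm(y ~ x1 + x2)
coef(fit)                     # x1 is estimated; x2, the right-most duplicate, is NA
```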
It appears you're right, as the model runs even when I set try_cholesky = TRUE, implying lm_robust notices no singularity issues. This is strange, because when I run analogous models in different libraries, they throw errors during the Cholesky decomposition. See the following:
gee library:

```r
gee(outgroup_feelings_diff ~ treatment, id = teamID, data = data)

Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
(Intercept) treatmentDD treatmentDR treatmentRD
  0.7220000  -0.1572857   5.4177500   6.0612500
Error in gee(outgroup_feelings_diff ~ treatment, id = teamID :
  NA/NaN/Inf in foreign function call (arg 3)
In addition: Warning message:
In gee(outgroup_feelings_diff ~ treatment, id = teamID, :
  NAs introduced by coercion
```
felm library:

```r
felm(outgroup_feelings_diff ~ treatment | teamId | 0 | 0, data = data)

treatmentDD treatmentDR treatmentRD
        NaN     -0.6435         NaN
Warning message:
In chol.default(mat, pivot = TRUE, tol = tol) :
  the matrix is either rank-deficient or indefinite
```
lm_robust:

```r
lm_robust(outgroup_feelings_diff ~ treatment,
          cluster = teamID,
          try_cholesky = TRUE,
          data = data)

              Estimate Std. Error    t value    Pr(>|t|)   CI Lower  CI Upper       DF
(Intercept)  0.7220000  0.8047668  0.8971543 0.375141611 -0.9057945  2.349795 39.00000
treatmentDD -0.1572857  1.2454367 -0.1262896 0.899856236 -2.6402053  2.325634 71.68429
treatmentDR  5.4177500  1.7384225  3.1164749 0.002651691  1.9507494  8.884751 70.20000
treatmentRD  6.0612500  2.6218889  2.3117875 0.023728531  0.8323194 11.290181 70.20000
```
What is it about lm_robust that allows it to do the Cholesky decomposition without dropping any columns, when the other libraries cannot?
@nfultz

I think that may be because try_cholesky will fall back to QR when the Cholesky decomposition fails:
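For intuition, here is a toy sketch (using base R's chol() and made-up numbers, not estimatr's internals) of the kind of failure that would trigger such a fallback:

```r
# Made-up rank-deficient design: the third column is exactly 2x the second
X   <- cbind(1, c(1, 2, 3, 4), c(2, 4, 6, 8))
XtX <- crossprod(X)                        # singular (positive semidefinite)

res <- tryCatch(chol(XtX), error = function(e) e)
inherits(res, "error")                     # TRUE: chol() rejects the singular matrix
```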
Ah, good point. Shouldn't the regression report that Cholesky failed before falling back to QR, so that the user is aware they have a singular matrix? I would never have known unless I had tried two other libraries.
In any case, shouldn't QR still be dropping one of the columns (the left-most)? In the output above you can see that no columns are dropped. How can it produce standard errors for every estimate if the matrix is singular?
Thanks for bringing this example to our attention. To be honest, I'd sooner remove try_cholesky than have it warn. It's not something I think we should really support; as the argument name suggests, it just tries to speed things up with a Cholesky decomposition.
However, as for the QR decomposition yielding estimates when other methods do not, it's hard for me to diagnose without a reproducible example. Are you able to share the data, or a simulated or subsampled dataset that replicates the problem?
Edit: I should add that in many cases the QR decomposition from Eigen that we rely on does drop linearly dependent columns, both in toy examples and in some less trivial examples I've encountered in data analysis. This shows up as a message and a row of NAs in the resulting output, as in lm(). So to be clear, the answer to your initial question is "lm_robust does print the dropped, linearly dependent columns of the design matrix."
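A toy sketch of that column dropping using base R's pivoted qr() (LINPACK's modified routine, not estimatr's Eigen path, with made-up numbers):

```r
# Made-up rank-deficient design: the third column is exactly 2x the second
X    <- cbind(1, c(1, 2, 3, 4), c(2, 4, 6, 8))
qr_X <- qr(X)
qr_X$rank     # 2, not 3: one column is linearly dependent
qr_X$pivot    # the dependent column is pivoted to the end, so it can be dropped
```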
And I'm asking for your data to diagnose why in your case the QR decomposition is not dropping any columns.
Just wanted to clarify.
I'm trying to figure out how to simulate the problem, but the issue is I'm not sure what is actually causing it! Will continue to dig around and submit something soon.
I'm going to close this unless there's a workable example that can be posted, at which time I'll reopen. I think it's buyer beware with the non-standard decomposition, and we aren't returning "incorrect" results, just perhaps unexpected results.
I am running a model that fails in felm, gee, and lme4 due to matrix singularities. I only discovered this today, because I've been successfully running my models with lm_robust for the last several months. I discovered in the documentation that lm_robust deals with rank deficiencies in the following way: "when X is rank deficient, there are certain conditions under which the QR factorization algorithm we use, from the Eigen C++ library, drops different coefficients from the output than the default lm function." Is there any way to print which coefficients are linearly dependent, and which are dropped in the analysis?
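One way to list the dropped terms (a sketch; it assumes the fitted object reports dropped coefficients as NA, which is what lm() does and what the discussion above says lm_robust does when columns are dropped):

```r
# Made-up collinear data; substitute your lm_robust() call for lm() below
set.seed(1)
x1  <- rnorm(20)
x2  <- x1                               # exact linear dependence
y   <- rnorm(20)
fit <- lm(y ~ x1 + x2)

# Coefficients that come back NA are the linearly dependent, dropped terms
dropped <- names(coef(fit))[is.na(coef(fit))]
dropped                                 # "x2"
```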