DeclareDesign / estimatr

estimatr: Fast Estimators for Design-Based Inference
https://declaredesign.org/r/estimatr

better docs for na.omit behavior with multiple variables #299

Closed graemeblair closed 5 years ago

graemeblair commented 5 years ago

When you run lm_robust(cbind(Y1, Y2, Y3) ~ x, data = data) vs. lm_robust(Y1 ~ x, data = data) and there are missing values in Y2 and Y3 that are not missing in Y1, you get different answers for the effect on Y1. This is also the lm behavior. It's not necessarily what you would expect, so I suggest adding a line to the docs on this.
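A minimal sketch of the discrepancy, using base lm on simulated data (the data frame, seed, and missingness pattern are made up for illustration). Since the behavior is described above as matching lm, lm_robust from estimatr could presumably be swapped in for lm if the package is installed.

```r
# Simulated data; all names and values here are illustrative assumptions.
set.seed(1)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$Y1 <- 1 + 2 * dat$x + rnorm(n)
dat$Y2 <- rnorm(n)
dat$Y3 <- rnorm(n)
dat$Y2[1:20] <- NA   # missing in Y2 only
dat$Y3[15:40] <- NA  # missing in Y3 only

# Multivariate fit: the default na.omit drops every row that has an NA
# in any of Y1, Y2, Y3, or x.
fit_multi <- lm(cbind(Y1, Y2, Y3) ~ x, data = dat)

# Univariate fit: only rows with an NA in Y1 or x are dropped (none here).
fit_single <- lm(Y1 ~ x, data = dat)

# Number of rows each fit actually used.
nobs_multi <- nrow(residuals(fit_multi))      # 60: rows 1 to 40 are dropped
nobs_single <- length(residuals(fit_single))  # 100

# Coefficients for Y1 from each fit; these differ because the samples differ.
coef_multi_Y1 <- coef(fit_multi)[, "Y1"]
coef_single_Y1 <- coef(fit_single)

# The multivariate Y1 column matches a univariate fit on complete cases only.
fit_cc <- lm(Y1 ~ x, data = na.omit(dat))
all.equal(coef_multi_Y1, coef(fit_cc))
```

Comparing nobs_multi with nobs_single is a quick way to check whether listwise deletion across outcomes has changed the estimation sample.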

lukesonnet commented 5 years ago

I will add something to the docs about this. However, I do think this should be the expected behavior, as you're just computing (X'X)^-1 X'Y.

graemeblair commented 5 years ago

Thanks. That seems clear behind the scenes, but not necessarily to the user (it is plausible that it could just be running three regressions and stacking the results).
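For comparison, the "three regressions, stacked" reading can be sketched as follows. This is not how cbind() on the left-hand side behaves; it is a hypothetical workaround in base R, with made-up data, in which each outcome keeps its own non-missing rows.

```r
# Simulated data; all names and values here are illustrative assumptions.
set.seed(2)
n <- 50
dat <- data.frame(x = rnorm(n), Y1 = rnorm(n), Y2 = rnorm(n), Y3 = rnorm(n))
dat$Y2[1:10] <- NA
dat$Y3[5:25] <- NA

outcomes <- c("Y1", "Y2", "Y3")

# Fit one regression per outcome, so missingness in one outcome
# does not remove rows from the others.
fits <- lapply(outcomes, function(y) {
  lm(reformulate("x", response = y), data = dat)
})
names(fits) <- outcomes

# Stack the coefficients: one column per outcome, one row per term.
stacked <- sapply(fits, coef)

# Rows used by each separate fit.
n_used <- sapply(fits, function(f) length(residuals(f)))
```

Here n_used differs by outcome, whereas a single cbind() fit would use only the rows that are complete on all three outcomes.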

acoppock commented 5 years ago

seems easy enough to implement

lukesonnet commented 5 years ago

I don't really know the right place to put this, but I added it to the lm_robust() docs. I think people doing multivariate regression should probably know this anyway, but if you think it needs to be in a more visible place, please advise.