Notes to self for when I come back to this in a couple days:
`car::linearHypothesis` crashes for factor variables with many levels.

Alex, thanks for (1) the kind words, (2) letting us know we are compatible out of the box (kinda) with another package, and (3) the PR you want to submit.
What were you thinking of? It'd be nice to add it to the man pages for `lm_robust`, and a vignette on comparability with Stata would be really great. We are also considering a vignette that shows which external packages we play nicely with, besides just mentioning them in the Getting Started page.
As I look more into `car::linearHypothesis` (the CRAN Econometrics Task View makes me think this is a sort of standard), I think you may want to implement your own `linearHypothesis` method for `lm_robust`, as the default method has an argument `white.adjust` which I believe recalculates a robust covariance matrix and might not play well with whatever y'all have already implemented. I think the default method should work at the moment; that argument just might be confusing.
Similarly, the `car` package also provides an `Anova` method that I believe appropriately deals with heteroscedasticity. Some references:

`lm_robust` objects do not seem to work out of the box with `Anova`, but I think either an `Anova` or `anova` method would be useful, as that's a standard way to compare models in R. So I guess this just became a feature request!
In terms of documentation, I think adding two short examples of using `linearHypothesis` and `anova` right after the current `margins` example would make sense. I'm also realizing that I just don't know enough about how people use Stata to start a conversion guide (but I still would encourage y'all to make one when you have the bandwidth!).
Thanks for this extremely informative post. Have there been any updates on this with respect to a `linearHypothesis` specifically for `lm_robust`? I need to do the equivalent of a TukeyHSD with cluster-robust standard errors.
@alexpghayes, can you explain your strategy for "clustering based on multiple variables at once by creating a new column"?
I don't know anything about Tukey-esque procedures for anything beyond basic ANOVA. By "clustering on multiple variables at once" I meant do something like:
```r
library(estimatr)
library(dplyr)

# paste the two variables together to form a single clustering variable
df <- alo_star_men %>%
  mutate(two_vars_at_once = paste0(gpa0, ssp))

model <- lm_robust(
  GPA_year2 ~ gpa0 + ssp,
  data = df,
  clusters = two_vars_at_once,
  se_type = "stata"
)
```
where you essentially cluster on an interaction. I have no idea at all if this is a reasonable thing to do. Might be worth looking at mixed models / GLS so you can directly specify the covariance structure you want.
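If you go the GLS route, a minimal sketch with `nlme::gls` might look like this (reusing the toy `df` from above; the compound-symmetric within-cluster correlation is just one possible structure):

```r
library(nlme)

# GLS with an exchangeable (compound-symmetric) correlation within each cluster
gls_fit <- gls(
  GPA_year2 ~ gpa0 + ssp,
  data = df,
  correlation = corCompSymm(form = ~ 1 | two_vars_at_once)
)
summary(gls_fit)
```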
Re: `car::linearHypothesis`: I've since had some additional exposure to `car` and it's lost a fair amount of its allure. I can't find any tests for the methods it implements, which makes me leery. I would do the test by hand if you only need to do it once, and if you need to do it more than once, maybe code something up yourself.
If you have some vector $l$, then $l^T \hat\beta \sim N(l^T \beta,\ l^T \mathrm{Cov}(\hat\beta)\, l)$, so you can do tests that way (i.e. use $l = (1, -1, 0, \ldots)^T$ to test $H_0: l^T \beta = \beta_1 - \beta_2 = 0$).
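In R the by-hand version is just a few lines. A sketch with a made-up model on `mtcars` (the fit and the contrast vector are purely illustrative):

```r
library(estimatr)

fit <- lm_robust(mpg ~ wt + hp + disp, data = mtcars, se_type = "HC2")

# test H0: beta_wt - beta_hp = 0 using l' beta_hat ~ N(l' beta, l' Cov(beta_hat) l)
l   <- c(0, 1, -1, 0)                 # order matches coef(fit): (Intercept), wt, hp, disp
est <- sum(l * coef(fit))             # l' beta_hat
se  <- sqrt(drop(t(l) %*% vcov(fit) %*% l))
z   <- est / se
p   <- 2 * pnorm(-abs(z))             # normal approximation; a t reference may be preferable
c(estimate = est, std.error = se, statistic = z, p.value = p)
```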
@alexpghayes, thanks a bunch! That is enormously helpful. I've become pretty disappointed with the lack of post-estimation tools for cluster-robust regressions in R, and may need to move to Stata for a couple of my tests.
For future reference, I believe the `gee` library can be used with `lsmeans` for a complete suite of post-estimation tools. The `felm` library also has a 'contrasts' option that can be used for that purpose.
My two cents:
`linearHypothesis` doesn't have a lot of the functionality of go-to packages like `lsmeans` (now `emmeans`), and I'd love to see an estimatr version of `emmeans` for robust post-estimation. There simply isn't a package that can suppress fixed effects (like `lm_robust`) and compute robust standard errors while also doing important post-estimation like plotting interaction effects, generating margins, and Bonferroni-adjusted Tukey-esque procedures for comparing factors. I have lots of combinations of factors I want to test, and `linearHypothesis` has no straightforward way of adjusting p-values for each test (unlike a library like `multcomp`, which doesn't work with `lm_robust`).
- `gee` can do clustered standard errors, but not suppressed fixed effects.
- `felm` can do clustered standard errors and suppressed fixed effects, but has no predict method, so it can't generate interaction plots or margins.
On Fri, Dec 21, 2018 at 11:25 AM Luke Sonnet notifications@github.com wrote:

> @alexpghayes, are you wary specifically of the `linearHypothesis()` function or the `car` package in general? The `linearHypothesis` usage and results seem pretty straightforward and match other implementations.
>
> Just curious if you think something in estimatr itself would be more appropriate/trustworthy.
>
> Thanks.
Have you tried using the `margins` package with `lm_robust`? Last I checked we were compatible.
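Something along these lines should work (a toy model, just to show the call):

```r
library(estimatr)
library(margins)

fit <- lm_robust(mpg ~ hp + wt, data = mtcars, se_type = "HC2")
summary(margins(fit))  # average marginal effects
```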
Yes. Margins work. So do interaction plots now, with the incredible library `sjPlot`, after I requested it here: https://github.com/strengejacke/sjPlot/issues/444. The only thing that's lacking is a reliable post-estimation tool.
Yes, I think supporting a broad set of post-estimation tools would be great. I'd especially like to know which were most commonly used so that we could ensure that we support those. However I'm fairly certain that they won't be added in the near future as priorities are bugfixes, general maintenance, and then a more serious rewrite to improve estimation of degrees of freedom.
A first attempt at incorporating tests of linear hypotheses, and the `car` framework in general, has been merged in here: #281! There are still questions regarding the correct degrees of freedom, but the right variance is being used at least. Check out `?lh_robust` in the master branch, soon to be on CRAN.
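Usage looks roughly like this (mtcars just as a stand-in):

```r
library(estimatr)

# fit the model and test a linear hypothesis in one call;
# the other arguments are passed through to lm_robust()
lhr <- lh_robust(mpg ~ cyl + disp, data = mtcars, linear_hypothesis = "cyl + disp = 0")
summary(lhr)
```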
I'm closing this issue for now in favor of more specific issues.
Nice! Can't wait to try this out!
I've been fiddling around with the `linearHypothesis` command, and I'm still not sure how to do basic contrasts.

Let's say I have a specification of the following form:

```r
lm(outcome ~ politics + sex)
```

where politics can take either the value "D" or "R", and sex can take either "F" or "M". How would I specify, using `linearHypothesis`, a test of the significance of all pairwise differences between the combinations of politics and sex?
So, for example, DF vs (DM, RF, RM), DM vs (RF, RM), and so on. This is straightforward in `lsmeans` using `contrasts`, but I cannot figure out how to do it with the `car` library.
Thank you so much for `estimatr` -- I've managed to get through most of an econometrics class without having to break out Stata, for which I'm infinitely grateful. In the class, we basically end up using the following:
The `estimatr` documentation for `lm_robust` is very thorough and useful, but I had to search around a little bit before realizing that I could test multiple coefficients simultaneously with `car::linearHypothesis`. For example:
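A minimal sketch of the kind of thing I mean (model and variable names are just placeholders):

```r
library(estimatr)
library(car)

fit <- lm_robust(mpg ~ cyl + disp + wt, data = mtcars, se_type = "stata")

# joint test of H0: beta_cyl = 0 and beta_disp = 0, using the robust vcov from lm_robust
linearHypothesis(fit, c("cyl = 0", "disp = 0"))
```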
I know that `car::linearHypothesis` isn't really a part of the `estimatr` package, but I'd like to make a PR documenting a use case like the above. I think `estimatr` is the current go-to drop-in replacement for Stata in R, and this would result in a single document with most of the documentation necessary to get through many first courses in econometrics. If you're okay with this, I'll write up an example and make a PR in the next couple days.

Alternatively, I think a more detailed Stata / R conversion document centered on `estimatr` would see a ton of use. Since I'm not a Stata user I could really only write up basic examples, but I'd also be willing to start the ball rolling on that if there's broader interest.