Add documentation demonstrating omnibus hypothesis tests

alexpghayes commented 6 years ago

Thank you so much for estimatr -- I've managed to get through most of an econometrics class without having to break out Stata, for which I'm infinitely grateful.

In the class, we basically end up using the following:

linear models with (potentially clustered or robust) standard errors
omnibus hypothesis tests on multiple coefficients at once

The estimatr documentation for lm_robust is very thorough and useful, but I had to search around a little before realizing that I could test multiple coefficients simultaneously with car::linearHypothesis. For example:

library(estimatr)
library(car)

model <- lm_robust(GPA_year2 ~ gpa0 + ssp, data = alo_star_men,
                   se_type = "stata")
linearHypothesis(model, c("gpa0 = ssp", "ssp = 0"))
#> Linear hypothesis test
#> 
#> Hypothesis:
#> gpa0 - ssp = 0
#> ssp = 0
#> 
#> Model 1: restricted model
#> Model 2: GPA_year2 ~ gpa0 + ssp
#> 
#>   Res.Df Df Chisq Pr(>Chisq)    
#> 1    139                        
#> 2    137  2 14.21  0.0008208 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
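
For anyone curious what a joint test like this computes, here is a minimal base-R sketch of a Wald chi-square test for several linear restrictions at once. The helper name `wald_joint` is hypothetical (it is not part of estimatr or car), and the illustration uses a plain `lm()` fit on the built-in `mtcars` data rather than the model above, so the numbers will not match the output shown.

```r
# Hypothetical helper (not part of estimatr or car): joint Wald test of
# H0: L %*% beta = rhs, given estimates `beta` and their covariance `V`.
wald_joint <- function(beta, V, L, rhs = rep(0, nrow(L))) {
  gap <- drop(L %*% beta) - rhs
  # Quadratic form gap' (L V L')^{-1} gap
  stat <- drop(t(gap) %*% solve(L %*% V %*% t(L), gap))
  df <- nrow(L)
  list(chisq = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Illustration on a built-in data set with a plain lm() fit.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Rows encode "wt - hp = 0" and "hp = 0"; columns follow coef(fit):
# (Intercept), wt, hp.
L <- rbind(c(0, 1, -1),
           c(0, 0,  1))

joint <- wald_joint(coef(fit), vcov(fit), L)
print(joint)
```

Passing `coef(model)` and `vcov(model)` from an lm_robust fit instead should, in principle, make the same calculation use the robust covariance matrix. The chi-square reference distribution is a large-sample approximation and says nothing about small-sample degrees-of-freedom corrections.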

I know that car::linearHypothesis isn't really a part of the estimatr package, but I'd like to make a PR documenting a use case like the one above. I think estimatr is the current go-to drop-in replacement for Stata in R, and this would result in a single document with most of the documentation necessary to get through many first courses in econometrics. If you're okay with this, I'll write up an example and make a PR in the next couple of days.

Alternatively, I think a more detailed Stata / R conversion document centered on estimatr would see a ton of use. Since I'm not a Stata user I could really only write up basic examples, but I'd also be willing to start the ball rolling on that if there's broader interest.

alexpghayes commented 6 years ago

Notes to self for when I come back to this in a couple days:

car::linearHypothesis crashes for factor variables with many levels
provide an example of clustering based on multiple variables at once by creating a new column

lukesonnet commented 6 years ago

Alex, thanks for (1) the kind words, (2) letting us know we are compatible out of the box (kinda) with another package, and (3) the PR you want to submit.

What were you thinking? It'd be nice to add it to the man pages for lm_robust, and a vignette on comparability with Stata would be really great. We are also considering a vignette that shows which external packages we play nicely with, beyond just mentioning them on the Getting Started page.

alexpghayes commented 6 years ago

As I look more into car::linearHypothesis (the CRAN Econometrics Taskview makes me think this is a sort of standard) I think you may want to implement your own linearHypothesis method for lm_robust, as the default method has an argument white.adjust which I believe recalculates a robust covariance matrix and might not play well with whatever y'all already implemented. I think the default method should work at the moment, that argument just might be confusing.

Similarly, the car package also provides an Anova method that I believe appropriately deals with heteroscedasticity. Some references:

lm_robust objects do not seem to work out of the box with Anova, but I think either an Anova or anova method would be useful as that's a standard way to compare models in R. So I guess this just became a feature request!

In terms of documentation, I think adding two short examples of using linearHypothesis and anova right after the current margins example would make sense. I'm also realizing that I just don't know enough about how people use Stata to start a conversion guide (but I still would encourage y'all to make one when you have the bandwidth!).

DeFilippis commented 5 years ago

Thanks for this extremely informative post. Have there been any updates on a linearHypothesis method specifically for lm_robust? I need to do the equivalent of a TukeyHSD with cluster-robust standard errors.

DeFilippis commented 5 years ago

@alexpghayes, can you explain your strategy for "clustering based on multiple variables at once by creating a new column"?

alexpghayes commented 5 years ago

I don't know anything about Tukey-esque procedures for anything beyond basic ANOVA. By "clustering on multiple variables at once" I meant do something like:

library(estimatr)
library(dplyr)

df <- alo_star_men %>% 
  mutate(two_vars_at_once = paste0(gpa0, ssp))

model <- lm_robust(
  GPA_year2 ~ gpa0 + ssp,
  data = df,
  cluster = two_vars_at_once,
  se_type = "stata"
)

where you essentially cluster on an interaction. I have no idea at all if this is a reasonable thing to do. Might be worth looking at mixed models / GLS so you can directly specify the covariance structure you want.
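
A small caveat on building the combined cluster id, sketched in base R with made-up toy data: pasting values together without a separator can merge distinct combinations into one id in general, while `interaction()` keeps them apart. Whether this matters depends on the variables involved; the column names `a` and `b` below are purely illustrative.

```r
# Toy data: two grouping variables whose values can run together as strings.
toy <- data.frame(a = c(1, 11, 1), b = c(12, 2, 12))

# Pasting without a separator maps (1, 12) and (11, 2) to the same id "112".
ids_paste0 <- paste0(toy$a, toy$b)

# interaction() keeps every distinct (a, b) pair separate; drop = TRUE
# discards combinations that never occur in the data.
ids_inter <- interaction(toy$a, toy$b, drop = TRUE)

length(unique(ids_paste0))  # number of distinct ids from paste0()
nlevels(ids_inter)          # number of distinct ids from interaction()
```

In the snippet above, the analogous change would be something like `mutate(two_vars_at_once = interaction(gpa0, ssp, drop = TRUE))`, or `paste(gpa0, ssp, sep = "_")` if a character id is preferred.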

Re: car::linearHypothesis: I've since had some additional exposure to car and it's lost a fair amount of its allure. I can't find any tests for the methods it implements, which makes me leery. I would do the test by hand if you only need to do it once, and if you need to do it more than once, maybe code something up yourself.

If you have some vector $l$, then approximately $l^T \hat \beta \sim N(l^T \beta, l^T Cov(\hat \beta) l)$, so you can do tests that way (e.g. use $l = (1, -1, 0, ...)$ to test $H_0: (1, -1, 0, ...)^T (\beta_1, \beta_2, ...) = \beta_1 - \beta_2 = 0$).
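
A minimal sketch of doing such a test by hand in base R, assuming a normal approximation as in the formula. The helper `contrast_test` is a hypothetical name, and the example uses a plain `lm()` fit on the built-in `mtcars` data purely for illustration.

```r
# Hypothetical helper: z-test of H0: sum(l * beta) = 0 using a normal
# approximation, given estimates `beta` and their covariance `V`.
contrast_test <- function(beta, V, l) {
  est <- sum(l * beta)
  se <- sqrt(drop(t(l) %*% V %*% l))
  z <- est / se
  c(estimate = est, std.error = se, statistic = z,
    p.value = 2 * pnorm(-abs(z)))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)

# l = (0, 1, -1) tests whether the wt and hp coefficients are equal.
res <- contrast_test(coef(fit), vcov(fit), l = c(0, 1, -1))
print(res)
```

With an lm_robust model, `coef(model)` and `vcov(model)` could be passed in the same way, so the standard error reflects whatever robust or clustered covariance was requested. A t reference distribution would need a degrees-of-freedom choice that this sketch deliberately leaves out.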

DeFilippis commented 5 years ago

@alexpghayes, thanks a bunch! That is enormously helpful. I've become pretty disappointed with the lack of post-estimation tools for cluster-robust regressions in R, and may need to move to Stata for a couple of my tests.

For future reference, I believe the gee library can be used with lsmeans for a complete suite of post-estimation tools. The felm library also has a 'contrasts' option that can be used for that purpose.

DeFilippis commented 5 years ago

My two cents:

linearHypothesis doesn't have a lot of the functionality of go-to packages like lsmeans (now emmeans), and I'd love to see an estimatr version of emmeans for robust post-estimation. There simply isn't a package that can suppress fixed effects (like lm_robust) and provide robust standard errors while also handling important post-estimation tasks like plotting interaction effects, generating margins, and running Bonferroni-adjusted Tukey-esque procedures for comparing factors. I have lots of combinations of factors I want to test, and linearHypothesis has no straightforward way of adjusting p-values for each test (unlike a library like multcomp, which doesn't work with lm_robust).

gee can do clustered standard errors, but not suppressed fixed effects. felm can do clustered standard errors and suppressed fixed effects, but has no predict method, so it can't generate interaction plots or margins.

On Fri, Dec 21, 2018 at 11:25 AM Luke Sonnet notifications@github.com wrote:

@alexpghayes, are you wary specifically of the linearHypothesis() function or the car package in general? The linearHypothesis usage and results seem pretty straightforward and match other implementations.

Just curious if you think something in estimatr itself would be more appropriate/trustworthy.

lukesonnet commented 5 years ago

Thanks.

Have you tried using the margins package with lm_robust?

Last I checked we were compatible.

DeFilippis commented 5 years ago

Yes. Margins work. So do interaction plots now with the incredible library sjPlot, after I requested it here: https://github.com/strengejacke/sjPlot/issues/444.

The only thing that's lacking is a reliable post-estimation tool.

lukesonnet commented 5 years ago

Yes, I think supporting a broad set of post-estimation tools would be great. I'd especially like to know which were most commonly used so that we could ensure that we support those. However I'm fairly certain that they won't be added in the near future as priorities are bugfixes, general maintenance, and then a more serious rewrite to improve estimation of degrees of freedom.

lukesonnet commented 5 years ago

A first attempt at incorporating tests of linear hypotheses and the car framework in general has been merged in here #281! There are still questions regarding the correct degrees of freedom, but the right variance is being used at least. Check out ?lh_robust in the master branch, soon to be on CRAN.

I'm closing this issue for now in favor of more specific issues.

DeFilippis commented 5 years ago

Nice! Can't wait to try this out!

DeFilippis commented 5 years ago

I've been fiddling around with the linearHypothesis command, and I'm still not sure how to do basic contrasts.

Let's say I have a specification of the following form:

lm(outcome ~ politics + sex), where politics can take either the value "D" or "R", and sex can take either "F" or "M". How would I use linearHypothesis to test the significance of all pairwise differences between the combinations of politics and sex?

So, for example, DF vs (DM, RF, RM) and DM vs (RF, RM), and so on. This is straightforward in lsmeans using contrasts, but I cannot figure out how to do it with the car library.
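
One possible by-hand approach, sketched in base R under several assumptions: the data are simulated, the fit is a plain `lm()`, the p-values use a normal approximation, and the multiplicity correction is Bonferroni via `p.adjust()`. The idea is to build the design row for each politics-by-sex cell with `model.matrix()` and take row differences as contrast vectors. All object names here are illustrative.

```r
set.seed(1)
n <- 200
dat <- data.frame(
  politics = factor(sample(c("D", "R"), n, replace = TRUE)),
  sex      = factor(sample(c("F", "M"), n, replace = TRUE))
)
# Simulated outcome with made-up effect sizes.
dat$outcome <- 1 + 0.5 * (dat$politics == "R") -
  0.3 * (dat$sex == "M") + rnorm(n)

fit <- lm(outcome ~ politics + sex, data = dat)

# One design row per cell: DF, RF, DM, RM.
cells <- expand.grid(politics = levels(dat$politics),
                     sex = levels(dat$sex))
X <- model.matrix(~ politics + sex, data = cells)
rownames(X) <- paste0(cells$politics, cells$sex)

# Each pairwise contrast is the difference between two cell rows.
pairs <- combn(nrow(X), 2)
L <- t(apply(pairs, 2, function(ix) X[ix[1], ] - X[ix[2], ]))
rownames(L) <- apply(pairs, 2, function(ix) {
  paste(rownames(X)[ix], collapse = " - ")
})

est   <- drop(L %*% coef(fit))
se    <- sqrt(diag(L %*% vcov(fit) %*% t(L)))
p_raw <- 2 * pnorm(-abs(est / se))

result <- data.frame(
  estimate     = est,
  std.error    = se,
  p.value      = p_raw,
  p.bonferroni = p.adjust(p_raw, method = "bonferroni")
)
print(result)
```

Because the model is additive, some of the six differences coincide (for example, DF vs DM is the same contrast as RF vs RM); adding a politics:sex interaction to both formulas would make all cells free to differ. For cluster-robust inference, the same matrix `L` could be combined with `coef()` and `vcov()` from an lm_robust fit.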

DeclareDesign / estimatr

Add documentation demonstrating omnibus hypothesis tests #198