DeclareDesign / estimatr

estimatr: Fast Estimators for Design-Based Inference
https://declaredesign.org/r/estimatr

Return first-stage in iv_robust #335

Open grantmcdermott opened 5 years ago

grantmcdermott commented 5 years ago

It's common to see the coefficients from the first-stage regression in a 2SLS regression table. For example, see discussion here.

estimatr::iv_robust does not currently support this AFAIK. (It does return some overall first-stage diagnostics if the argument "diagnostics = TRUE" is used.) Would it be possible to add the first stage to the model return object?

FWIW, lfe::felm supports this with a "stage1" return object. Here's a reprex:

# library(AER) ## only for Cigarettes dataset
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(lfe))

## Get the data
data("CigarettesSW", package = "AER")
## Create a new data frame with some modified variables
cigs <-
  CigarettesSW %>%
  mutate(
    rprice = price/cpi,
    rincome = income/population/cpi,
    rtax = tax/cpi,
    tdiff = (taxs - tax)/cpi
  ) 

## Run the iv regression in felm with tdiff and rtax instrumenting the endogenous
## variable log(rprice)
iv_felm <- 
  felm(
    log(packs) ~ log(rincome) |
      year + state | ## FEs
      (log(rprice) ~ tdiff + rtax), ## Endog. variable and instruments
    data = cigs
  )

## Show first-stage result
summary(iv_felm$stage1)
#> 
#> Call:
#>    NULL 
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.06233 -0.01529  0.00000  0.01529  0.06233 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> log(rincome) -0.028994   0.147492  -0.197    0.845    
#> tdiff         0.013457   0.003050   4.412 6.52e-05 ***
#> rtax          0.007573   0.001049   7.221 5.43e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.03064 on 44 degrees of freedom
#> Multiple R-squared(full model): 0.9815   Adjusted R-squared: 0.9601 
#> Multiple R-squared(proj model): 0.7779   Adjusted R-squared: 0.5204 
#> F-statistic(full model):45.85 on 51 and 44 DF, p-value: < 2.2e-16 
#> F-statistic(proj model): 51.36 on 3 and 44 DF, p-value: 2.015e-14 
#> F-statistic(excl instr.):75.65 on 2 and 44 DF, p-value: 5.758e-15

Created on 2019-11-07 by the reprex package (v0.3.0)
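
Until something like this is available, one possible workaround is to fit the first stage by hand with estimatr::lm_robust. The sketch below is an assumption about how one might do that, not an existing iv_robust feature: it regresses the endogenous variable on the instruments and the exogenous covariate, with year and state entered as dummy variables instead of projected-out fixed effects. Because this is the same linear projection felm fits, the point estimates on tdiff and rtax should agree with the stage1 output above; se_type = "classical" is chosen only to mirror felm's default standard errors.

```r
library(estimatr)

## Same data preparation as in the reprex above, in base R
data("CigarettesSW", package = "AER")
cigs <- transform(
  CigarettesSW,
  rprice  = price / cpi,
  rincome = income / population / cpi,
  rtax    = tax / cpi,
  tdiff   = (taxs - tax) / cpi
)

## Manual first stage: endogenous variable on instruments + exogenous covariate,
## with year and state included as factor dummies
first_stage <- lm_robust(
  log(rprice) ~ tdiff + rtax + log(rincome) + year + state,
  data = cigs,
  se_type = "classical"
)

## Show only the terms reported in the felm first stage
fs_tidy <- tidy(first_stage)
subset(fs_tidy, term %in% c("log(rincome)", "tdiff", "rtax"))
```

This does not give the excluded-instrument F-statistic that felm prints; that would still have to come from the diagnostics output of iv_robust or a separate test.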

acoppock commented 4 years ago

Thanks very much for this. I can see a nice argument for returning the reduced form and first stage regressions as additional entries in the iv_robust object. Might be cool if then we could have

tidy(iv_robust_fit, model = "first_stage")
tidy(iv_robust_fit, model = "reduced_form")
tidy(iv_robust_fit, model = "second_stage")

or similar?

grantmcdermott commented 4 years ago

@acoppock That looks great to me.

nfultz commented 4 years ago

@acoppock I would recommend just adding the first stage as a named element on the iv_robust_fit object, rather than directing people to specify non-standard options on tidy that summary would not also support.

Here is what lfe does for tidy:

> tidy(iv_felm)
# A tibble: 2 x 5
  term               estimate std.error statistic       p.value
  <chr>                 <dbl>     <dbl>     <dbl>         <dbl>
1 log(rincome)          0.462     0.308      1.50 0.141        
2 `log(rprice)(fit)`   -1.20      0.171     -7.02 0.00000000940
> tidy(iv_felm$stage1)
# A tibble: 3 x 5
  term         estimate std.error statistic       p.value
  <chr>           <dbl>     <dbl>     <dbl>         <dbl>
1 log(rincome) -0.0290    0.147      -0.197 0.845        
2 tdiff         0.0135    0.00305     4.41  0.0000652    
3 rtax          0.00757   0.00105     7.22  0.00000000543

acoppock commented 4 years ago

Thanks Neal, that's really helpful. Nice way to do both.
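
From the user side, the named-element approach could be approximated today with a small wrapper. This is only a sketch: iv_with_first_stage and the first_stage element are hypothetical names, not part of estimatr, and the first stage is refit separately with lm_robust instead of being reused from iv_robust's internals. The data are simulated purely for illustration.

```r
library(estimatr)

## Hypothetical helper (not an estimatr function): fit iv_robust, then attach
## a separately fitted first stage as a named element on the returned object
iv_with_first_stage <- function(iv_formula, first_stage_formula, data,
                                se_type = "HC2") {
  fit <- iv_robust(iv_formula, data = data, se_type = se_type)
  fit$first_stage <- lm_robust(first_stage_formula, data = data,
                               se_type = se_type)
  fit
}

## Simulated data: z instruments the endogenous x
set.seed(1)
n <- 200
z <- rnorm(n)
u <- rnorm(n)
x <- 0.8 * z + u + rnorm(n)
y <- 1 + 2 * x + u + rnorm(n)
dat <- data.frame(y = y, x = x, z = z)

fit <- iv_with_first_stage(y ~ x | z, x ~ z, data = dat)

tidy(fit)                  # second stage, as usual
tidy(fit$first_stage)      # first stage via the named element
summary(fit$first_stage)   # summary works on the same element
```

The point of the design is that no new arguments to tidy or summary are needed: the first stage is an ordinary lm_robust object, so the existing methods apply to it directly.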

floswald commented 4 years ago

can i upvote that issue? :-) thanks

raffaem commented 1 year ago

Any news on this?