Am I misunderstanding the problem or the dataset? Or is there something missing?
Edit: in addition, part (b) states: "Now use a regression analysis to estimate the causal effect from Dehejia and Wahba’s subset of the constructed observational study."
However, their subsetting sounds like it was based on excluding subjects with missing covariates ("The subsample they chose removes men for whom only one pre-treatment measure of earnings is observed"), yet there doesn't seem to be any covariate missingness in the data you provided:
> sapply(lalonde, function(x) mean(is.na(x)))
      age      educ     black   married  nodegree      re74      re75      re78
        0         0         0         0         0         0         0         0
     hisp    sample     treat educ_cat4
        0         0         0         0
PS It is so cool that y'all posted a free pdf of the book. Thank you!
I'm not an expert on the LaLonde example; that's Jennifer's domain. But I thought the whole point of LaLonde was that they had an observational study and then also an experiment.
Exercise 20.2 (pp. 417-418) uses the LaLonde dataset. Part (a) asks:
However, I am pretty sure that the dataset does not include experimental controls. e.g.
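One way such a check could be run is sketched below. This is only an illustration, not the output from the original post. It assumes that `sample` labels the data source, that `treat` is a 0/1 treatment indicator, and that experimental controls would share a `sample` value with the treated units; none of these codings is confirmed here. The data frame `lalonde_toy` is a made-up stand-in so the snippet runs on its own.

```r
# Made-up stand-in for the real `lalonde` data frame (values are illustrative only).
# Assumption: `sample` labels the data source, `treat` is the 0/1 treatment flag.
lalonde_toy <- data.frame(
  sample = c(1, 1, 1, 2, 2, 3, 3),
  treat  = c(1, 1, 1, 0, 0, 0, 0)
)

# Cross-tabulate data source against treatment status.
counts <- table(sample = lalonde_toy$sample, treat = lalonde_toy$treat)
print(counts)

# Data sources that contain at least one treated unit.
treated_sources <- rownames(counts)[counts[, "1"] > 0]

# Assumption: experimental controls would be untreated rows within those sources.
n_exp_controls <- sum(counts[treated_sources, "0"])
print(n_exp_controls)
```

To try this on the real data, replace `lalonde_toy` with `lalonde`. A count of zero would be consistent with the dataset containing no experimental controls, provided the coding assumptions above hold.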