DB variance estimates incorporating both-sides absorption

benthestatistician commented 1 month ago

The estfun_DA vignette has been revised to include plans for an alternate design-based variance estimator along lines I've discussed with Xinhe and Julian; see the version on the design branch, sections 2.1 and 2.2. The estimator requires the assumption of a constant treatment effect. Its estimating function can be calculated with the same estfun.teeMod() method used for the current, right-side-absorption-only estimating function, after passing the teeMod through a preprocessing step. That step modifies the embedded offset variable by shifting it within each block so that, after the shifts, the offset-adjusted response is block-mean-centered.
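A minimal sketch of the offset shift follows. It assumes that the "offset-adjusted response" means `y - offset` and that "block-mean-centered" means a weighted block mean of zero, with unit weights by default. `center_offset_by_block()` is a hypothetical helper that works on plain vectors, not on a teeMod, and is not part of propertee.

```r
# Hypothetical helper (not part of propertee): shift `offset` by a constant
# within each block so that y - offset has weighted block mean zero.
center_offset_by_block <- function(y, offset, block, w = rep(1, length(y))) {
  r <- y - offset
  # weighted block means of the offset-adjusted response
  blk_mean <- tapply(w * r, block, sum) / tapply(w, block, sum)
  # add each unit's block mean to its offset (a per-block constant shift)
  offset + as.vector(blk_mean[as.character(block)])
}

set.seed(1)
y <- rnorm(10)
off <- rnorm(10)
b <- rep(1:3, length.out = 10)
w <- runif(10)
new_off <- center_offset_by_block(y, off, b, w)
tapply(w * (y - new_off), b, sum)  # all (numerically) zero
```

Because each unit's offset moves by its own block's mean, the shift is constant within a block, which is what leaves the right-hand-side design unchanged.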

benthestatistician commented 1 month ago

Here is the referenced update of the estfun_DA spec: estfun_DA-87d2667.pdf.

On further thought, there seems to be less need for new calculations than I had described to Xinhe and @julian-bernado. Accordingly I've assigned @jwasserman2 instead of Julian, as he'll be well positioned to plot and execute the surgery on teeMod objects that I'm calling for in order to implement the method. @xinhew0708 can offer some context.

This is to be offered in addition to, not instead of, design-based covariance estimators using right-side-only absorption. The calculations are much the same, and the right-side-only version avoids the constant treatment effect assumption.

jwasserman2 commented 3 weeks ago

Changes in SandwichLayerVariance.R

Maybe .vcov_DB0() can accept an `absorb` argument, passed through `...`, that determines RHS vs. both-sides absorption.
If the default DB variance option is RHS absorption, we can then change the `args$x <- x` line in .vcov_DB0() to `args$x <- if (!is.null(args$absorb) && args$absorb == "both") block_center_residuals(x) else x`. The logic would be different if both-sides absorption were the default. Also, we could name the function areg.center_residuals().
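A minimal sketch of that switch, with the helper stubbed out. `select_x()` and `block_center_residuals_stub()` are hypothetical stand-ins, not propertee functions. Using `&&` matters here: it short-circuits when `absorb` is absent, whereas with `&` the expression `NULL == "both"` evaluates to `logical(0)` and `if()` stops with "argument is of length zero".

```r
# Hypothetical stand-in for block_center_residuals(): it only tags its input,
# so the branch taken is visible.
block_center_residuals_stub <- function(x) structure(x, centered = TRUE)

# Hypothetical helper mirroring the proposed `args$x` line in .vcov_DB0();
# `&&` short-circuits, so a missing (NULL) `absorb` falls through to `x`.
select_x <- function(x, args) {
  if (!is.null(args$absorb) && args$absorb == "both") {
    block_center_residuals_stub(x)
  } else {
    x
  }
}

attr(select_x(1:3, list()), "centered")                 # NULL: RHS default
attr(select_x(1:3, list(absorb = "both")), "centered")  # TRUE
```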

The `block_center_residuals()` function

This function can be written in its own script block_center_residuals.R. Then make sure to add block_center_residuals.R to the @include line at the top of SandwichLayerVariance.R.

The function itself could be something along the lines of:

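    block_center_residuals <- function(x) {
      blk_vars <- var_names(x@Design, "b")
      blks <- stats::expand.model.frame(x, blk_vars)[, blk_vars]
      w <- if (is.null(x$weights)) rep(1, length(x$residuals)) else x$weights
      # weighted mean residual within each block (rownames are the block ids)
      blk_means <- rowsum(x$residuals * w, blks) / rowsum(w, blks)
      # subtract each unit's block mean so the residuals are block-mean-centered
      x$residuals <- x$residuals - blk_means[as.character(blks), 1]
      x
    }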

It's sufficient to update the residuals because of the call chain within .vcov_DB0(): meatCL() calls estfun.teeMod(); under absorption, that calls .base_s3_class_estfun(), an alias for estfun.lm(); and estfun.lm() calls stats::residuals.lm(x), which returns x$residuals.
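The last links of that chain can be checked on an ordinary, unweighted `lm` fit. This is a sketch on plain `lm` (not a teeMod), with a made-up block column `blk`: it block-centers the stored residuals, confirms that `stats::residuals()` returns the modified values, and forms the residual-times-model-matrix product that `sandwich::estfun.lm()` computes for an unweighted fit.

```r
set.seed(2)
d <- data.frame(blk = rep(c("a", "b", "c"), each = 4),
                z = rep(c(0, 1), 6),
                y = rnorm(12))
fit <- lm(y ~ z, data = d)

# weighted block means of the residuals; unit weights since the fit is unweighted
w <- rep(1, nrow(d))
blk_means <- rowsum(fit$residuals * w, d$blk) / rowsum(w, d$blk)
fit$residuals <- fit$residuals - blk_means[d$blk, 1]

# residuals.lm() reads fit$residuals, so it now returns the centered values
r <- residuals(fit)
# estfun.lm()-style scores for an unweighted fit: residual times each column
scores <- as.vector(r) * model.matrix(fit)
```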

benthestatistician commented 3 weeks ago

Thanks for these great hints, @jwasserman2 !

Josh is right that the default would be RHS absorption. But rather than adding an "absorb=" argument[^1], I suggest "const_effect=", "constant.effect=", or whatever best fits with what we're doing elsewhere, defaulting to FALSE.

[^1]: It strikes me that only the "small stratum" covariance estimator, as Xinhe and I refer to it, is really affected by this change; the "large stratum" or Neyman-type variance estimator comes out the same whether or not one absorbs on the left side. So asking users to decide between "both sides" and "right-side only" "absorption" may pull them further into the minutiae than necessary.
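The footnote's claim about the large-stratum estimator can be illustrated with the textbook stratified Neyman variance, the sum over blocks of (n_b/N)^2 (s1_b^2/n1_b + s0_b^2/n0_b). This unweighted form is an assumption for illustration, not necessarily the exact estimator propertee computes, and `neyman_var()` is a hypothetical helper. Left-side absorption subtracts a constant from every response in a block, which leaves the within-block, within-arm sample variances unchanged.

```r
# Hypothetical helper: stratified Neyman-type variance of the block-size
# weighted difference in means.
neyman_var <- function(y, z, block) {
  N <- length(y)
  v <- vapply(split(seq_along(y), block), function(i) {
    y1 <- y[i][z[i] == 1]
    y0 <- y[i][z[i] == 0]
    (length(i) / N)^2 * (var(y1) / length(y1) + var(y0) / length(y0))
  }, numeric(1))
  sum(v)
}

set.seed(3)
block <- rep(1:4, each = 6)
z <- rep(c(1, 1, 0, 0, 0, 1), 4)   # 3 treated, 3 control per block
y <- rnorm(24, mean = block)
# left-side absorption: subtract each block's mean from the response
y_centered <- y - ave(y, block)
all.equal(neyman_var(y, z, block), neyman_var(y_centered, z, block))  # TRUE
```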

benthestatistician commented 3 weeks ago

(Slightly off-topic: From reading its code, I get the impression that .vcov_DB0() currently allows for (right hand side-only) absorption. Users will typically understand absorption as an alternative to inverse probability weighting -- in particular, they won't see the need for the latter if they've done the former[^1]. So I was surprised to see the warning at SandwichLayerVariance.R#L616-L622.

    # if model weights does not incorporate IPW, throw a warning
    if (!(inherits(x@lmitt_call$weights, "call") & 
          sum(grepl("ate", x@lmitt_call$weights)) > 0)){
      warning(paste("When calculating design-based standard errors,",
                    "please ensure that inverse probability weights are applied.",
                    "This could be done by specifying weights = ate() in",
                    "lmitt() or lm()."))

~~Is this a vestige waiting to be removed, @xinhew0708, or does it serve some other purpose?~~)

[^1]: Yes, in a bunch of cases we want to steer them away from absorption because it's not consistent -- setting that aside for now.
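A standard OLS identity makes that consistency caveat concrete: with block fixed effects (absorption), the treatment coefficient equals a weighted average of within-block differences in means with weights n_b p_b (1 - p_b), where p_b is the share treated in block b, rather than the block-size weights an ATE-type estimator targets. The data below are simulated for illustration only.

```r
set.seed(4)
# three blocks of sizes 4, 6, 10 with treated shares 1/4, 1/2, 1/5
d <- data.frame(block = factor(rep(1:3, times = c(4, 6, 10))),
                z = c(1, 0, 0, 0,
                      1, 1, 1, 0, 0, 0,
                      1, 1, 0, 0, 0, 0, 0, 0, 0, 0))
# effects differ by block: 1, 2, 5
d$y <- rnorm(20) + d$z * c(1, 2, 5)[d$block]

# absorbed estimate: treatment coefficient with block fixed effects
b_fe <- unname(coef(lm(y ~ z + block, data = d))["z"])

# within-block differences in means, and the implied fixed-effects weights
tau_b <- sapply(split(d, d$block),
                function(s) mean(s$y[s$z == 1]) - mean(s$y[s$z == 0]))
n_b <- as.vector(table(d$block))
p_b <- as.vector(tapply(d$z, d$block, mean))
w_fe <- n_b * p_b * (1 - p_b)

all.equal(b_fe, sum(w_fe * tau_b) / sum(w_fe))  # TRUE
# block-size-weighted average, as an ATE-type (IPW) estimator would target
sum(n_b * tau_b) / sum(n_b)
```

When effects vary across blocks and treated shares differ, the two averages generally disagree, which is the sense in which absorption alone need not recover the ATE.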

benbhansen-stats / propertee