`absorb` option for `lmitt()`

benthestatistician commented 2 years ago

When calculating direct adjustment estimates we need a mechanism to "absorb" stratum indicator variables into the independent variables. That is, after recovering the treatment variable and any requested interactions of it and then aligning them with the lmitt() data argument, rather than using them in an outcome regression regress independent variables on stratum indicators, storing the residuals of this regression. Report regression of outcomes on these residuals, rather than on the treatment or treatment and interaction variables themselves. (Aka "within" fixed effects, per Cameron and Miller 2015, or areg, per Stata.)

benthestatistician commented 2 years ago

At today's meeting (9/22) we settled on adding an absorb= option to the lmitt() call in order to support this. Another option occurred to me later that we didn't discuss in the meeting: define a function absorb() that operates on a design objects in much the same way as ett()/ate() do, except returning placeholder weights with the WeightedDesign. The WeightedDesign structure would be enhanced to carry the information that it had been invoked in this way, so that later functions could appropriately absorb the fixed effects. E.g., adopters() would look for this, forming an appropriately residualized treatment vector as needed.

This scheme would have an advantage of consistency within the UI: just as with ate(), the user's instructions about how estimation and inference procedures should recognize the design are passed through the weights= argument. It has the disadvantages of giving the false impression that adoption is being implemented through the weights, that it would be a misleading no-op when passed to an lm() call rather than a lmitt() call; and that it wouldn't quite suffice to make adopters() aware of these instructions, since treatment interactions would also need to be residualized. Given these disadvantages, my preference stays with absorb as lmitt() argument.

jwasserman2 commented 2 years ago

Instituting an absorb argument within lmitt seems to me to be an efficient way of handling this since lmitt is provided all the information necessary to stratum-difference every column of interest at once. The stratum information can be pulled from the provided Design object and the variables given in the formula can be pulled from the data argument. We could stratum-difference here and substitute it as the data argument for the ensuing lm call. In the case that a user calls lmitt on a fitted lm object with absorb = TRUE, it could throw a warning mentioning that stratum-differencing could not be implemented post-model fitting.

I'm not opposed to modeling absorb() after ate()/ett(), since there is a lot of similarity in how they could be carried out. I would need a bit more explanation on how absorb() would work with the weights argument in the case the user would want to use it in addition to ate()/ett() though.

benthestatistician commented 2 years ago

I like this plan for instituting absorb as an argument, @jwasserman2. The warning in the event that a user asks for this after calling lmitt() on a fitted lm is a nice touch.

jwasserman2 commented 2 years ago

I can take a pass at it in the next week or two

benthestatistician commented 2 years ago

Thanks, @jwasserman2. Before you get going, let's see if we can't anticipate some issues and figure out how we're going to get ahead of them. Here are two that occur to me.

~~I'd like propose that we always absorb any stratum indicators presented by the Design~~ Perhaps lmitt.formula() should absorb the intercept even under absorb=FALSE, and also avoid fitting intercepts (assign a +0 to the right-hand-size of its model formula). This gets us out of reporting intercept coefficients. In the design-based interpretation, we don't have a good way of filling in off-diagonal covariance matrix entries corresponding to intercepts (or stratum indicators), so it would be convenient not to have to do it.
If we like option (1), we might also consider absorbing stratum indicators as the default, when stratum indicators are present. I.e. whether the absorb= option should have "TRUE" is its default argument. However, in general this seems to change the values of the Hajek estimators that are obtained when there are user-provided and/or Design-induced weights, in a way that seems to sacrifice interpretability. These downsides are presently conjectural, me not having verified them with examples, so I'm open to correction on them (@xinhew0708?). Barring a correction, the default should be absorb=FALSE.
When lmitt.formula() sees y ~ subgrp + adopters() as its first argument, perhaps we should absorb subgrp, whatever the value of the absorb= option, while interpreting the request for coefficients in the manner of lm(y ~ z:subgrp-0), where z is the numeric (0/1) version of the treatment variable. So we'd fit and report coefficients for the treatment-subgroup interactions, while avoiding the fitting and reporting of coefficients for the subgroups themselves. (Here I'm assuming we go with (1) above, avoiding the fitting and reporting of intercepts, which has similar motivations.) Again the advantages of this are that these coefficients should be of less interest to the user, and that it allows us to reduce the number of covariance matrix entries that we'd have to populate with NAs in some scenarios. (This proposal relates to #57, adding to the discussion there a new option, namely that lmitt.formula() interpret y ~ z+subgrp in the manner of lm(y~z:subgrp), but after absorbing subgrp into the model matrix of z:subgrp. A variant that thread's "y ~ subgrp + z:subgrp" option that avoids reporting the subgrp coefficients.)
Suppose we decide to answer (1) and (3) above in the affirmative. Suppose also that the Design has strata strat. Suppose the model formula is y ~ subgrp :adopters() or equivalent, and that the user has specified absorb=TRUE. Which of the following do we then absorb in to the model matrix of z:subgrp: a. the model matrix of strat+subgrp; or b. the model matrix of strat:subgrp? (I say we postpone the issue of multiple subgrouping variables.[^1]) Option (b) strikes me as more conservative, if in some cases more destructive of data. However, if there were proponents of this approach recommending (a), then I'd prefer that we implement that. (Proponents = Cameron and Miller? The Stata implementation of areg?) This question relates to #59.

Do add your proposals and questions (and comment on mine above).

[^1]: Because it's confusing and because I'm inclined ultimately either to ban it or to handle multiple subgroups in lmitt.formula() by interpreting them as calls for separate analyses for each subgrouping variable.

benthestatistician commented 2 years ago

Just edited my last comment here quite a bit. Among other changes, the new version adss a callout to @xinhew0708. Repeating that callout here because I'm not sure she would have received a notification about it otherwise.

josherrickson commented 2 years ago

It feels like we're conflating three separate issues here.

1) What model does lmitt actually fit? 2) What does lmitt report and what does it suppress. 3) Do we allow a Fixed Effects style model via absorb?

These things are interconnected obviously, but each requires different decisions and perhaps different interfaces.

There seems to also be a larger meta-issue: How much control do we want users to have with regards to 1) above, the model being fit. Obviously they can put different things on the RHS of the formula, but do we allow them to decide about the presence/absence of an intercept? Do we allow them to control the form of subgroup estimation (e.g. : vs *)? If we don't allow any of this control, is a formula the most natural interface to lmitt? Should we just have subgroup= and absorb= arguments? This gets us away from lmitt's interface being very similar to lm, but if we're not going to allow control in the model, do we need to mimic lm?

(This goes to the meta-meta-issue that I keep referring to in our weekly meetings but punting down the road: Are we still working towards the goal that flexida requires only minimal tweaks to existing workflows (e.g. create Design, and add weights=ate(des) to your existing model) or are we slowly moving towards an entirely unique workflow?)

josherrickson commented 2 years ago

Re: multiple absorbs and areg syntax. In general in Stata, if you are asked to identify a grouping variable and pass more than one variable, it assumes you're creating subgroups based upon unique combinations. E.g. if you do something like areg ...., absorb(race gender) that would be equivalent to first creating a variable identifying each individual race*gender group, then including that instead.

benthestatistician commented 2 years ago

Thanks for these comments, @josherrickson . To answer your (perhaps rhetorical) questions about user control, for my part I don't see it as a goal for lmitt.formula() to offer users as much control over which coefficients are fit as lm() does. As a topical example, for the time being I think we should be aiming to support at most one non-treatment variable on the RHS of a lmitt.formula() formula.

More broadly, I think it would work well if in contrast to lm(), lmitt() narrowed things down to the coefficients that can be interpretted as intention-to-treat effect (ITT) estimates. It seems to me now that this helps to narrow and clarify user design goals as well as statistical computing goals. It does have the consequence that we might wind up denying users intercept estimates and subgroup main effects, at least when they fit using lmitt.formula() rather than lm(). Since those aren't ITT estimates, though, that seems to be all for the good -- at least from the perspective of this current kick of mine.

Moving away from a formula interface toward separate specification of subgroups and other options, as Josh's post suggests, is an idea I hadn't thought of. Are you arguing for that, Josh, or pointing it out as an option? For my part I see reasons to have the formula interface other than having it control the model outputs in the same way it would with lm(). But I also agree with you that offering formulas with a different semantics than lm()'s bears some risk of user disorientation.

Regarding the "meta-meta-issue," I don't see the current proposals as abandoning the important goal of offering improved calculations with ony minimal tweaks to existing workflows. Nor do I see us as moving toward an entirely unique workflow -- although you're quite right to flag my proposed overloading of lm formula semantics as too big a step in that direction to take lightly. I do think we should embrace having some of our improvements only be available, or available in better form, after larger workflow tweaks. For this and other reasons I've been coming around to the view of lmitt.formula() as our primary offering, with lm() followed by as.lmitt() as a secondary route, offered for such purposes as enabling users to obtain DirectAdjusted objects with primary effect estimates obtained using the familiar lm().

As a separate matter, based on Josh E's info about multiple absorbs in Stata, I'm ready to go with option b to question 4 of my post a few comments up in this thread.

xinhew0708 commented 2 years ago

Just to clarify, by user-provided and design-induced weights, do you mean weights that are not equal to the cluster sizes (so the Hajek estimator would change)? For example, weights being the cluster sizes divided by corresponding propensity scores.

benthestatistician commented 2 years ago

When specifying the weights= argument to lm() or lmitt(), the user can combine a design-induced weight, based on the odds or probability of treatment, with per-observation weights they designate. A central use case is the one you mention, analysis conducted with data already aggregated to the cluster level and cluster size as the user-supplied weight.

jwasserman2 commented 2 years ago

I feel like it's ok for us to automatically use the within estimator (carry out both 1 and 3 of @benthestatistician's suggestions). If a user is responsibly carrying out their ITT analysis, only the fact that the fixed effects are accounted for should matter to them. In fact, a cunning user could realize this as a distinct difference between lmitt.formula and lmitt.lm: lmitt.lm would allow them to use sandwich standard errors for their least squares dummy variable estimator (terminology from Cameron and Miller 2014) as opposed to the within estimator guaranteed by lmitt.formula. Additionally, Cameron and Miller 2014 suggest within estimation when cluster sizes are small on pg. 329, which I think would be good default behavior to encourage.

However, this difference also brings up issues related to degrees of freedom corrections I think we eventually want to implement. First, default degrees of freedom used for within and LSDV estimates differ, so we would need to make a correction, particularly to the SE's generated by lmitt.formula, to ensure those correspond (ibid. bottom of pg. 330). Second, corrections for downward bias may need to be adjusted differently depending on the estimator used (ibid. top of pg. 331).

As for @benthestatistician's 4th point, I found this article discussing interactions with the treatment effect in FE regression. It's not an exact analog in a few ways--namely that it's for panel data--but I think we can still gain some insight from it (and the article quotes familiar authors such as Cameron). The authors provide this model specification:

which, porting it over to our regime, means x is the treatment variable, z is the subgroup, i is the cluster index, and t is the unit index. strat is not included in this model specification, but from the articles I've read it seems that since it's not an interaction it should be de-meaned on its own. If we interact a cluster-varying covariate (i.e. subgroup) with the treatment (which is cluster-invariant), this article says de-meaning z*subgroup (de-meaning z, subgroup, and z:subgroup) is a "standard procedure in panel data analysis" and "widely used in empirical analysis". They even say Microeconomics in Stata (Cameron* and Trivedi, 2009) computes this estimate by default, but I don't see where in that book they discuss interaction terms. All together, then, this means we would de-mean `zsubgroup + strat`.

** EDIT

josherrickson commented 2 years ago

I suppose I am arguing from moving from lmitt.formula to ..... lmitt.Design? Not sure what to move towards, but if we're OK moving away from the mindset of "just replace your lm call with lmitt", then moving away from formulas will ease the code a bit.

On the other hand, passing variable names via arguments has always been annoying. argument = data$var? argument = "var"? arguments = ~ var? I know the tidyverse has some nicer syntax here, may need to look into it.

If we want to stick with lmitt.formula, I'm fine with that, but I do think we should be conservative with what goes into the formula (as @benthestatistician mentioned, only outcome and maybe one RHS) and make arguments for the rest.

josherrickson commented 2 years ago

From the call today, we came up with the following I believe:

1) absorb= arguement to lmitt will take a boolean; on TRUE the lm will include dummies for clusters as defined the Design but suppress reporting them (#59). FALSE (default) will do nothing. 2) The idea of "absorbing" the intercept and/or subgroup main effects will be non-configurable. See this comment in #59 for details and further discussion.

@benthestatistician if I missed anything, please add here. I know there was some discussion of the covariance estimation that may be relevant.

josherrickson commented 2 years ago

I've got this working on the absorb branch.

To make this work, I've aliased as.factor() with .absorbed(), making it easy to identify the fixed effects we want to suppress at time of show/print/summary. As discussed somewhere else, if a user provides multiple cluster variables, their interaction is included (e.g. every unique combination of levels of the variables provides a single fixed effect).

Large outstanding issue:

[x] Error if user trys to include .absorbed directly. (This shouldn't happen as .absorbed isn't exported, but better safe than sorry, and easy to do with specials.
[x] For some reason summary.DirectAdjusted and print.summary.DirectAdjusted aren't working (the .lm versions are being called).

jwasserman2 commented 2 years ago

In terms of the summary method, I already have some base code in DirectAdjusted.R that you can adjust.

josherrickson commented 2 years ago

Oh wow, thanks - I spent far too long trying to figure out why my version wasn't working.

FYI, I like putting summary.___() and print.summary.___() in their own file, since the usual practice is to have summary.___() generate an object of class summary.___, and feel we should avoid mixing multiple class types in a single file. Hence the presence of "summary.Design.R".

Thanks!

josherrickson commented 2 years ago

Ok, this version is wrapped up. The automatic residualization discussed in #59 remains open.

Of note, I was getting intermittent but unrepeatable errors when not exporting .absorbed. If this continues to rear its head, let me know. I'm trying to avoid exporting it, but wouldn't be the worst if we needed to.

jwasserman2 commented 2 years ago

I don't think absorbed being equivalent to as.factor(cluster1*cluster2*...) is exactly what we want. For example, in simdata, that's grouping together observations with cid1=1, cid2=2 and cid1=2, cid2=1. I believe it doesn't throw an error here since observations in those two clusters have the same treatment assignment value. The code here should be:

  if (absorb) {
    uoa_names <- paste(paste0(".absorbed(", var_names(design, "u"), ")"), collapse = "*")
    obj <- update(obj, paste0(". ~ . + ", uoa_names))
  }

josherrickson commented 2 years ago

Oh shoot you're right - we want as.factor(cluster1)*as.factor(cluster2). Thanks for flagging it.

benthestatistician commented 2 years ago

Actually what we're looking to absorb in this context are the indicator variables for strata, not clusters.

josherrickson commented 2 years ago

Boy did I get any part of this right....

benthestatistician commented 2 years ago

Yes absolutely, the getting stuff done part!

josherrickson commented 2 years ago

Ok, all fixed up.

benthestatistician commented 9 months ago

Not reopening this issue, but using its comment thread to record some adjustments to our absorption that I'm proposing or preparing to propose. There are 2 settled proposals (resolutions) and a mounting proposal about absorption.

Resolutions:

When lmitt.f() is given absorb=T and a blocked design, let's block-center treatment and (as applicable) moderator variables not the response.
When lmitt.f() is given absorb=T and a design with no blocks, let's grand-mean-center treatment and, as applicable, moderator variables.

Proposal in the making:

3. When given absorb=T, lmitt.f() doesn't estimate an intercept per se, but instead returns as "(Intercept)" the control-condition mean, calculated with whatever weights were provided.

Some notes about 3:

i. When the design has no blocks, the "Intercept" that (3) would return coincides with the Intercept term that would be returned by lm() or lmitt.formula(..., absorb=F).
ii. When the design does have blocks, there's not really a meaningful intercept. Ordinarily R just gives you the block effect for the first-mentioned block in the levels set of the blocking factor. Perhaps there are nifty ways of setting the contrasts for that factor that get you something more interpretable, but I'm not aware of anything like this that's widely used.
iii. The control group mean is a useful thing to have ready access to alongside of the treatment effect estimates. iv. We can readily construct bread matrices and estfun matrices for the combined "Intercept"/treatment effects coefficient vector, by adding a single row and column to the $A{22}$ matrix and and a single column to the estfun. As the proposed "Intercept" wouldn't appear in the treatment effect estimating functions and the treatment effects wouldn't appear in its, the additional row and column of $A{22}$ are zero except on the diagonal. The new row of the estfun is just zeroes and, within the control group, weighted deviations from weighted mean of controls. v. (When there's a covadj, I'm tempted to have this "Intercept" be the control group mean of the y's without covariance adjustment, as that's ordinarily the more interpretable quantity. But I confess to not yet having made an attempt to enumerate what might be lost this departure from what lm() with an offset would do.)

benbhansen-stats / propertee

`absorb` option for `lmitt()` #69