intercept for lmitt(absorb=T)

When absorb=T is passed to lmitt.formula(), let's calculate and return in the intercept position a "one-way fixed effects"-weighted mean of stratumwise control group means of the offsetted response: $\sum_b \{ (\sum_i z_i w_i)[\sum_i (1-z_i) w_i]/(\sum_i w_i)\} \times \{\sum_i (1-z_i)w_i[y_i[0] - g(\vec{x}_i\beta)]\}/(\sum_i(1-z_i)w_i)$, where sums in $i$ range over $i$ in block $b$ and the sum in $b$ ranges over blocks $b$.

This also calls for updates/adjustments to lmitt.formula(), estfun.teeMod() and bread.teeMod(), as detailed in section 3 of estfun_DA.

Notes:

Stratumwise control means to be calculated with weights, when weights are provided.
No immediate implications for summary and show methods for teeMods.
It would be nice also to return, not as part of the coefficients but as part of a separate slot in the teeMod, a correspondingly weighted average of offsets $g(\vec{x}_i\beta)$. And doing the same also with other teeMods that don't involve absorption. This way the teeMod would include each ingredient to reproduce the marginal mean of the control group $y$s.

Initially I'm soliciting comments. Relates a bit to #187.

We desire $\hat{\tau}=\hat{\mu}_{\tilde{Y}}(1)-\hat{\alpha}_{0}$ or, if called for, $\hat{\tau}(X)=\hat{\mu}_{\tilde{Y}}(1|X)-\hat{\alpha}_{0}$ where $\hat{\mu}_{\tilde{Y}}(1)$ and $\hat{\mu}_{\tilde{Y}}(1|X)$, the weighted mean of residuals $\tilde{Y}(1) = Y(1) - f{\beta}(\vec{x}; 0)$ in the treatment group or the subset of treatment units with characteristic $X$, respectively, come from the fitted regression in line 336 of lmitt(), and $\hat{\alpha}{0}$ is the quantity Ben proposed in the original comment.

Proposed changes to accommodate this

Changes to `lmitt()`

Compute $\hat{\alpha}_{0}$: model$model will have the weights, offsets, and outcomes, and the function stores stratum ID's in the blocks vector earlier (though in order to have blocks when there are no strata in the design, we need to address #187 first by implementing the suggestion I've commented there). The only thing we don't have are the uncentered assignment indicators (model$model and mm have stratum-centered assignment indicators). We could get them from mm (prior to centering) by adding a line to this branch of logic: https://github.com/benbhansen-stats/propertee/blob/7d8fb2f9dbabdbe397ac53c17a98374dd6f0231a/R/lmitt.R#L297-L299 I don't think there's a need for unit of assignment ID's, since it's my understanding that $i$ here indexes units of observation, but someone should correct me if I'm wrong.
Compute $\hat{\mu}_{\tilde{Y}}(1)$ and any necessary $\hat{\mu}_{\tilde{Y}}(1|X)$: I think this should just require summing elements of model$coefficients.
Update model$coefficients to be setNames(c(muhat0, muhat1 - muhat0, ...), names(model$coefficients)), where the ... includes any estimates $\hat{\mu}_{\tilde{Y}}(1|X)$.

Changes to standard error estimation routine

The following proposed changes are based on what I have the estimating equations for $\alpha{0}$ written as: $0=\sum_{i=1}^{n}\check{w}_{i[z{i}]}(1-z_{i})[y_{i}(z_{i})-g(\vec{x}_{i}\beta)-\alpha_{0}]\pi_{s{i}}$, where $s_{i}$ is the stratum indicator and $\pi_{s_{i}}=(\sum_{{j:s_{j}=s_{i}}}\check{w}_{j}z_{j}) / (\sum_{{j:s_{j}=s_{i}}}\check{w}\{j})$.

Update .align_and_extend_estfuns(): We can update the estimating equations by adding a line after this one: https://github.com/benbhansen-stats/propertee/blob/7d8fb2f9dbabdbe397ac53c17a98374dd6f0231a/R/teeMod.R#L182 that adds the necessary column to psi using x$model as explained earlier to get some of the columns and something like the code used in lmitt() to get the blocks and uncentered assignment indicators: https://github.com/benbhansen-stats/propertee/blob/7d8fb2f9dbabdbe397ac53c17a98374dd6f0231a/R/lmitt.R#L249-L251 The vector $[\pi_{s_{i}}:i=1, ..., n_{\mathcal{Q}}]$ can then be computed by (rowsum(x$weights * (1-uncentered_a), blocks) / rowsum(x$weights, blocks))[blocks]. EDIT 10/29/24: this should multiply by uncentered_a rather than 1-uncentered_a.
Update .get_a21(): damod_mm, defined here, needs additional columns corresponding to the gradient of the above estimating equations with respect to $\beta$. Based on lines prior to the creation of damod_mm, the column to cbind to damod_mm should be w * (1-uncentered_a) * pi_s, where we get uncentered_a and pi_s the same way we did in .align_and_extend_estfuns().
Update .get_a22_inverse(): neither the original intercept nor $\tau$ appear in the estimating equations, so the extra row and column in $A_{22}$ will have 0's on the off-diagonal. The diagonal will be sum(x$weights * (1-uncentered_a) * pi_s), where we get uncentered_a and pi_s the same way we did in .align_and_extend_estfuns().

These changes should suffice for updating the standard errors.

Other questions:

Do we want to update model$fitted.values and model$residuals to reflect the use of $\hat{\alpha}_{0}$ rather than the original intercept?

I didn't address the last bullet point of Ben's original comment in my comment, but the weighted mean offset should be straightforward to compute wherever we compute $\hat{\alpha}_{0}$ in lmitt(). Thoughts on a name for the slot where it's stored?

Code could look like:

cols_to_blk_sum <- cbind(wts = 1,
                         (1-uncentered_a) *
                            cbind(ctrl_grp_wts = 1,
                                  ctrl_grp_y = data[[lm.call$formula[[2]]]],
                                  ctrl_grp_os = if (!is.null(lm.call$offset)) lm.call$offset else 0)))
col_sums_by_blk <- rowsum(x$weights * cols_to_blk_sum, blks)
col_sums_env <- list2env(as.data.frame(col_sums_by_blk))
col_sums <- colSums(with(col_sums_env, {
   ctrl_grp_wts * (wts - ctrl_grp_wts) / wts * cbind(ctrl_grp_y, ctrl_grp_os) / ctrl_grp_wts
}))
ctrl_grp_mean_os <- col_sums[, "ctrl_grp_os"]
alpha0 <- col_sums[, "ctrl_grp_y"] - ctrl_grp_mean_os

Operations between vectors and matrices aren't always performed in the intuitive direction (i.e. column-wise instead of row-wise), so sometimes I use the sweep() function to be safe when dimensions don't align:

col_sums <- colSums(with(col_sums_env, {
   sweep(cbind(ctrl_grp_y, ctrl_grp_os), 1, ctrl_grp_wts * (wts - ctrl_grp_wts) / wts / ctrl_grp_wts, FUN = "*")
}))

Notes re JW's 1st reply comment:

Didn't mean to suggest we need new coeff's for $\hat{\mu}_{\tilde{Y}}(1)$, nor for $\hat{\mu}_{\tilde{Y}}(1| X)$. Given that we're reporting $\hat{\tau}_{\tilde{Y}} = \hat{\mu}_{\tilde{Y}}(1) - \hat{\mu}_{\tilde{Y}}(0)$, my idea was just to add $\hat{\mu}_{\tilde{Y}}(0)$, which I've been calling "$\hat{\alpha}0$," to the report. This gives users enough to reconstitute the estimate of $\mu\{\tilde{Y}}(1)$, as a contrast of the provided coefficients, if they so choose.
Thanks for raising the subject of estimates with a moderator variable, which I hadn't been thinking about. I think we should also be (re-)generating Intercept column entries in the absorb=T scenario when there is a moderator variable. Then and only then can we invite the the same interpretations for intercepts after absorption as we do for intercepts after inverse probability weighting w/o absorption. a. with a subgrouping X, we need to re-calculate each subgroup "main effect", overwriting it and its corresponding estfun column. A different set of one-way fixed effects weights are used for each of these main effects. The estimating equation for subgroup g takes the form $0=\sum{i \in g}\check{w}_{i[z_{i}]}(1-z_{i})[y{i}(z_{i})-g(\vec{x}_{i}\beta)-\alpha{g}]\pi{gs{i}}$, where $\pi_{gs_{i}}=(\sum_{{j\in g: s_{j}=s_{i}}}\check{w}_{j}z_{j}) / (\sum_{{j \in g:s_{j}=s_{i}}}\check{w}_{j})$. b. when there is a continuous moderator m, ~~I think we can get away with regenerating the intercept in the same way that we do it when there is no moderator~~ we need to calculate the one-way fixed effects weighted average over the control group of $(y - m\hat\gamma)$, where $m$ is the main effect of m as estimated in the primary regression. That is, the estimating equation takes the form $0=\sum{i}\check{w}_{i[z_{i}]}(1-z_{i})[y{i}(z_{i})-g(\vec{x}_{i}\beta)-m_{i}\gamma - \alpha]\pi{s{i}}$, where $\pi_{s_{i}}=(\sum_{{j: s_{j}=s_{i}}}\check{w}_{j}z_{j}) / (\sum_{{j:s_{j}=s_{i}}}\check{w}_{j})$. (In contrast to the cases of categorical moderators or no moderation, this puts estimation of the revised intercept downstream of the main regression: in the Intercept row of the $A\{22}$ matrix, one has a weighted sum of $m_i$'s in the m-main effect column, not all-zero off-diagonal entries as in the subgrouping and main-effect-only scenarios.)

intercept/main effect names w/subgrouping variables

In the categorical subgrouping variable case, the main effect coefficients get printed with inconsistent labeling:

data(simdata)
des <- rct_design(z ~ unit_of_assignment(uoa1, uoa2), data = simdata)
mod2 <- lmitt(y ~ as.factor(o), data = simdata, design = des, weights = "ate")
coef(mod2)
       (Intercept)    `as.factor(o)2`    `as.factor(o)3`    `as.factor(o)4` 
            -0.650              0.993              0.575              0.501 
`z._as.factor(o)1` `z._as.factor(o)2` `z._as.factor(o)3` `z._as.factor(o)4` 
             0.718             -0.226                 NA              0.287

It would be better if either all of these names had "Intercept" in them or none did (my pref). Perhaps this is something that we could wrap into this task. If so I'd be in favor of also executing this name change as part of the lm() -> as.lmitt() path.

Final bullet of the original post

First let me restate the issue more precisely. Because $\mu_{\tilde{Y}}(1) - \mu_{\tilde{Y}}(0) = \mu_{Y}(1) - \mu_{Y}(0)$, $\hat{\tau}_{\tilde{Y}}$ estimates $\tau{Y}$ as well as $\tau{\tilde{Y}}$. But it doesn't follow that $\hat{\mu}_{\tilde{Y}}(0)$ estimates $\mu_{Y}(0)$. As a result, including with the teeMod coefficients an intercept that estimates $\mu_{\tilde{Y}}(0)$ doesn't quite equip the user to estimate either $ \mu_{Y}(0)$ or $\mu_{Y}(1)$. Rather, since $\mu_{\tilde{Y}}(0) = \mu_{Y}(0) - \mu_{f\beta(x;0)}$, to enable estimation of $ \mu_{Y}(0)$ or $\mu_{Y}(1)$ as a sum involving $\hat{\alpha}_0 = \hat{\mu}_{\tilde{Y}}(0)$ and perhaps $\hat{\tau}_{\tilde{Y}}$, we'd also need a complementing estimate of the offset mean $\mu_{f\beta(x;0)}$. The fact that these estimates will involve $\hat\beta$, it seems that they should be within the scope of the estfun, to allow standard errors to account for their covariance with other estimates.

I see two sorts of estimates of $\mu_{f\beta(x;0)}$ that we could report. I think I recommend that we offer the first of the two, because it's more readily implementable and coincidentally better bears a label like "mean of offsets," which I like; but I'll list both here because it isn't immediately clear to me which of the following would be more useful to users. i. $\overline{f_{\hat\beta}(x;0)}_{{i \in \mathcal{Q}: z_i=0}}$ an approriately weighted mean over controls; or ii. $\overline{f_{\hat\beta}(x;0)}_{{i \in \mathcal{Q}}}$, an appropriately weighted mean over the full comparison sample. In (i), the appropriate weighting is be the same as the weighting appropriate to estimating $\mu\{\tilde{Y}}(0)$ from control group $\tilde{y}_i$'s: with absorption, the one way-fixed effects weights induced by blocking in combination with any user-provided weights; without absorption, the same weights as were used for fitting the teeMod. Under (ii), in contrast: with absorb=TRUE the appropriate weighting is by user-provided weights, or flat weighting if no weights were provided; with absorb=FALSE, however, the appropriate weighting should undo whatever inverse probability weighting may have been baked into the user-provided weights through inclusion of an ate() or ett()-derived factor, i.e. divide the weights through by the same factor. Which sounds messy.

If we go with $\hat{\mu}_{f\beta(x;0)} = \overline{f_{\hat\beta}(x;0)}_{{i \in \mathcal{Q}: z_i=0}}$, we could efficiently calculate both $\hat{\mu}_{f\beta(x;0)}$ and $\hat{\mu}_{\tilde{Y}}(0)$ as weighted means of the control group; similarly for subgroup-specific intercepts. In the absorb=TRUE scenario, we would first determine the one-way fixed effect weights, at least for control group members; in absorb=FALSE, we'd just use whichever weights were used to fit the teeMod.

Just noting a connection to #69. Point 3 of this comment on that thread is addressed by this issue.

234210d adds a section 3 to estfun_DA, the purpose of which is to record in detail the changes called for in this issue (and a few others that go with them). Here is a latexdiff of the estfun_DA document that indicates where I changed or added to the document (the comparison point being 1 week ago): estfun_DA-diff686efa2.pdf .

benbhansen-stats / propertee