Closed josherrickson closed 1 year ago
My view is that we should not attempt to support this usage, despite its having served us nicely in @josherrickson's dissertation project. Instead, I think we should route users to the nearest approximation that we currently support.
[...] `lmitt.f()` gives you without the `subset = z == 1` are the same as the intercept and moderator main effect that `lm(<...>, subset = z == 1)` would have given you. So I think we should be directing users to `lmitt.f()` w/o `subset=`.

[...] `glm`, say a robust or somehow penalized glm, then those control group residuals probably sum to something a little different from 0, and the equivalence of (1) above fails. But then also the coefficients that better bear interpretation as treatment effects are the treatment and treatment-moderator interaction coefficients that would be reported by `lmitt.f()` without the `subset = z == 1`.

[...] `lmitt.f()` with no `subset =` modifier that I think we want to be giving people.

PS: In answer to the "if no" question in the issue statement: Yes, I'd certainly support an informative error in this circumstance. However, I'm inclined to say it's not something that merits a lot of coding attention unto itself. (If we stay on message, in package documentation, presentations and papers, about the `lmitt()` without `subset=` being the way to estimate treatment interactions with a prognostic score, then I see it as unlikely that users would stumble down the `lm(<...>, subset = z == 1)` path of their own accord.) These views are less congealed than my view on the main yes/no question.
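For anyone who wants to see the equivalence numerically, here is a minimal sketch in base R. It uses plain `lm()` rather than `lmitt()`, simulated data, and made-up variable names (`w`, `ps`, `r`), so it illustrates the algebra only, not the package's behavior. It assumes the prognostic model is fit by ordinary least squares on the controls; under that assumption the control-group residuals are orthogonal to the intercept and the prognostic score, and the treatment and treatment-by-score coefficients from the full-data fit match the intercept and slope from the treated-only fit.

```r
set.seed(1)
n <- 200
z <- rep(0:1, each = n / 2)                  # treatment indicator (simulated)
w <- rnorm(n)                                # a single covariate (hypothetical)
y <- 1 + 2 * w + z * (0.5 + 0.3 * w) + rnorm(n)

# Prognostic model fit by OLS on controls only
prog <- lm(y ~ w, subset = z == 0)
ps <- predict(prog, newdata = data.frame(w = w))  # prognostic score for everyone
r <- y - ps                                       # outcome net of the offset

full <- lm(r ~ z * ps)                # full data, no subsetting
trt  <- lm(r ~ ps, subset = z == 1)   # treated units only

# Control-side intercept and slope are zero up to rounding error
print(coef(full)[c("(Intercept)", "ps")])

# Treatment terms from the full fit vs. intercept and slope from the treated fit
print(rbind(full = unname(coef(full)[c("z", "z:ps")]),
            treated_only = unname(coef(trt))))
```

If the prognostic model were fit by something other than OLS, the control residuals would generally not be exactly orthogonal to the score, and the two rows printed above would differ.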
Perhaps the info being assembled for #119 and/or PR #127 could supply raw materials for a useful error message. Those subprojects call for creating tables of numbers of clusters in which different subgroups have representation (when we've been asked to estimate subgroup effects). These counts may be being broken out by block. Whether or not that is the case, if they're being broken out by assignment to treatment versus control, then the tables could be used to determine whether the user has given a `subgroup=` specification that effectively precludes treatment-control comparisons. While #119 was about subgroups, I imagine that Josh W's code enhancements to address it would be extendable to provide similar tables even when there is no moderator variable, or a continuous moderator. (If they aren't already doing that, which they may be; this may be right there in the PR, but I'm not at liberty to check right at the moment.)
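To make the table idea concrete, here is a rough base R sketch. The helper `cluster_counts()` and its arguments are hypothetical and are not taken from #119 or PR #127. It counts distinct clusters by subgroup and assignment arm after a subsetting condition is applied, so that an all-zero arm shows up directly in the table.

```r
# Hypothetical helper: distinct clusters per subgroup and arm after subsetting
cluster_counts <- function(cluster, z, subgroup, keep = rep(TRUE, length(z))) {
  d <- data.frame(cluster = cluster, z = z, subgroup = subgroup)[keep, ]
  d <- unique(d)  # one row per cluster-by-subgroup combination
  table(subgroup = d$subgroup,
        arm = factor(d$z, levels = c(0, 1),
                     labels = c("control", "treatment")))
}

# Toy data: four clusters, assignment at the cluster level
cluster <- c(1, 1, 2, 2, 3, 3, 4, 4)
z       <- c(0, 0, 1, 1, 0, 0, 1, 1)
g       <- c("a", "b", "a", "a", "b", "b", "a", "b")

tab_all <- cluster_counts(cluster, z, g)
tab_trt <- cluster_counts(cluster, z, g, keep = z == 1)

print(tab_all)
print(tab_trt)

# A subset that leaves an arm empty precludes treatment-control comparisons
no_contrast <- any(colSums(tab_trt) == 0)
print(no_contrast)
```

A check like `no_contrast` could feed an error or warning message; breaking the counts out by block as well would only need an extra column in the data frame and the `table()` call.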
The code in PR #127 checks these group counts within the `vcovDA()` call, but we could move these checks to the `lmitt()` call and perform them for other columns as well. In fact, my response to your comment on that PR, @benthestatistician, somewhat suggests something along those lines. Since we don't show the coefficients for the subgroups and those are the ones that won't be NA'd by `lm()` calls, we could run the checks here after the model's been fit and replace the NA'd subgroup x treatment effects with their respective subgroup coefficients while also producing a warning about it.
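As a sketch of what a post-fit check might look like, here is a small base R example. The helper `na_treatment_terms()` is hypothetical; it works on a plain `lm()` fit and is not the code in PR #127. It only flags and warns about treatment terms that came back `NA`, and does not attempt the coefficient replacement described above.

```r
# Hypothetical helper: find NA coefficients whose term involves the treatment
na_treatment_terms <- function(fit, trt = "z") {
  cf <- coef(fit)
  parts <- strsplit(names(cf), ":", fixed = TRUE)
  involves_trt <- vapply(parts, function(p) trt %in% p, logical(1))
  flagged <- names(cf)[is.na(cf) & involves_trt]
  if (length(flagged) > 0) {
    warning("Treatment terms not estimable after subsetting: ",
            paste(flagged, collapse = ", "))
  }
  invisible(flagged)
}

set.seed(2)
d <- data.frame(z = rep(0:1, each = 20), x = rnorm(40))
d$y <- 1 + d$x + d$z * (0.5 + 0.2 * d$x) + rnorm(40)

# Fitting on the treated only makes z collinear with the intercept
fit_trt <- lm(y ~ z * x, data = d, subset = z == 1)
flagged <- suppressWarnings(na_treatment_terms(fit_trt))
print(flagged)
```

Within the treated-only subset, `z` is constant and `z:x` duplicates `x`, which is why `lm()` reports both as `NA` while still estimating the intercept and the `x` slope.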
I think between the support for continuous moderators in #81 and Josh's work, I'm good to close this.
We currently get NA's if a user attempts to call `lmitt.formula` with `subset = z == 1` style subsetting; e.g. fitting a model only on the treated.

@benthestatistician Is this a workflow we want to support? (For reference, this is coming up in my efforts to coerce current flexida work to support my dissertation models.)
[...] `Design` creation); should we add an informative error during `lmitt.formula` after subsetting?

The other coefficients in the model appear consistent:
If we add an offset model, we get an additional error:
(Note that `as.lmitt` doesn't work right either; because we obviously can't include the treatment variable in such a model.)