Closed josherrickson closed 2 years ago
Nevermind - got myself mixed up.
For future reference - when calling lmitt.formula()
, all arguments to lm()
are accepted, along with design
and dichotomy
. While the *_design()
functions take a subset
argument, this is not the same as the subset
argument to lm
/lmitt
, so users cannot use the shorthand of
lmitt(y ~ x, design = z ~ ..., subset = ...)
in place of
des <- rct_design(z ~ ..., subset = ...)
lmitt(y ~ x, design = des)
(Though, since dichotomy=
is accepted in lmitt.formula()
and passed to the Design creation, you can use that for subsetting if appropriate, e.g. RD scenarios.)
Could you clarify the last comment, in light of test.lmitt.R including some lines that appear to be testing the use off subset=
in lmitt.formula()
?
I imagine that in those lines the subset=
argument is being accepted (not ignored), but that it only modifies the lm
object and not the Design
that's created en route to creation of the lm
. But perhaps it's something else.
Correct. For clarity sake, let's rename the argument in *_design
to subset_design=
. Then we have
des <- rct_design(z ~ ..., data = ..., subset_design = sub_a)
lmitt(y ~ x, design = des, data = ..., subset = sub_b)
Both subset_design
and subset
are accepted arguments; subset_design=
affects only the creation of the design, whereas subset=
is affecting on the lm
in the typical way. My comment was that this won't work:
lmitt(y ~ x, design = des, data = ..., subset = sub_b,
design = z ~ ..., subset_design = sub_a)
because of course both arguments really have the same name. We could discuss renaming the subset=
argument to *_design
but I don't think we want to do that.
(In case your request for clarity on the "last comment" refers to my statement about dichotomy, in some issue I made a comment about this but can't find it. But essentially there was some discussion about using subset=
argument in rdd_design
to restrict the design to around the discontinuity, I was pointing out that lmitt
accepts and applies a dichotomy=
argument so one could do:
rd_design(time ~ cluster(c) + forcing(f), data = mydata,
dichotomy = time > 10 & time <= 12 ~ time > 12 & time < 14
which would be the same as
rd_design(time ~ cluster(c) + forcing(f), data = mydata,
dichotomy = time <= 12 ~ time > 12,
subset = time > 10 & time < 14
from the perspective of treatment assignment. Of course, in the former, clusters outside of the (10, 14) window would be included with NA binary treatment, whereas in the latter, only clusters within (10, 14) would appear. However in both scenarios, when merging treatment information to the analysis level, it would have the same impact.
Then, since dichotomy=
is a valid argument to lmitt()
, you could use the dichotomy
to subset the design creation step, if the subsetting is defined based on treatment levels, as in RD.)
Thanks! And sorry about the ambiguity re "last comment"; I had meant only to seek the first of the clarifications you gave. (I did find the second helpful as well!)
Let me open the door to another followup from @josherrickson, but without expecting it (in case you don't have anything to add). Suppose a user who's looking to declare their design and estimate effects within a single call to lmitt.formula()
, as opposed to declaring in a *_design()
call and then piping it into lmitt()
or lm()
. Suppose also that this user intends to do some subsetting within the same step. What situations can you envision where subset_design=time > 10 & time < 14
, were we to support that argument, would give a different result than subset=time > 10 & time < 14
being passed to the lm()
part of the call?
The only one occurring to me is when we're using ate()
or itt()
. If the subsetting call removes entire units of assignment, then applying it at the design creation stage changes the weights, whereas if you apply it at the lm estimation stage only then the weights are based off of the entire non-subsetted design, even though some of the units of assignment that contributed to it don't appear. (In this scenario I'd think the user would be well advised to create their design on a separate line of code, for clarity. So if it and others like it are the only case whether there's a difference, I don't see any cause for giving lmitt.formula()
a subset.design=
argument.)
In general, a user could potentially create the Design
inside the lmitt
call, then extract it for further use down the road, in which case they may want subset
and subset_design
to be different.
But overall, I agree with your last* pair of sentences, in most scenarios users are better served by creating the Design
as a separate step. And I think we should market it that way.
Really design=formula
version is only for simple cases I think. I was not advocating allowing both subset
and subset_design
in lmitt
, I opened this issue originally because I was conflating the two and thought things weren't working as expected, but they were hence my quick closure of it.
* Your real last pair of sentences.
I'm not sure I'm ready to go quite as far as "most scenarios" in your statement that
in most scenarios users are better served by creating the Design as a separate step
as there may be users and usage patterns for whom skipping that step works fine most or all of the time. But I'd certainly be on board with "many scenarios".
Currently code for handling
subset=
argument inlmitt.formula
is commented out, due to failing tests.Looks like a sample size issue.
(Note: Haven't pushed these changes up quite yet.)