The approach we aimed to support and implement is a bit narrower than what's proposed here, covering a subset of the use cases this proposal would cover. (With apologies for not remembering to note this when we spoke earlier today!) In this other approach, units of assignment have only one treatment status, but that status is permitted to be ordinal. In the toy data at the top of this issue, units `a` and `b` would have `trt == "2020"`, units `c` and `d` would have `trt == "2021"`, and units `e` and `f` would have `trt == "."`, say, with `trt` being an ordered factor with level ordering `"." < "2021" < "2020"`.
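Here is a minimal base-R sketch of this ordinal encoding. It uses the toy units `a` through `f` described above. The `toy` data frame and the `z2020`/`z2021` column names are illustrative only and are not propertee objects.

Units treated in 2020 remain under treatment in 2021, so each year's binary treatment is a threshold on the ordered factor:

```r
# Toy units from the discussion; "." marks never-treated units.
toy <- data.frame(
  id  = c("a", "b", "c", "d", "e", "f"),
  trt = factor(c("2020", "2020", "2021", "2021", ".", "."),
               levels = c(".", "2021", "2020"), ordered = TRUE)
)

# Comparing an ordered factor with a level label uses the level ordering,
# so a unit counts as treated in a year if its level is at or above that year's.
toy$z2020 <- as.integer(toy$trt >= "2020")  # a, b
toy$z2021 <- as.integer(toy$trt >= "2021")  # a, b, c, d
toy
```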
Let's suppose that in the toy example above the blocks are {`a`, `e`} and {`b`, `c`, `d`, `f`}, and suppose we want to do weighting by the odds, separately in years 2020 and 2021.

The pair {`a`, `e`} is easy: in both years it has one treated unit and one control, and both receive weights of 1.

For the 4-tuple {`b`, `c`, `d`, `f`}, the right answer depends on the year:
- In 2020 just one of the 4 is under treatment, `b`. Its weight is 1, and the others' weights should be 1/3 each, the within-block odds of treatment, (1/4)/(3/4).
- In 2021 `b`, `c`, and `d` are each under treatment, so their weights are 1 and `f`'s weight is 3, that is, (3/4)/(1/4).

I think that's what `ett()` will give us for an appropriately encoded design and an appropriate invocation of the function.
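The arithmetic above can be checked with a small base-R sketch. It assumes the usual effect-of-treatment-on-the-treated odds weighting:
- treated units get weight 1;
- each control gets the within-block odds of treatment, p/(1 - p), where p is the block's proportion treated.

The helper `odds_weights()` is hypothetical. It is not propertee's `ett()`.

```r
# Hypothetical helper: ETT-style odds weights computed within blocks.
odds_weights <- function(z, block) {
  p <- ave(as.numeric(z), block, FUN = mean)  # proportion treated in each unit's block
  ifelse(z == 1, 1, p / (1 - p))              # treated -> 1; control -> odds of treatment
}

block <- c(1, 2, 2, 2, 1, 2)   # units a-f: blocks {a, e} and {b, c, d, f}
z2020 <- c(1, 1, 0, 0, 0, 0)   # a and b under treatment in 2020
z2021 <- c(1, 1, 1, 1, 0, 0)   # a, b, c, d under treatment in 2021

w2020 <- odds_weights(z2020, block)
w2021 <- odds_weights(z2021, block)
```

Here `w2020` works out to 1, 1, 1/3, 1/3, 1, 1/3 and `w2021` to 1, 1, 1, 1, 1, 3, matching the weights described for each year.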
For this I think the design should be encoded without a `dichotomy`, with the above communicated through the formula passed to `ett()` via its `dichotomy=` argument. It's getting late for me now, so I figure I'll check on this in the a.m.
`assigned()` doesn't currently accept a `dichotomy` argument, so I think the `Design` needs a `dichotomy` for `assigned()` to create the `trt` column of my example dataset. Could you explain how you see the `Design` obtaining information of how to weight separately in different years? In my mind, you would need a `dichotomy` that incorporates a column denoting assignment status in a given year, like the one I proposed above. This still necessitates modification b from my first comment. Maybe you have something else in mind, though.
I'm trying to wrap my head around this issue. Is this similar to the discussion at https://github.com/benbhansen-stats/propertee/issues/30#issuecomment-1064480711?
I believe my current thoughts are exactly what you've written there. How did you address that using `dichotomy`?
OK, so I've done a bit of the experimenting I had resolved to do last night. What I was envisioning does seem to be doable, if not in the smoothest way. The toy example Josh W provided could help us improve on it.
Josh W is correct that `assigned()` won't currently do what I was hoping for. Neither will `ett()` and `ate()`, although they're closer. 73eee5c puts one example of their use into a new test, in `test.CombinedWeightedDesign.R`, on the new branch `i30_dichotomy`. I'm going to reopen #30 and record some other comments over there.
Well, we ended up with dichotomization + CombinedWeightedDesigns. What is stopping a similar process from being used here? Generate a single-dimensional treatment variable (`paste(year, trt)`), use dichotomization to generate weights per year, then combine them?
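Here is a base-R sketch of that chunked workflow, outside propertee. It assumes long-format data with one row per unit and year; the `long` data frame and the `odds_weights()` helper are hypothetical.

Weights are computed separately within each year and then stacked. This gives the same result as a single pass that treats each block-by-year cell as its own block. The single pass uses `interaction()` here rather than `paste(year, trt)`.

```r
# Hypothetical long-format data: units a-f observed in 2020 and 2021.
long <- data.frame(
  id    = rep(c("a", "b", "c", "d", "e", "f"), times = 2),
  block = rep(c(1, 2, 2, 2, 1, 2), times = 2),
  year  = rep(c(2020, 2021), each = 6),
  z     = c(1, 1, 0, 0, 0, 0,    # 2020 assignment
            1, 1, 1, 1, 0, 0)    # 2021 assignment
)

# Hypothetical helper: ETT-style odds weights within blocks.
odds_weights <- function(z, block) {
  p <- ave(as.numeric(z), block, FUN = mean)  # proportion treated per block
  ifelse(z == 1, 1, p / (1 - p))              # controls get the odds of treatment
}

# Chunked approach: compute weights one year at a time, then combine.
chunks <- lapply(split(long, long$year), function(d) {
  d$w <- odds_weights(d$z, d$block)
  d
})
combined <- do.call(rbind, chunks)

# Single pass: each block-by-year combination acts as its own block.
long$w <- odds_weights(long$z, interaction(long$block, long$year))
```

The open question in this thread is not this arithmetic. It is how to carry a year-specific treatment variable alongside the combined weights so that propertee functions recognize it.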
I somewhat misunderstood @jwasserman2's proposal at the top of this issue when I first replied to it. (In the passage below I was taking "the dataframe" to refer to a table stored within the Design object, but I see now that it was meant to refer to the analysis dataframe (`time_dat`) in Josh's example:

> each unit should be allowed to have rows in the dataframe with different assignment statuses as long as the assignment statuses match the dichotomy. We can see this is the case for the user's data since `trt == 1` only when `year >= year_trt`.

Also, "assignment status" refers to a binary treatment variable as would be created by `assigned()`, perhaps distinct from the `t` column of the `Design@structure` data frame.) Now that I better understand the gist of the proposal, I'm a fan.
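To make the quoted condition concrete, here is a base-R sketch with hypothetical long-format data in the spirit of `time_dat`. The original toy data isn't reproduced in this thread, so `time_dat_toy` and its rows are illustrative.

The unit-level `year_trt` records when each unit entered treatment, with `NA` for never-treated units. The sketch checks two things:
- assignment varies within `id`;
- every row's `trt` agrees with `year >= year_trt`. Checking this in both directions is an assumption about staggered adoption.

```r
# Hypothetical repeated-measures data: units a-f, each observed in 2020 and 2021.
time_dat_toy <- data.frame(
  id       = rep(c("a", "b", "c", "d", "e", "f"), times = 2),
  year     = rep(c(2020, 2021), each = 6),
  year_trt = rep(c(2020, 2020, 2021, 2021, NA, NA), times = 2),
  trt      = c(1, 1, 0, 0, 0, 0,    # 2020
               1, 1, 1, 1, 0, 0)    # 2021
)

# Dichotomy-style condition: treated from year_trt onward; NA means never treated.
implied <- with(time_dat_toy, !is.na(year_trt) & year >= year_trt)
consistent <- all(time_dat_toy$trt == as.integer(implied))

# Which units have different assignment statuses in different rows?
varies_within_id <- tapply(time_dat_toy$trt, time_dat_toy$id,
                           function(x) length(unique(x)) > 1)
```

Here `consistent` is `TRUE`, and only units `c` and `d` change status across rows.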
I'd like to suggest a modification to it that makes it a smaller change to the Design class spec. Specifically, I'd omit the part of the proposal about adding columns to the `@structure` tables of Design-class objects. I think the proposal should leave that as-is and instead change the meaning of the `Design@dichotomy` slot. The proposal should require that its expressions always evaluate to the correct binary treatment specification within a data frame that merges the analysis data with the Design's `@structure` data frame.
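Here is a base-R sketch of those semantics. It is not propertee's internal code, and the names are hypothetical:
- a plain data frame `structure_df` stands in for `Design@structure`, with one row per unit of assignment and a unit-level `year_trt`;
- a plain R expression stands in for the dichotomy.

The expression can only be evaluated after the merge, because `year` comes from the analysis data and `year_trt` from the structure.

```r
# Stand-in for Design@structure: one row per unit; NA = never treated.
structure_df <- data.frame(
  id       = c("a", "b", "c", "d", "e", "f"),
  block    = c(1, 2, 2, 2, 1, 2),
  year_trt = c(2020, 2020, 2021, 2021, NA, NA)
)

# Analysis data: one row per unit and year, with no treatment column.
analysis_df <- data.frame(
  id   = rep(c("a", "b", "c", "d", "e", "f"), times = 2),
  year = rep(c(2020, 2021), each = 6)
)

# Hypothetical dichotomy referencing both analysis and structure columns.
dichotomy_expr <- quote(!is.na(year_trt) & year >= year_trt)

# Merge first, then evaluate the dichotomy row by row in the merged data.
merged <- merge(analysis_df, structure_df, by = "id", sort = FALSE)
merged$z <- as.integer(eval(dichotomy_expr, merged, enclos = baseenv()))
```

Using `baseenv()` as the enclosure means `year` and `year_trt` must come from the merged data rather than from the caller's workspace.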
Turning now to @josherrickson's comment above (suggesting per-year dichotomization followed by combining the years), a downside of the amended proposal is that `WeightedDesign`s couldn't be freely `c()`'ed together as they are now while also preserving the new semantics of their `dichotomy` slots. As a result, this would interfere with the compute-separately-by-chunk approach that Josh E's latest comment references. But that isn't such an appealing workflow, and I think we should be open to retiring it if we can get its benefits in other ways.
@josherrickson: I think our earlier discussions that led to dichotomization + CombinedWeightedDesigns were hampered by not having a clean, simple example. Now that Josh W has given us one, I'm hoping we can make more progress. See my comment reopening #30. Addressing your question just above, the main outstanding challenge is to generate a treatment variable for each year (one that propertee functions will recognize as such) alongside the weights we create for each year.
@jwasserman2: I think the test I added on the `i30_dichotomy` branch addresses your request that I "explain how [I] see the Design obtaining information of how to weight separately in different years," if not necessarily with elegance. When in the same follow-up comment you referenced "modification b from my first comment", I think you were pointing to the item rendered as item 2 in your initial comment, "allowing non-treatment variables in the dichotomy"? Please correct me if I'm mistaken.
@benthestatistician Thanks for pointing me to your branch. That setup makes a lot of sense to me. Yes, modification 2 referred to allowing non-treatment variables to be referenced in the dichotomy. The way you set it up in your branch avoids the need for it, though.
Let's continue this discussion over in #30, where in the comments I have a proposal incorporating important elements of the proposal at the top of this thread. My proposal differs from the one above in the ways indicated by my comments above, and in some others. To avoid confusion moving forward, I'm going to close this issue out.
Suppose a user has repeated measures data where units are assigned to treatment at different times:

Units appear multiple times in the dataset, sometimes with differing assignment values, making assignment status specific to `id` and `year`. The user should be able to create a `Design` object with something like:

The `unitid`/`cluster`/`unit_of_assignment` should be `id`, but each unit should be allowed to have rows in the dataframe with different assignment statuses as long as the assignment statuses match the dichotomy. We can see this is the case for the user's data since `trt == 1` only when `year >= year_trt`.

This requires a few updates:
- Allowing non-treatment variables in the `dichotomy`. Previously, this affected `assigned()` and its aliases, `ate()`, and `ett()`, but after #162 it affects the creation of `Design` objects:

  This error arises in `treatment()`. `treatment()` calls `.bin_txt()`, shown below, which calls `.apply_dichotomy(treatment(des, binary = FALSE), des@dichotomy)`. But `treatment(des, binary = FALSE)` only returns the original treatment column in the design data, whereas in this case `.apply_dichotomy()` also needs the columns specified in the `dichotomy`. This leads to the error above. https://github.com/benbhansen-stats/propertee/blob/f5a26bf8d90f794d92d63761184d9b1ac9c0f9d7/R/DesignAccessors.R#L171-L181

  My proposal for addressing this would be to add the columns specified in the `dichotomy` to the `structure` slot of a `Design` object. We could then use logic like `.get_col_from_new_data(x, newdata, type = "d", by)`, where `"d"` stands for "dichotomy", to access them when needed.
- Modifying `ate()` and `ett()`, if need be, to give appropriate weights to units that are sometimes control and sometimes treatment.

These potential changes touch a lot of our codebase, so I think discussing the best way to approach them here before diving in would be a good idea.
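The failure mode can be reproduced with a base-R sketch outside propertee. It rests on three assumptions:
- the dichotomy references `year` and `year_trt`;
- `year_trt` is the Design's original treatment column (the user's `Design` call isn't shown, so this is an assumption);
- a hypothetical helper, `apply_dichotomy_sketch()`, plays the role of `.apply_dichotomy()`. It is not that function's actual code.

Given only the treatment column, evaluation fails because `year` is missing. Once the columns the dichotomy refers to are carried along, it succeeds.

```r
# Hypothetical stand-in for applying a dichotomy to design data.
apply_dichotomy_sketch <- function(dat, expr) {
  # baseenv() as enclosure: variables must come from `dat`, not the caller
  as.integer(eval(expr, dat, enclos = baseenv()))
}

dichotomy_expr <- quote(!is.na(year_trt) & year >= year_trt)

# Units c and d, both entering treatment in 2021, observed in 2020 and 2021.
design_data <- data.frame(
  id       = rep(c("c", "d"), times = 2),
  year     = rep(c(2020, 2021), each = 2),
  year_trt = 2021
)

# Only the treatment column: `year` cannot be found, so evaluation errors.
trt_only <- design_data["year_trt"]
res_fail <- tryCatch(apply_dichotomy_sketch(trt_only, dichotomy_expr),
                     error = function(e) e)

# Carrying along the columns the dichotomy uses lets it evaluate.
res_ok <- apply_dichotomy_sketch(design_data[c("year_trt", "year")], dichotomy_expr)
```

Here `res_fail` is an error condition ("object 'year' not found"), and `res_ok` is `0, 0, 1, 1`.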