Closed by benthestatistician 4 months ago
I'd appreciate your thoughts and contributions on this, @josherrickson.
Unless Josh changed something in moving the dichotomy, that is supported. See e.g. the last example in this section: https://benbhansen-stats.github.io/propertee/articles/non-binary-treatment.html#dichotomzing-a-non-binary-treatment
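For concreteness, here's a rough sketch of the usage I believe that example supports (not run here; ate()'s argument names are taken from R/weights.R as quoted later in this thread):

library(propertee)
data(simdata)
des <- rct_design(dose ~ uoa(uoa1, uoa2) + block(bid), data = simdata)
## exhaustive dichotomy: every unit lands in one arm or the other
w_exh <- ate(des, data = simdata, dichotomy = dose > 200 ~ dose <= 200)
## non-exhaustive dichotomy: dose == 200 units fall in neither arm
w_non <- ate(des, data = simdata, dichotomy = dose > 200 ~ dose < 200)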
I need to:
Terrific, thanks for the clarification! Perhaps the only update the docs need is a pointer to the vignette you just directed me to. In particular, in light of the existence of this vignette, a working example in the ett() docs is less necessary and might be considered optional.
@adamrauh and I are having trouble with this functionality. Here's a little reprex to illustrate (what I think is) our trouble:
data(simdata)
des <- rct_design(dose ~ uoa(uoa1, uoa2) + block(bid), data = simdata)
wdes2 <- propertee:::.weights_calc(des, data = simdata, by = NULL, target = "ate",
                                   dichotomy = dose >200 ~ dose <200)
Which gives me
Error in `[<-.data.frame`(`*tmp*`, !is.na(txt), , value = c(1L, 1L, 1L, :
replacement has 10 rows, data has 8
Here's the underlying stratum structure of the example:
> with(simdata, table(cut(dose, c(0, 199,200, 400)), bid))
bid
1 2 3
(0,199] 8 6 6
(199,200] 6 0 4
(200,400] 6 8 6
I've attached the bug label to the issue; do take it off if the problem turns out to have been between keyboard and chair.
PS: when the bug's taken care of, a ping to that effect on mistem_school#19 would be appreciated.
@jwasserman2 I believe this may fix it?
diff --git a/R/weights.R b/R/weights.R
index 47cddcf..8eab4c3 100644
--- a/R/weights.R
+++ b/R/weights.R
@@ -179,3 +179,3 @@ ate <- function(design = NULL, dichotomy = NULL, by = NULL, data = NULL) {
design@structure[, var_names(design, "u"), drop=FALSE])
- )[[var_names(design, "b")]]
+ )[[var_names(design, "b")]][!is.na(txt)]
block_units <- table(blks[!is.na(txt), ])
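For intuition, a rough check of the counts behind the mismatch in the reprex above (this assumes dose is constant within each uoa1/uoa2 unit of assignment, which the error's 10-vs-8 rows suggest; counts not verified here):

## one row per unit of assignment, with its (assumed constant) dose
uoa_dose <- aggregate(dose ~ uoa1 + uoa2, data = simdata, FUN = mean)
nrow(uoa_dose)                                   ## presumably 10 units in all
sum(uoa_dose$dose > 200 | uoa_dose$dose < 200)   ## presumably 8 land in an arm
## the remaining units sit at dose == 200, get NA dichotomized treatment,
## and need to be dropped from blks before the table() call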
Works for me!
Some tests that strike me as worth having. First, a case where the exclusions take out specific units but otherwise leave rates of assignment to treatment strictly between 0 and 1.
wdes2 <- propertee:::.weights_calc(des, data = simdata, by = NULL, target = "ate", dichotomy = dose >200 ~ dose <200)
expect_true(all(wdes2[simdata$dose==200]==0))
expect_true(all(wdes2[simdata$dose!=200]!=0))
Second, a case where the exclusion leads to some blocks having all treatment or all control. These blocks should receive 0 weights too.
with(simdata, tapply(dose<100, bid, any))
##> 1 2 3
##> TRUE TRUE FALSE
wdes3 <- propertee:::.weights_calc(des, data = simdata, by = NULL, target = "ett", dichotomy = dose >200 ~ dose <100)
expect_true(all(wdes3[simdata$bid==3]==0))
Both work as expected after patching as @josherrickson suggests above.
I'm seeing errors elsewhere with this change:
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
══ Failed tests ════════════════════════════════════════════════════════════════
── Failure ('test.WeightedDesign.R:527:3'): #130 zero weights with non-varying treatment in a block ──
all(wts > 0) is not TRUE
`actual`: FALSE
`expected`: TRUE
── Failure ('test.WeightedDesign.R:532:3'): #130 zero weights with non-varying treatment in a block ──
all(wts > 0) is not TRUE
`actual`: FALSE
`expected`: TRUE
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 1991 ]
I unfortunately don't have time to track this down further today, and I'm out of town for the next week. @benthestatistician if you want to push this change anyway just so things can run on the other project, that's fine; Josh or I can look further at it when we return from our respective trips.
Ack, too bad. I looked at it briefly before running out of time myself. Notes:
1. It looks like the change turns NAs in the generated weights vector into zeroes. This appears to be incorrect: both before and after the change, about half the observations in STARdata get excluded from the lmitt fit, presumably b/c of NA weights.
2. The failing test's wts shouldn't have any zeroes. It asks for ATE weights, $A/\pi + (1-A)/(1-\pi)$, where $\pi$ is as follows:
with(des@structure, tapply(starkbinary, schoolidk, mean)) |> summary()
##> Min. 1st Qu. Median Mean 3rd Qu. Max.
##> 0.161 0.245 0.286 0.301 0.374 0.443
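A by-hand version of what I'd expect those weights to be (a sketch only; it assumes starkbinary is the 0/1 treatment and schoolidk the block id, as in the line above, and leaves NA-block rows as NA):

pi_b <- with(des@structure, tapply(starkbinary, schoolidk, mean))
A <- des@structure$starkbinary
p <- pi_b[as.character(des@structure$schoolidk)]
## by-hand ATE weights: 1/pi for treated units, 1/(1 - pi) for controls
w_by_hand <- ifelse(A == 1, 1 / p, 1 / (1 - p))
## with every pi strictly inside (0, 1), these are positive or NA, never zero
summary(w_by_hand)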
@josherrickson's intuition is right. My in-line comment indicates that's what it should be. NA's in the STARdata dataset are causing the failing tests.
The issue is with the dimensions of blks, .merge_preserve_order(...), and txt:
Browse[1]> mpo <- .merge_preserve_order(
  unique(data[, var_names(design, "u"), drop=FALSE]),
  cbind(design@structure[, var_names(design, "b"), drop=FALSE],
        design@structure[, var_names(design, "u"), drop=FALSE])
)[[var_names(design, "b")]]
Browse[1]> dim(blks)
[1] 11598 1
Browse[1]> length(txt)
[1] 11598
Browse[1]> length(mpo)
[1] 6325
In this case, the output of .merge_preserve_order is already the right size:
Browse[1]> table0(txt)
FALSE TRUE <NA>
4425 1900 5273
Browse[1]> 4425+1900
[1] 6325
so the rows aren't lining up. In the example from Ben, the NA's were coming from treatment status alone; in STARdata, we have NA's in blocks as well:
Browse[1]> table(is.na(STARdata$schoolidk))
FALSE TRUE
6325 5273
which are then causing the NA's in the treatment status:
Browse[1]> table(is.na(txt), is.na(STARdata$schoolidk))
FALSE TRUE
FALSE 6325 0
TRUE 0 5273
I'm not sure what the best solution is here; special-casing this particular line doesn't feel right.
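One direction, sketched only and not tied to the exact internals: build a single row-level keep index that covers both kinds of missingness, and subset the block ids and the treatment indicator with it identically, e.g.:

## sketch in the browser frame above; both vectors are length 11598 there
keep <- !is.na(txt) & !is.na(STARdata$schoolidk)
## whatever gets tabulated or merged should then be subset by keep alone,
## rather than by !is.na(txt) in one place and nothing in another
block_units <- table(blks[keep, , drop = TRUE])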
Flagging that part of a subproject of @julian-bernado's ("ADSY cycle 1 additional analysis") is waiting on a resolution to this. (Josh W's effect estimates for ADSY-PEP, in the mistem_school repo, receive incorrect standard errors as a result of this.)
Are the standard errors incorrect because n in the sandwich calculations is calculated using all rows in the dataset rather than the number of rows with non-zero weight? If so, would @julian-bernado be unblocked by removing those rows from the lmitt() call rather than giving them zero weight? I will take a look at Josh E's comment above and consider possible solutions this weekend.
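i.e., something like the following as a stopgap (a sketch only, reusing the simdata objects from earlier in the thread; it assumes ett() takes the same arguments as ate() and that the weights object supports these comparisons, as in Ben's tests above):

## compute the weights, then drop rows that would carry zero or missing
## weight before calling lmitt(), so the sandwich n counts only analyzed rows
wts <- ett(des, data = simdata, dichotomy = dose > 200 ~ dose < 100)
keep <- !is.na(wts) & wts > 0
analysis_dat <- simdata[keep, ]   ## pass this (and wts[keep]) to lmitt()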
The problem is in that general direction, yes. It could be that there's a fix like you're suggesting, but I'm uncertain. It would be easier to have confidence in a fix that starts from a fix of this issue. (I should note that Josh E was also going to try to find some time for this soon, not over the weekend but probably on Friday.)
PR #181 implements the fix Josh E identified and addresses the problem with STARdata's NA block ID's. Tests address these situations and include the tests Ben desired above.
(yet another continuation of #30...). Scanning our tests in which a dichotomy is specified, I mostly see exhaustive dichotomies, i.e. dichotomies placing each unit into treatment or control. Is it supported to use dichotomy= to specify treatment and control groups that don't add up to everything in the design? E.g., suppose myZ is an ordered factor with levels ctl < trt0 < trt1. Natural exhaustive dichotomies are "myZ >= 'trt1' ~ myZ < 'trt1'" and "myZ >= 'trt0' ~ myZ < 'trt0'". But there's also a natural non-exhaustive dichotomy, namely "myZ == 'trt1' ~ myZ == 'ctl'". Is this supported? Either way, it would be good to have a working example on the ett() help page demonstrating what is and isn't supported.
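To make that concrete, a quick sketch (myZ is made up; the dichotomy formulas are only written down here, not evaluated against a design):

myZ <- factor(c("ctl", "trt0", "trt1"),
              levels = c("ctl", "trt0", "trt1"), ordered = TRUE)
## exhaustive dichotomies: every unit lands in one arm or the other
d1 <- myZ >= "trt1" ~ myZ < "trt1"
d2 <- myZ >= "trt0" ~ myZ < "trt0"
## non-exhaustive dichotomy: trt0 units fall in neither arm
d3 <- myZ == "trt1" ~ myZ == "ctl"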