jwasserman2 closed this issue 8 months ago.
Can we make identify_small_blocks() more generic? Something like block_sizes(), and then call something like:
bs <- block_sizes(...)
is_small <- bs[bs$n_control == 1 | bs$n_treatment == 1, ]
Seems like it may be more generally useful.
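A minimal base-R sketch of what a generic block_sizes() could return. block_sizes() and the n_control/n_treatment columns are the names proposed in this comment. The input layout (one row per unit of assignment), the column names z and bid, and the toy data are assumptions; the toy data were chosen to match the block counts in the simdata design_table() output in this thread. This is not the package's implementation.

```r
# Hypothetical helper: count control and treated units of assignment per block.
# Assumes `uoa_data` has one row per unit of assignment, a 0/1 treatment
# column, and a block column (column names are assumptions).
block_sizes <- function(uoa_data, treatment = "z", block = "bid") {
  counts <- table(uoa_data[[block]],
                  factor(uoa_data[[treatment]], levels = c(0, 1)))
  data.frame(
    block       = rownames(counts),
    n_control   = as.vector(counts[, "0"]),
    n_treatment = as.vector(counts[, "1"]),
    stringsAsFactors = FALSE
  )
}

# Toy data: 10 units of assignment in 3 blocks
uoa <- data.frame(
  bid = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  z   = c(0, 0, 0, 1, 0, 0, 1, 0, 1, 1)
)
bs <- block_sizes(uoa)
# Small blocks: exactly one control or exactly one treated unit
is_small <- bs[bs$n_control == 1 | bs$n_treatment == 1, ]
```

Returning a data frame rather than a logical vector leaves the "small" threshold to the caller, which is what makes the function reusable beyond small-block detection.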
It seems like design_table(design_object, "t", "b") already gets us the block sizes:
data(simdata)
des <- rct_design(z ~ cluster(cid1, cid2) + block(bid), simdata)
design_table(des, "t", "b")
      treatment
blocks 0 1
     1 3 1
     2 2 1
     3 1 2
identify_small_blocks() can wrap around that and convert the output into a logical vector with one element per block (i.e., per row of the table), named by the row names.
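A base-R sketch of that wrapping step, assuming the design_table() output behaves like a matrix with blocks as rows and treatment levels "0"/"1" as column names. The table is rebuilt by hand from the printed output rather than by calling design_table(), and identify_small_blocks_sketch() is a hypothetical name, not the package function.

```r
# Hypothetical sketch: flag blocks with exactly one control or exactly one
# treated unit of assignment, returning a logical vector named by block.
identify_small_blocks_sketch <- function(tab) {
  small <- tab[, "0"] == 1 | tab[, "1"] == 1
  setNames(as.vector(small), rownames(tab))
}

# The blocks-by-treatment counts printed above, entered by hand
tab <- matrix(c(3, 2, 1,    # treatment == 0
                1, 1, 2),   # treatment == 1
              nrow = 3,
              dimnames = list(blocks = c("1", "2", "3"),
                              treatment = c("0", "1")))
res <- identify_small_blocks_sketch(tab)
```

In this example every block is flagged: blocks 1 and 2 have one treated unit and block 3 has one control unit.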
Also, unitids()/units_of_assignment()/clusters(), whichever one coincides with the function used in the Design formula, returns the unit of assignment ID columns. So to create the base dataframe described in the issue, we would only need to add a cluster ID column to the output of unitids()/units_of_assignment()/clusters():
clusters(des)
   cid1 cid2
1     1    1
2     1    2
3     2    1
4     2    2
5     3    1
6     3    2
7     4    1
8     4    2
9     5    1
10    5    2
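A base-R sketch of building that base dataframe. The unit-of-assignment columns are copied by hand from the clusters(des) output; the column name cluster and the pasted "cid1_cid2" ID format are assumptions. By default each unit of assignment is its own cluster.

```r
# Unit-of-assignment ID columns, as printed by clusters(des)
uoa_ids <- data.frame(
  cid1 = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  cid2 = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)
)

# Add a cluster column; initially the cluster ID is just the combined
# unit-of-assignment ID, so every unit of assignment is its own cluster
cluster_df <- uoa_ids
cluster_df$cluster <- do.call(paste, c(uoa_ids, sep = "_"))
cluster_df
```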
Ah apparently my idea was so good that I'd already implemented and forgotten about it.
See #161 for the implementation.
Based on discussions with @xinhew0708 and @benthestatistician, there are a couple of pieces of functionality we want relating to small-block clustering:

1. identify_small_blocks(), which counts the treated and control units of assignment in each block and indicates whether the block has only one treated or only one control unit.
2. Functionality that uses the output of identify_small_blocks() to change the clustering level for units of assignment in small blocks to the block level.

We discussed pulling clustering information from a stored dataframe that has a unit of assignment column and a cluster column. When creating a Design object, we could store a base version of this dataframe as a slot, based on the unitid()/uoa()/cluster() part of the formula. All vcovDA() calls would pull the cluster column, but for model-based calls, the cluster IDs given by unitid()/uoa()/cluster() would be replaced by block IDs for small blocks (based on the results of identify_small_blocks()). Additionally, if a user specifies a different clustering level using the cluster argument of vcovDA(), the values of the cluster column would be updated to reflect the specified clustering level.

Let me know if anyone sees an issue with this approach or if it differs from what they had in mind.
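A base-R sketch of the proposed model-based step: units of assignment in small blocks get a block-level cluster ID, and all others keep their default one. The data frame layout, the toy values, the "block_" prefix, and model_based_clusters() are all hypothetical; handling of a user-supplied cluster argument is not shown.

```r
# Hypothetical stored dataframe: one row per unit of assignment, with its
# block ID and its default cluster ID
cluster_df <- data.frame(
  block   = c(1, 1, 2, 2, 4, 4),
  cluster = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)
# Named logical vector of the kind identify_small_blocks() is meant to return
small <- c(`1` = TRUE, `2` = FALSE, `4` = TRUE)

# For model-based calls: re-cluster units of assignment in small blocks at
# the block level; units in other blocks keep their default cluster ID
model_based_clusters <- function(cluster_df, small) {
  in_small <- unname(small[as.character(cluster_df$block)])
  cluster_df$cluster[in_small] <- paste0("block_", cluster_df$block[in_small])
  cluster_df$cluster
}

model_based_clusters(cluster_df, small)
```

The "block_" prefix is one way to keep block-level IDs from colliding with existing cluster IDs; it is a choice made for this sketch, not something the issue specifies.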