Open acoppock opened 6 years ago
Here's another restatement of the problem that I think underlines the difficulty of simultaneously holding the total number of treatments allocated fixed while maintaining nominal probabilities of assignment.
Imagine two blocks, A and B. A has two units and B has three units. You want to block such that "half" of the units are treated.
In block A, we'll always assign one to treatment and one to control. In block B, we can assign one to treatment or two to treatment.
We have two options. 1) Hold the total number of treatments fixed, say at 2, which means that units in block A have prob = 0.5 of treatment and units in block B have prob = 1/3 of treatment. 2) Maintain nominal probabilities of assignment by randomizing (50/50) whether we treat 1 or 2 in block B. Then all units have prob = 0.5 of treatment, but the total number treated varies between 2 and 3.
Neither option satisfies both goals: keeping the total number of treatments fixed and maintaining the nominal probabilities of assignment.
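To make the arithmetic concrete, here is a small sketch that computes the exact per-unit treatment probabilities and the possible totals for both options. It is standalone Python rather than randomizr code, and the names `SIZES` and `summarize` are made up for illustration.

```python
from fractions import Fraction

# Block sizes from the example: A has two units, B has three.
SIZES = {"A": 2, "B": 3}

def summarize(design):
    """design: list of (probability, {block: number treated}) pairs.

    Returns each block's per-unit treatment probability and the set of
    possible totals treated.
    """
    probs = {b: Fraction(0) for b in SIZES}
    totals = set()
    for weight, counts in design:
        totals.add(sum(counts.values()))
        for b, m in counts.items():
            # With m of n units treated completely at random within a
            # block, each unit is treated with probability m / n.
            probs[b] += weight * Fraction(m, SIZES[b])
    return probs, totals

# Option 1: total fixed at 2 (one treated in each block).
option1 = [(Fraction(1), {"A": 1, "B": 1})]
# Option 2: treat 1 or 2 units in block B with equal probability.
option2 = [
    (Fraction(1, 2), {"A": 1, "B": 1}),
    (Fraction(1, 2), {"A": 1, "B": 2}),
]

probs1, totals1 = summarize(option1)
probs2, totals2 = summarize(option2)
for label, probs, totals in [("option 1", probs1, totals1),
                             ("option 2", probs2, totals2)]:
    shown = ", ".join(f"{b}: {p}" for b, p in probs.items())
    print(label, "| per-unit prob:", shown, "| totals:", sorted(totals))
```

Option 1 comes out with a single possible total (2) but unequal probabilities (1/2 in A, 1/3 in B); option 2 comes out with 1/2 everywhere but two possible totals (2 or 3).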
I think we'll need to restrict the set of cases in which we can execute load balancing, even among two-arm trials. I think that if we have an even number of blocks that are all of the same size, then we might be able to make some guarantees? I'm not sure.
"load balancing" has long been a goal for randomizr. Various solutions have been attempted and even implemented, but were finally abandoned for some reason or other. This issue is for version 2!
Some background
block_ra and variants conduct complete_ra within blocks. The various arguments to block_ra allow users either to give the same arguments to complete_ra for each block or different arguments in each block. We implement block_ra with an mapply, which calls complete_ra separately for each block with the right arguments passed at the right time.
The issue
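Imagine that we have 2 blocks of 3. If we do complete_ra within each, we will assign either 1 or 2 to treatment in each block. There are three ways this could come out: 1) both blocks treat 1 unit (a total of two units treated), 2) one block treats 1 unit and the other treats 2 (a total of three units treated), 3) both blocks treat 2 units (a total of four units treated).
The goal of load balancing is to ensure that only scenario 2 (a total of three units treated) ever occurs.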
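As a rough illustration of why the total varies, the sketch below enumerates the distribution of the total number treated when each block is randomized independently. It is standalone Python, not randomizr code, and it assumes that each block of 3 treats 1 or 2 units with equal probability; all names are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

BLOCK_SIZE = 3
N_BLOCKS = 2

# Assumption: with 3 units and a nominal probability of 1/2, each block
# treats 1 or 2 units with equal probability, independently of the other.
per_block = {1: Fraction(1, 2), 2: Fraction(1, 2)}

# Exact distribution of the total treated across both blocks.
total_dist = defaultdict(Fraction)
for counts in product(per_block, repeat=N_BLOCKS):
    weight = Fraction(1)
    for m in counts:
        weight *= per_block[m]
    total_dist[sum(counts)] += weight

# Per-unit probability of treatment within any one block.
unit_prob = sum(w * Fraction(m, BLOCK_SIZE) for m, w in per_block.items())

for total in sorted(total_dist):
    print(total, total_dist[total])
print("per-unit probability:", unit_prob)
```

Under that assumption the totals 2, 3 and 4 occur with probabilities 1/4, 1/2 and 1/4, while every unit keeps a 1/2 chance of treatment. Load balancing would put all of the mass on a total of 3.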
Troubles
We think that the two-arm trial with load balancing is feasible. The user sets something like a "total treated" argument, then we allocate the number to be treated across blocks in a way that preserves the nominal probabilities of assignment. I.e., the randomization is weighted in such a way that the true probabilities of assignment for each unit are exactly what they would be for the otherwise equivalent block_ra call.
Extending load balancing to the multi-arm case is very hard (and may actually be logically infeasible, i.e., it may be impossible to achieve the nominal probabilities of assignment while maintaining the load balance).
Here's an email that I think explains the issue with multi-arm trials:
Way forward
I think that we should probably implement a new function, balanced_block_ra or similar, that handles the two-arm case.
There may also be special cases of the multi-arm trials that we could accommodate.
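For discussion, here is one way the two-arm allocation could be sketched. It is standalone Python, not randomizr code. The helpers `allocate_balanced` and `assign` are hypothetical, and the rounding step uses systematic sampling, which is my own choice and not necessarily the method in the attached write-up. It assumes a single nominal probability shared by all blocks, passed as an exact `Fraction`.

```python
import math
import random
from fractions import Fraction

def allocate_balanced(block_sizes, prob, rng=random):
    """Hypothetical helper: number of units to treat in each block.

    Every block gets floor(n * prob) treated units. The leftover units are
    handed out by systematic sampling, so the total is the same on every
    draw and each block is rounded up with probability equal to the
    fractional part of n * prob.
    """
    prob = Fraction(prob)  # pass an exact value such as Fraction(1, 2)
    expected = [n * prob for n in block_sizes]
    floors = [math.floor(e) for e in expected]
    fracs = [e - f for e, f in zip(expected, floors)]
    if sum(fracs).denominator != 1:
        raise ValueError("expected total treated is not an integer, so the "
                         "total cannot be fixed at these probabilities")
    # One uniform draw shared by all blocks, kept exact to avoid rounding.
    u = Fraction(rng.random())
    counts, cum = [], Fraction(0)
    for f, r in zip(floors, fracs):
        # Round this block up when one of the points u, u + 1, u + 2, ...
        # lands in the interval [cum, cum + r).
        hit = math.floor(cum + r - u) - math.floor(cum - u)
        counts.append(f + hit)
        cum += r
    return counts

def assign(block_sizes, prob, rng=random):
    """Return one 0/1 assignment list per block."""
    out = []
    counts = allocate_balanced(block_sizes, prob, rng)
    for n, m in zip(block_sizes, counts):
        # Complete randomization within the block: m of n units treated.
        treated = set(rng.sample(range(n), m))
        out.append([1 if i in treated else 0 for i in range(n)])
    return out

# Two blocks of 3 at probability 1/2: one block treats 1 and the other 2.
rng = random.Random(1)
example = assign([3, 3], Fraction(1, 2), rng)
print(example, "total treated:", sum(map(sum, example)))
```

With two blocks of 3 the total treated is 3 on every draw, and each unit's probability of treatment stays at 1/2 because a block's expected count equals its size times the nominal probability. For blocks of 2 and 3 at probability 1/2 the helper raises an error, since the expected total of 2.5 cannot be held fixed.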
Attachments
I'm attaching a pdf of a write-up of the problem that Macartan circulated a while ago.
Systematic load balancing block randomization for arbitrary individual assignment probabilities.pdf