DeclareDesign / randomizr

randomizr: Easy-to-Use Tools for Common Forms of Random Assignment and Sampling
https://declaredesign.org/r/randomizr

load balancing #35

Open acoppock opened 6 years ago

acoppock commented 6 years ago

"load balancing" has long been a goal for randomizr. Various solutions have been attempted and even implemented, but were finally abandoned for some reason or other. This issue is for version 2!

some background

block_ra and its variants conduct complete_ra within blocks. The various arguments to block_ra allow users to give either the same arguments to complete_ra for every block or different arguments for each block. We implement block_ra with mapply, which calls complete_ra separately for each block with the right arguments passed at the right time.
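To make the "complete_ra within blocks" idea concrete, here is a minimal sketch in Python rather than R. The names `complete_ra_sketch` and `block_ra_sketch` are hypothetical stand-ins with simplified signatures; this is not randomizr's code, only an illustration of calling a complete random assignment once per block with block-specific arguments.

```python
import random

def complete_ra_sketch(n, m, rng):
    """Assign exactly m of n units to treatment (1) and the rest to control (0)."""
    treated = set(rng.sample(range(n), m))
    return [1 if i in treated else 0 for i in range(n)]

def block_ra_sketch(blocks, block_m, rng):
    """blocks: one block label per unit; block_m: dict mapping label -> number treated."""
    assignment = [None] * len(blocks)
    for label, m in block_m.items():
        # Positions of the units that belong to this block.
        idx = [i for i, b in enumerate(blocks) if b == label]
        # Separate complete random assignment within the block.
        for i, z in zip(idx, complete_ra_sketch(len(idx), m, rng)):
            assignment[i] = z
    return assignment

rng = random.Random(35)
blocks = ["a", "a", "a", "b", "b", "b"]
z = block_ra_sketch(blocks, {"a": 1, "b": 2}, rng)
print(z)
```

Each block gets its own number treated here, which mirrors the "different arguments in each block" case.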

The issue

Imagine that we have 2 blocks of 3. If we do complete_ra within each, we will assign either 1 or 2 to treatment in each block. There are three ways this could come out:

  1. a total of 4 units will be treated (2 in each block),
  2. a total of 3 units will be treated (1 in one block, 2 in the other),
  3. a total of 2 units will be treated (1 in each block).

The goal of load balancing is to ensure that only scenario 2 (a total of three units treated) ever occurs.
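The three scenarios can be enumerated directly. This small Python sketch assumes that each block of 3 treats 1 or 2 units with probability 1/2 each and that the two blocks are randomized independently; the equal split is an assumption for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: each block of 3 treats 1 or 2 units, each with probability 1/2.
per_block = {1: Fraction(1, 2), 2: Fraction(1, 2)}

# Distribution of the total number treated across two independent blocks.
total_dist = {}
for (m1, p1), (m2, p2) in product(per_block.items(), repeat=2):
    total = m1 + m2
    total_dist[total] = total_dist.get(total, Fraction(0)) + p1 * p2

for total in sorted(total_dist):
    print(total, total_dist[total])
# 2 1/4
# 3 1/2
# 4 1/4
```

Under that assumption, the balanced outcome (3 treated) occurs only half of the time.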

Troubles

We think that the two-arm trial with load balancing is feasible. The user sets something like a "total treated" argument, then we allocate the number to be treated across blocks in a way that preserves the nominal probabilities of assignment. I.e., the randomization is weighted in such a way that the true probabilities of assignment for each unit are exactly what they would be for the otherwise equivalent block_ra call.
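One way this could work, sketched in Python under stated assumptions: give every block the floor of its expected number treated, then hand out the remaining treated slots by systematic sampling over the fractional parts. Systematic sampling is a technique swapped in here for illustration, not something this issue or randomizr specifies. The name `balanced_block_counts` is hypothetical, and the sketch only works when `sum(block_sizes) * prob` is a whole number.

```python
import math
import random
from fractions import Fraction

def balanced_block_counts(block_sizes, prob, rng):
    """Hypothetical sketch: number treated per block, with a fixed grand total."""
    prob = Fraction(prob)
    expected = [n * prob for n in block_sizes]        # exact expected counts
    base = [math.floor(e) for e in expected]          # guaranteed per block
    frac = [e - b for e, b in zip(expected, base)]    # chance of one extra
    if sum(frac).denominator != 1:
        raise ValueError("sum(block_sizes) * prob must be a whole number")
    u = rng.random()                                  # one random start in [0, 1)
    counts, cum = [], Fraction(0)
    for b, f in zip(base, frac):
        # The block gets an extra unit if a point u + k falls in [cum, cum + f).
        extra = math.ceil(cum + f - u) - math.ceil(cum - u)
        counts.append(b + extra)
        cum += f
    return counts

rng = random.Random(35)
print(balanced_block_counts([3, 3], Fraction(1, 2), rng))
```

For two blocks of 3 with prob = 1/2, each block gets its extra treated unit half of the time, so every unit keeps a 1/2 chance of treatment while the total is always 3. Units would then be drawn within each block as usual.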

Extending load balancing to the multi-arm case is very hard (and may actually be logically infeasible, i.e., it may be impossible to achieve the nominal probabilities of assignment while maintaining the load balance).

Here's an email that I think explains the issue with multi-arm trials:

If we take this out of the blocking scenario, let’s just first imagine using prob_each in complete random assignment

15 units,

prob_each = c(.1, .2, .7)

we know we should assign

floor(15 * prob_each) = c(1, 3, 10), i.e. 1 unit to T1, 3 to T2 and 10 to T3.

That leaves us with 1 unit left to assign, which we should assign to each of the three treatments with prob_each probabilities, i.e.

  10% of the time to T1,
  20% of the time to T2,
  70% of the time to T3.

This procedure works great, even with 0 probabilities.
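The arithmetic in the steps above can be sketched in Python as follows. Exact fractions avoid floating-point surprises in the floor step; the variable names are illustrative only.

```python
import math
import random
from fractions import Fraction

n_units = 15
prob_each = [Fraction(1, 10), Fraction(2, 10), Fraction(7, 10)]

# Units that are assigned for certain: floor(15 * prob_each).
fixed = [math.floor(n_units * p) for p in prob_each]
leftover = n_units - sum(fixed)
print(fixed, leftover)  # [1, 3, 10] 1

# Assign each leftover unit to an arm with prob_each probabilities.
rng = random.Random(35)
counts = list(fixed)
for arm in rng.choices(range(len(prob_each)),
                       weights=[float(p) for p in prob_each],
                       k=leftover):
    counts[arm] += 1
print(counts)
```

An arm with zero probability gets a zero weight in the leftover draw, so it can never receive the extra unit.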

Now imagine that we have two blocks of 15 units.

We could:

  A) do complete_ra within blocks independently, or
  B) assign the remaining 1 unit within each block with EQUAL probability to each condition, but implement “load balancing” so that it’s never the case that both remaining units get assigned (for example) to T1.

We might think that there would be an option C) in which we respect the prob_each within each block and also load balance, but that doesn’t work: if, in the first block, we assign the remaining unit to T3 with 70% probability, then the remaining unit in the second block is assigned to T3 with 30% probability, i.e., 1 - p instead of the correct probability p. This problem is compounded if there is a zero probability in some condition.

Solution B) has a problem with zero probabilities too. If the remainders are assigned with equal probability, then within a block that is supposed to have no units assigned to T3, there WILL be such an assignment 1/3 of the time.

For this reason, I think I’ve come down to preferring solution A.  In that case, if users want to implement some version of load balancing across blocks, they’ll have to specify a custom function (and figure out the analytic probabilities of assignment themselves…)

Way forward

I think that we should probably implement a new function balanced_block_ra or similar that handles the two arm case.

There may also be special cases of multi-arm trials that we could accommodate.

Attachments

I'm attaching a pdf of a write-up of the problem that Macartan circulated a while ago.

Systematic load balancing block randomization for arbitrary individual assignment probabilities.pdf

acoppock commented 6 years ago

Here's another restatement of the problem that I think underlines the difficulty of simultaneously holding the total number of treatments allocated fixed while maintaining nominal probabilities of assignment.

Imagine two blocks, A and B. A has two units and B has three units. You want to block such that "half" of the units are treated.

In block A, we'll always assign one to treatment and one to control. In block B, we can assign one to treatment or two to treatment.

We have two options:

  1. Hold the total number of treatments fixed, say at 2, which means that units in block A have prob = 0.5 of treatment and units in block B have prob = 1/3 of treatment.
  2. Maintain nominal probabilities of assignment by randomizing (50/50) whether we treat 1 or 2 in block B. Then all units have prob = 0.5 of treatment, but the total number treated is either 2 or 3.

Neither option satisfies both goals: keeping the total number of treatments fixed and maintaining the nominal probabilities of assignment.
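The numbers for the two-block example can be checked with a few lines of Python. This is only a worked calculation of the two options described above, using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

sizes = {"A": 2, "B": 3}
half = Fraction(1, 2)

# Option 1: total treated fixed at 2, so one treated unit in each block.
treated_fixed = {"A": 1, "B": 1}
p_fixed = {b: Fraction(treated_fixed[b], sizes[b]) for b in sizes}
print(p_fixed["A"], p_fixed["B"])      # 1/2 1/3

# Option 2: block A treats 1; block B treats 1 or 2 with probability 1/2 each.
p_nominal = {
    "A": Fraction(1, sizes["A"]),
    "B": half * Fraction(1, sizes["B"]) + half * Fraction(2, sizes["B"]),
}
print(p_nominal["A"], p_nominal["B"])  # 1/2 1/2

# Under option 2 the total number treated is no longer fixed.
total_dist = {1 + 1: half, 1 + 2: half}
print(total_dist)
```

Option 1 fixes the total but gives block B units a 1/3 chance; option 2 gives everyone 1/2 but lets the total vary between 2 and 3.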

I think we'll need to restrict the set of cases in which we can execute load balancing, even among two arm trials. I think that if we have an even number of blocks that are all of the same size, then we might be able to make some guarantees? I'm not sure.