DeclareDesign / randomizr

randomizr: Easy-to-Use Tools for Common Forms of Random Assignment and Sampling
https://declaredesign.org/r/randomizr

Assigning block probs after sampling #71

Closed macartan closed 2 years ago

macartan commented 6 years ago

Dealing with a situation where blocks may be empty. This could happen with some sampling; in the current application it happens because blocks are based on XY combinations, some of which may be empty.

Can we link block probs to blocks using factors? Or perhaps have block_probs calculated from a column of unit level probs.

# OK
blocks = factor(rep(4:1, each = 2))
block_ra(blocks, block_prob = c(1:4)/4)

# Not OK
blocks = blocks[3:8]
block_ra(blocks, block_prob = c(1:4)/4)

acoppock commented 6 years ago

Can you adjust your code so that block_prob is always the correct length?

macartan commented 6 years ago

This would mean figuring out which blocks are present and then subsetting the probability vector. That is certainly possible, but I don't think it is preferable to being able to specify the probabilities directly by name.
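The workaround described here (find the blocks that are still present, then subset the probability vector) might look like the sketch below for the factor example from the first post. It assumes that block_prob is matched to the sorted unique values of blocks, and the names all_probs, blocks_sub and probs_sub are illustrative only.

```r
# Blocks 1..4 with two units each; probabilities listed in level order (1, 2, 3, 4).
blocks <- factor(rep(4:1, each = 2))
all_probs <- c(1:4) / 4

# After subsetting, block 4 has no units but is still a factor level.
blocks_sub <- blocks[3:8]

# Keep only the probabilities whose level still has at least one unit.
present <- levels(blocks_sub) %in% as.character(blocks_sub)
probs_sub <- all_probs[present]
probs_sub
# [1] 0.25 0.50 0.75

# Assumption: block_ra() expects one probability per block that is present.
if (requireNamespace("randomizr", quietly = TRUE)) {
  Z <- randomizr::block_ra(blocks = droplevels(blocks_sub), block_prob = probs_sub)
}
```

This works only because all_probs is written in level order, which is the dependence on sort order that the request for name-based matching is trying to avoid.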

acoppock commented 6 years ago

Yes, I think that approach is safer than relying on factors to know which blocks have no units in them.

macartan commented 6 years ago

I imagine there are a few ways to do it, but it would be good to handle this inside block_ra rather than leave it to users to figure out. I'm not sure why using factors is not safe, but using names on block probs could be another approach, if there were a fix like the one below inside the function (when names exist):

block_prob = c(A = .4, B = .3, C = .3, D = .2)
blocks = c("A", "B", "B", "A")
block_prob <- block_prob[names(block_prob) %in% blocks]
block_prob
#   A   B
# 0.4 0.3
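One caveat with the %in% filter is that it keeps the probabilities in whatever order they were written, which may differ from the order block_ra uses internally. A slightly more defensive variant is sketched below. match_block_probs is a hypothetical helper, not part of randomizr, and it assumes block_ra orders blocks by sort(unique(blocks)).

```r
# Hypothetical helper (not in randomizr): look up probabilities by block name.
match_block_probs <- function(blocks, block_prob) {
  # Unnamed vectors pass through, so assigning by order keeps working.
  if (is.null(names(block_prob))) {
    return(block_prob)
  }
  # Assumed internal block order: sorted unique block ids.
  present <- as.character(sort(unique(blocks)))
  missing <- setdiff(present, names(block_prob))
  if (length(missing) > 0) {
    stop("No probability supplied for block(s): ",
         paste(missing, collapse = ", "))
  }
  # Index by name: drops absent blocks and reorders in one step.
  block_prob[present]
}

# Names deliberately out of order to show the reordering.
block_prob <- c(D = .2, B = .3, A = .4, C = .3)
blocks <- c("A", "B", "B", "A")
matched <- match_block_probs(blocks, block_prob)
matched
#   A   B
# 0.4 0.3
```

Indexing by name rather than filtering means the result does not depend on how the user happened to order the probability vector, and a block with no supplied probability raises an error instead of silently misaligning.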

acoppock commented 6 years ago

I'm not sure what you mean by "left to users to figure out" -- they pass a vector of blocks and provide parameters in the order of those blocks. I don't think it's crazy to not support users who pass parameter information for blocks that don't exist in the vector of blocks passed to block_ra

macartan commented 6 years ago

I think it's not a lot for users to want to be able to give a direct mapping between block ids and block probabilities without having to rely on a sort order in a possibly changing set. Will suggest some code.

acoppock commented 6 years ago

OK, that's great. Whatever the solution, it should not break assigning by order.

acoppock commented 5 years ago

I think that the new prob_unit and m_unit arguments will address this problem. Those will be vectors of length N, so if you subset your data, the parameters won't need to be updated. Or maybe I'm misunderstanding?
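A minimal sketch of how unit-level probabilities could sidestep the problem, assuming prob_unit takes one probability per unit that is constant within each block. The data frame dat and its values are made up for illustration, and the final block_ra call is shown as a comment rather than run.

```r
# Named lookup from block id to treatment probability (illustrative values).
block_prob <- c(A = .4, B = .3, C = .3, D = .2)

# Illustrative data: two units in each of four blocks.
dat <- data.frame(block = c("A", "A", "B", "B", "C", "C", "D", "D"),
                  stringsAsFactors = FALSE)

# Attach each unit's probability by name, once, before any subsetting.
dat$prob_unit <- unname(block_prob[dat$block])

# Subsetting rows carries the probabilities along; nothing needs re-aligning.
dat_sub <- dat[dat$block %in% c("A", "B"), ]
dat_sub$prob_unit
# [1] 0.4 0.4 0.3 0.3

# Not run (assumes the prob_unit argument described above):
# randomizr::block_ra(blocks = dat_sub$block, prob_unit = dat_sub$prob_unit)
```

Because the probability lives in a column next to the block id, empty blocks simply disappear with their rows and no block-level vector has to be kept in sync.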