facebookarchive / planout

PlanOut is a library and interpreter for designing online experiments.
http://facebook.github.io/planout

Multi-armed bandit? #108

Open kevinmickey opened 8 years ago

kevinmickey commented 8 years ago

Is it possible to use a multi-armed bandit algorithm? (Like http://stevehanov.ca/blog/index.php?id=132)

eytan commented 8 years ago

Yes, most definitely. The easiest thing to do is batch-based Thompson sampling, where you start with an experimental design that looks like:

x <- uniformChoice(choices=['a','b','c','d'], unit=userid);

Observe some data, then use Thompson sampling to generate a distribution over winners and use that as the weights for a new batch. For example, make multiple draws from the Beta posterior over arms and count how many times each arm wins; that tabulation becomes the weights for the new batch, yielding something that might look like:

x <- weightedChoice(choices=['a','b','c','d'], weights=[10,100,200,690], unit=userid);
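As a concrete illustration of the tabulation step, here is a minimal sketch (not part of PlanOut; the per-arm successes and trials are made-up placeholders) that draws from each arm's Beta posterior and counts wins to produce the weights for the next batch:

```python
import numpy as np

# Hypothetical outcomes observed since the last batch for arms 'a'..'d'.
arms = ['a', 'b', 'c', 'd']
successes = np.array([10, 40, 60, 200])
trials = np.array([250, 250, 250, 250])

def thompson_weights(successes, trials, num_draws=1000, prior=(1, 1)):
    """Tabulate how often each arm wins across posterior draws.

    Each arm's conversion rate gets a Beta(prior_a + successes,
    prior_b + failures) posterior; the returned win counts can be used
    directly as the weights in a weightedChoice PlanOut script.
    """
    alpha = prior[0] + successes
    beta = prior[1] + (trials - successes)
    # Sample num_draws times from every arm's posterior: shape (num_draws, n_arms).
    draws = np.random.beta(alpha, beta, size=(num_draws, len(successes)))
    winners = draws.argmax(axis=1)  # index of the winning arm in each draw
    return np.bincount(winners, minlength=len(successes))

weights = thompson_weights(successes, trials, num_draws=1000)
print(dict(zip(arms, weights)))  # win counts, usable directly as the new weights
```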

You would then repeat this process once a day or a few times per day, perhaps using namespaces to manage the experiment.

With the above method, policies are represented directly as PlanOut scripts, but you could also use an external service to store and manage them. The latest version of the PlanOut reference implementation makes it easy to add your own operators if you want to do something like this.
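For the external-service route, one hedged sketch using the Python SimpleExperiment API: the current batch's weights are fetched from a store you manage (fetch_current_weights here is hypothetical, standing in for a database or config service) and plugged into an ordinary WeightedChoice assignment:

```python
from planout.experiment import SimpleExperiment
from planout.ops.random import WeightedChoice

def fetch_current_weights():
    """Hypothetical lookup of the latest Thompson-sampling weights
    from an external store (database, config service, etc.)."""
    return {'choices': ['a', 'b', 'c', 'd'], 'weights': [10, 100, 200, 690]}

class BanditBatchExperiment(SimpleExperiment):
    def assign(self, params, userid):
        policy = fetch_current_weights()
        # Deterministic weighted assignment per userid, given fixed weights.
        params.x = WeightedChoice(
            choices=policy['choices'],
            weights=policy['weights'],
            unit=userid)

exp = BanditBatchExperiment(userid=42)
print(exp.get('x'))
```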

HTH.

kevinmickey commented 8 years ago

Thanks! I think your batched approach makes sense, particularly for scalability when logs are large. I'm thinking of writing an operator that continuously recalculates the weights. (I suppose with an external service, you could store the previous weights....) Would this be a reasonable PR if other people might find it useful?

eytan commented 7 years ago

Hi @kevinmickey --- apologies for letting this fall off the map. It would be great to have such functionality in contrib/, but since it requires an external service, I would not want to include it in the core reference implementation. If you are developing a custom operator, we do have a mechanism for doing that without needing to modify the package itself (see https://github.com/facebook/planout/blob/master/python/planout/test/test_interpreter.py#L41). This offer might be a little late, but I'd be happy to review and provide feedback on any bandit-related work involving PlanOut.

javidjamae commented 4 years ago

Based on how WeightedChoice is implemented (https://github.com/facebook/planout/blob/d2f0088c905bdf5a250337019d1ee1f1c0067b5e/alpha/ruby/lib/plan_out/op_random.rb#L53), it seems that if you redistribute the weights, the variation that is served across multiple requests is no longer deterministic.

If a user with a particular id/hash is assigned one variation with a given set of weights and those weights change, that same user might subsequently get assigned to a different variation if you ask for their assignment again.

It seems like this could be particularly problematic for any experiment that doesn't conclude quickly or on a single page, like a multi-page funnel where different pages need to re-request the assignment.

I guess it would put the onus on the caller to cache the assignment, rather than relying on the library to return consistent assignments?
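For intuition: weighted choice hashes the unit (plus a salt) to a deterministic point in [0, sum(weights)) and picks the arm whose cumulative-weight interval contains that point, so redistributing the weights can move the same point into a different arm's interval. A simplified stand-in (not PlanOut's actual implementation) illustrating the effect:

```python
import hashlib

def weighted_choice(unit, salt, choices, weights):
    """Simplified PlanOut-style weighted choice: hash (salt, unit) to a
    deterministic point in [0, sum(weights)) and walk the cumulative weights."""
    digest = hashlib.sha1('{}.{}'.format(salt, unit).encode()).hexdigest()
    point = (int(digest[:15], 16) / float(0xFFFFFFFFFFFFFFF)) * sum(weights)
    cumulative = 0.0
    for choice, weight in zip(choices, weights):
        cumulative += weight
        if point <= cumulative:
            return choice
    return choices[-1]

choices = ['a', 'b', 'c', 'd']
print(weighted_choice('user_123', 'exp1.x', choices, [25, 25, 25, 25]))
print(weighted_choice('user_123', 'exp1.x', choices, [10, 100, 200, 690]))
# The hashed point is identical in both calls, but the arm whose cumulative
# interval contains it can differ once the weights are redistributed.
```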

eytan commented 4 years ago

Hi Javid,

The assignment is deterministic as long as your input IDs and experimental design (the PlanOut scripts) remain the same. It's assumed that you won't change an experiment while it's running; changing experiments while they are running is a huge source of error in practice, and we recommend using namespaces to manage changes. See Section 5 of https://hci.stanford.edu/publications/2014/planout/planout-www2014.pdf for details.

E
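For concreteness, a minimal sketch of the namespace pattern recommended above, following the SimpleNamespace API from the PlanOut Python docs; the experiment classes, weights, and segment counts are illustrative assumptions, with each Thompson-sampling batch added as a separate experiment so previously allocated users keep their original design:

```python
from planout.namespace import SimpleNamespace
from planout.experiment import SimpleExperiment, DefaultExperiment
from planout.ops.random import UniformChoice, WeightedChoice

class Batch1(SimpleExperiment):
    def assign(self, params, userid):
        # Initial exploratory batch: uniform over arms.
        params.x = UniformChoice(choices=['a', 'b', 'c', 'd'], unit=userid)

class Batch2(SimpleExperiment):
    def assign(self, params, userid):
        # Weights produced by the previous batch's Thompson-sampling tabulation.
        params.x = WeightedChoice(choices=['a', 'b', 'c', 'd'],
                                  weights=[10, 100, 200, 690], unit=userid)

class DefaultArm(DefaultExperiment):
    def get_default_params(self):
        # Served to users not allocated to any batch.
        return {'x': 'a'}

class BanditNamespace(SimpleNamespace):
    def setup(self):
        self.name = 'bandit_demo'
        self.primary_unit = 'userid'
        self.num_segments = 100
        self.default_experiment_class = DefaultArm

    def setup_experiments(self):
        # Each batch claims its own slice of segments; later batches use the
        # updated weights while earlier segments keep their original design.
        self.add_experiment('batch_1', Batch1, 20)
        self.add_experiment('batch_2', Batch2, 20)

ns = BanditNamespace(userid=42)
print(ns.get('x'))
```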


Amitg1 commented 3 years ago

Saw this 4 years late, but it seems they managed to do it here: https://engineering.ezcater.com/multi-armed-bandit-experimentation