facebook / Ax

Adaptive Experimentation Platform
https://ax.dev

Confusion around Trials & Thompson Sampling #1219

Closed AustinGomez closed 1 year ago

AustinGomez commented 1 year ago

Hi! Quick question about the usage of Trials and how they interact during a bandit experiment using Thompson sampling. I'm trying to use Ax to run field experiments, so I'm generating trials at some cadence, deploying them to an external system, evaluating the results, generating a new trial, and so on.

I've dug through the source code and run some tests, and it seems that the Thompson sampling model has no "memory" between trials; is this correct? My expectation was that the model would maintain and iteratively update posterior distributions, but that doesn't seem to be the case; rather, it appears to construct the sampling distributions from scratch on each trial.

A few questions:

  • Is there some benefit to this method that I'm missing? Mitigating non-stationarity effects comes to mind.
  • If there isn't a benefit, are we intended to implement our own Bayesian update across trials?

eytan commented 1 year ago

Yes, the reason we default to using only a single batch is concern about non-stationarity. You can always use your own model that aggregates across multiple batches, though.


AustinGomez commented 1 year ago

Thanks for the quick reply. Are there any examples already in Ax of models that do this? If not, do you have any recommendations for how that might be implemented? I would have thought that aggregation would happen at the metric level rather than the model level.

eytan commented 1 year ago

@qingfeng10, perhaps you could provide pointers when you have a chance (we are both at a conference now, so apologies for any brief responses and delays).

The idea is that you'd fetch the data for all batches, then combine the sufficient statistics using the standard equations for pooling the means and variances of independent random variables. You can then plug these pooled statistics into an empirical Bayes model and do TS.
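For concreteness, here is a minimal sketch of what that pooling could look like, assuming each batch reports a per-arm mean and SEM in a dataframe shaped like Ax's Data.df. The helper name pool_across_batches and the use of inverse-variance weighting are illustrative, not part of the Ax API:

import numpy as np
import pandas as pd

def pool_across_batches(df: pd.DataFrame) -> pd.DataFrame:
    # Expects columns like Ax's Data.df: arm_name, metric_name, mean, sem,
    # with one row per arm per batch. Returns one pooled row per arm/metric,
    # treating the per-batch estimates as independent.
    def _pool(group: pd.DataFrame) -> pd.Series:
        weights = 1.0 / group["sem"] ** 2  # inverse-variance weights
        pooled_mean = (weights * group["mean"]).sum() / weights.sum()
        pooled_sem = (1.0 / weights.sum()) ** 0.5  # SEM of the pooled estimate
        return pd.Series({"mean": pooled_mean, "sem": pooled_sem})

    return df.groupby(["arm_name", "metric_name"]).apply(_pool).reset_index()

The pooled dataframe can then be handed to an empirical Bayes Thompson sampling model, which expects a single observation per parameterization.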

AustinGomez commented 1 year ago

Hi, checking back in here. As far as I can tell there are two options: either do the statistic pooling in a Metric subclass and feed the outcome to the default EBTS model, or pool the metrics within a custom model.

Unless you see a reason to prefer one over the other, we're leaning towards the latter. Thanks! (And feel free to close.)

qingfeng10 commented 1 year ago

Apologies for getting back to you late, @AustinGomez! I've added a bit more info on option 1, in case you need it. You can apply that directly to the dataframe Data.df, i.e. do the transformation needed to correct for non-stationarity across trials and then combine the data for all batches. After that, you can just call:

from ax.core.observation import ObservationFeatures
from ax.modelbridge.registry import Models

m = Models.EMPIRICAL_BAYES_THOMPSON(
    data=data,
    experiment=experiment,
    status_quo_features=ObservationFeatures(
        parameters=experiment.status_quo.parameters,
        trial_index=target_index,  # e.g. the latest trial index
    ),
)
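As a hedged illustration of option 1 end to end (assuming the pool_across_batches helper sketched earlier and a chosen target_index; this is not an official Ax recipe), the combined dataframe could be packaged back into a Data object before the call above:

from ax.core.data import Data

# Pool across trials first, then relabel to a single trial index so the
# model sees exactly one observation per arm.
pooled_df = pool_across_batches(experiment.fetch_data().df)
pooled_df["trial_index"] = target_index
data = Data(df=pooled_df)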

One more thing to call out. When you call Models.EMPIRICAL_BAYES_THOMPSON, the code by default will apply inverse variance weighting to merge metric values across multiple test groups. So you do not need to worry about that.

Gonna close this for now. Feel free to reopen if you have more questions!

AustinGomez commented 1 year ago

Hi @qingfeng10! Thanks for the info.

One more thing to call out. When you call Models.EMPIRICAL_BAYES_THOMPSON, the code by default will apply inverse variance weighting to merge metric values across multiple test groups. So you do not need to worry about that.

Would you be able to expand on this? As far as I can tell, the only difference between THOMPSON and EMPIRICAL_BAYES_THOMPSON is that the EBTS model applies shrinkage across groups within the same trial, NOT across trials. Is my understanding correct?

Also, it seems they're both instantiated with the same transforms, which include IVW (which confuses me even more!).

It seems like the transformation isn't being applied? For more context, if I try to give more than one trial's worth of data to the EBTS model via run = gs.gen(exp, n=-1, data=exp.fetch_data()), I get the following error:

ThompsonSampler requires all rows of X to be unique; i.e. that there is only one observation per parameterization.