facebook / Ax

Adaptive Experimentation Platform
https://ax.dev
MIT License

Contextual Bandit Examples for Thompson Sampling and Gaussian Optimization #242

Closed. gupchup closed this issue 4 years ago

gupchup commented 4 years ago

I heard the excellent talk on the Ax system at the ML for Systems workshop at NeurIPS last year. I enjoyed it very much.

I am working on similar optimization problems and am interested in comparing the Ax platform with other open source solutions. The problems I am dealing with are contextual bandit in nature, similar to the one you wrote about in your paper and mentioned in your talk [1]. While I found several neat examples of the multi-armed settings [2, 3], I was unable to find any examples or documentation on the Ax site dealing with context-based Thompson sampling or context-based Bayesian optimization as mentioned in [1].

It's possible that I've missed it. If that is the case, would you mind pointing me to an example of the contextual bandit setting described in your talk and paper?

Thank you!

[1] https://arxiv.org/pdf/1911.00638.pdf
[2] https://www.ax.dev/files/factorial.ipynb
[3] https://www.ax.dev/files/gpei_hartmann_loop.ipynb

sdaulton commented 4 years ago

Hi @gupchup, the contextual bandit work described in [1] is not currently implemented in Ax (although we used Ax for the reward shaping and parameter tuning in our video transcoding experiment), and it has not been open sourced. It would be great to support contextual bandits down the road, but we have no concrete plans to implement [1] in Ax in the near future.

gupchup commented 4 years ago

Hi @sdaulton, thank you for the clarification and the prompt response. Looking forward to contextual bandit support whenever it is made available.

sdsingh commented 4 years ago

Although we do not have plans to implement [1] in the near future, we have internally used and do plan to open source context-based Bayesian Optimization in Ax. I will update you by the end of February on our work there!

wooohoooo commented 4 years ago

Hi, I was wondering if there are any updates on this, or a timeline for one?

Thanks!

sdsingh commented 4 years ago

@wooohoooo, what are you particularly interested in here? With regard to contextual Bayesian optimization, please refer to https://github.com/facebook/Ax/issues/268.

Contextual bandit optimization will remain outside the scope of Ax.

wooohoooo commented 4 years ago

@sdsingh thank you for your reply. I wasn't aware that Contextual Bandit Optimization is outside of Ax's scope. I know it's a long shot, but do you have a recommendation for a library that implements this, maybe one that works well with Ax?

sdsingh commented 4 years ago

I've done work based on Carlos Riquelme's project (ICLR 2018): https://github.com/tensorflow/models/tree/master/research/deep_contextual_bandits

Integrating it with Ax should not be too difficult; we've used Ax in the past to tune reward functions in this library.
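
For readers new to the setting discussed in this thread, here is a minimal sketch of contextual Thompson sampling. It is not Ax code, not the method from [1], and not the deep_contextual_bandits API. It assumes a small set of discrete contexts and binary rewards, and keeps an independent Beta posterior per (context, arm) pair. The class name `ContextualBetaThompson`, the `simulate` helper, and the reward rates in `TRUE_RATES` are all hypothetical and exist only for illustration.

```python
import random
from collections import defaultdict


class ContextualBetaThompson:
    """Thompson sampling with an independent Beta posterior per (context, arm).

    Assumes a small discrete context set and binary (0/1) rewards.
    """

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        self.rng = random.Random(seed)
        # Beta(1, 1) prior stored as [alpha, beta] for each (context, arm).
        self.posterior = defaultdict(lambda: [1.0, 1.0])

    def select_arm(self, context):
        # Draw one sample per arm from its posterior and play the best draw.
        draws = {
            arm: self.rng.betavariate(*self.posterior[(context, arm)])
            for arm in self.arms
        }
        return max(draws, key=draws.get)

    def update(self, context, arm, reward):
        # A success increments alpha, a failure increments beta.
        params = self.posterior[(context, arm)]
        if reward:
            params[0] += 1.0
        else:
            params[1] += 1.0


def simulate(true_rates, rounds=5000, seed=0):
    """Run the policy against hypothetical Bernoulli reward rates."""
    contexts = sorted(true_rates)
    arms = sorted(next(iter(true_rates.values())))
    policy = ContextualBetaThompson(arms, seed=seed)
    env = random.Random(seed + 1)
    pulls = defaultdict(int)
    for _ in range(rounds):
        context = env.choice(contexts)
        arm = policy.select_arm(context)
        reward = 1 if env.random() < true_rates[context][arm] else 0
        policy.update(context, arm, reward)
        pulls[(context, arm)] += 1
    return pulls


# Hypothetical environment: the best arm depends on the context.
TRUE_RATES = {
    "mobile": {"a": 0.8, "b": 0.2},
    "desktop": {"a": 0.2, "b": 0.8},
}

if __name__ == "__main__":
    for key, count in sorted(simulate(TRUE_RATES).items()):
        print(key, count)
```

Because each context keeps its own posteriors, the policy can learn a different best arm per context. With continuous or high-dimensional contexts this table-based approach no longer applies, and a model that shares information across contexts (as in the library linked above) would be needed.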