Hi @gupchup, the contextual bandit work described in [1] is not currently implemented in Ax (although we did use Ax for the reward shaping and parameter tuning in our video transcoding experiment), and it has not been open-sourced. It would be great to support contextual bandits down the road, but we do not have concrete plans to implement [1] in Ax in the near future.
Hi @sdaulton, thank you for the clarification and the prompt response. Looking forward to contextual bandit support whenever it becomes available.
Although we do not have plans to implement [1] in the near future, we have used context-based Bayesian Optimization internally and do plan to open source it in Ax. I will update you on that work by the end of February!
Hi, I was wondering whether there are any updates on this, or a timeline for one?
Thanks!
@wooohoooo, what are you particularly interested in here? With regards to contextual Bayesian Optimization, please refer to https://github.com/facebook/Ax/issues/268.
With regards to Contextual Bandit Optimization, this will remain outside of the scope of Ax.
@sdsingh thank you for your reply. I wasn't aware that Contextual Bandit Optimization is outside of Ax's scope. I know it's a long shot, but do you have a recommendation for a library that implements this, maybe one that works well with Ax?
I've done work based on Carlos Riquelme's project (ICLR 2018): https://github.com/tensorflow/models/tree/master/research/deep_contextual_bandits
Integrating it with Ax should not be too difficult; we have used Ax in the past to tune reward functions in this library (a rough sketch is below).
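Something like the following, as a minimal sketch of that kind of tuning loop using Ax's `optimize` helper. Note that `run_bandit_simulation` and the two reward weights are hypothetical stand-ins for running the deep_contextual_bandits code with a shaped reward; the synthetic surrogate is only there so the sketch runs end to end:

```python
from ax import optimize

def run_bandit_simulation(latency_weight: float, quality_weight: float) -> float:
    # Hypothetical stand-in: run a contextual bandit policy with the shaped
    # reward r = quality_weight * quality - latency_weight * latency and
    # return its cumulative reward (higher is better). A synthetic surrogate
    # replaces the real simulation here so the example is self-contained.
    return -((latency_weight - 0.3) ** 2) - ((quality_weight - 0.7) ** 2)

best_parameters, values, experiment, model = optimize(
    parameters=[
        {"name": "latency_weight", "type": "range", "bounds": [0.0, 1.0]},
        {"name": "quality_weight", "type": "range", "bounds": [0.0, 1.0]},
    ],
    evaluation_function=lambda p: run_bandit_simulation(
        p["latency_weight"], p["quality_weight"]
    ),
    minimize=False,
    total_trials=20,
)
print(best_parameters)
```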
I heard the excellent talk on the Ax system at the ML for Systems workshop at NeurIPS last year. I enjoyed it very much.
I am working on similar optimization problems and am interested in comparing the Ax platform with other open-source solutions. The problems I am dealing with are contextual bandits, similar to the one you wrote about in your paper and mentioned in your talk [1]. While I found several neat examples for the multi-armed setting [2, 3], I was unable to find any examples or documentation on the Ax site covering the context-based Thompson sampling or context-based Bayesian optimization mentioned in [1].
It's possible that I've missed it; if so, would you mind pointing me to the contextual bandit setting described in your talk and paper?
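For concreteness, here is a generic sketch of the kind of contextual Thompson sampling loop I have in mind, using a standard per-arm Bayesian linear-regression posterior. This is plain NumPy, not Ax API, and the arm count, context dimension, and reward model are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, d = 3, 5        # number of arms, context dimension (illustrative)
lam, noise = 1.0, 0.5   # ridge prior precision, reward noise scale

# Per-arm posteriors N(A^{-1} b, noise^2 * A^{-1}) over reward coefficients.
A = [lam * np.eye(d) for _ in range(n_arms)]  # precision matrices
b = [np.zeros(d) for _ in range(n_arms)]      # X^T y accumulators

def choose_arm(x):
    # Thompson sampling: draw coefficients from each arm's posterior
    # and play the arm with the highest sampled reward for context x.
    scores = []
    for a in range(n_arms):
        mean = np.linalg.solve(A[a], b[a])
        cov = noise ** 2 * np.linalg.inv(A[a])
        theta = rng.multivariate_normal(mean, cov)
        scores.append(x @ theta)
    return int(np.argmax(scores))

def update(a, x, r):
    # Standard ridge-regression posterior update for the played arm only.
    A[a] += np.outer(x, x)
    b[a] += r * x

# Toy environment with synthetic true coefficients per arm.
true_theta = rng.standard_normal((n_arms, d))
for t in range(500):
    x = rng.standard_normal(d)
    a = choose_arm(x)
    r = x @ true_theta[a] + noise * rng.standard_normal()
    update(a, x, r)
```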
Thank you!
[1] https://arxiv.org/pdf/1911.00638.pdf
[2] https://www.ax.dev/files/factorial.ipynb
[3] https://www.ax.dev/files/gpei_hartmann_loop.ipynb