DuelingBandits

Simulations for dueling bandit algorithms, including our Double Thompson Sampling (D-TS) algorithms.

Used in our NIPS'16 paper: Huasen Wu and Xin Liu, “Double Thompson Sampling for Dueling Bandits”, Conference on Neural Information Processing Systems (NIPS), 2016.

This file summarizes the files that make up the DuelingBandit application.

Created in MS Visual Studio 2013

DuelingBandit.h
Header file for the main application.

DuelingBandit.cpp
This is the main application source file.

class CMAB (MAB.h + MAB.cpp):
The class simulating the dueling bandit.
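
As a rough illustration of what simulating a dueling bandit involves, the sketch below draws the outcome of a single comparison from a preference matrix. The class name `DuelingEnv`, its members, and the matrix layout are assumptions made for this example; they are not taken from MAB.h or MAB.cpp, whose actual interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical environment for illustration; not the repository's CMAB interface.
class DuelingEnv {
public:
    // pref[i][j] is the assumed probability that arm i beats arm j.
    DuelingEnv(std::vector<std::vector<double>> pref, unsigned seed)
        : pref_(std::move(pref)), rng_(seed) {
        // The preference matrix must be square.
        for (const auto& row : pref_) assert(row.size() == pref_.size());
    }

    std::size_t numArms() const { return pref_.size(); }

    // Run one comparison; returns true if arm i wins against arm j.
    bool duel(std::size_t i, std::size_t j) {
        assert(i < pref_.size() && j < pref_.size());
        std::bernoulli_distribution coin(pref_[i][j]);
        return coin(rng_);
    }

private:
    std::vector<std::vector<double>> pref_;
    std::mt19937 rng_;
};
```

A simulation loop would typically call `duel(i, j)` once per round with the pair chosen by the algorithm under test and pass the result back to that algorithm.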

class CAlg (Alg.h + Alg.cpp):
The base class for all dueling bandit algorithms.
All other algorithms inherit from this class.
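
The inheritance layout described above might look roughly like the following sketch. The names `DuelingAlgorithm`, `selectPair`, `update`, and `RandomPairAlg` are hypothetical and chosen only for this example; the real members of CAlg are defined in Alg.h and may differ. The derived class is a placeholder baseline that picks pairs uniformly at random; it is not one of the algorithms from the references.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical base class; the repository's CAlg may expose different members.
class DuelingAlgorithm {
public:
    explicit DuelingAlgorithm(std::size_t numArms) : numArms_(numArms) {
        assert(numArms >= 2);
    }
    virtual ~DuelingAlgorithm() = default;

    // Choose the two arms to compare in the next round.
    virtual std::pair<std::size_t, std::size_t> selectPair() = 0;

    // Record the outcome of a comparison between arms i and j.
    virtual void update(std::size_t i, std::size_t j, bool iWon) = 0;

protected:
    std::size_t numArms_;
};

// Placeholder baseline: picks two distinct arms uniformly at random
// and counts pairwise wins.
class RandomPairAlg : public DuelingAlgorithm {
public:
    RandomPairAlg(std::size_t numArms, unsigned seed)
        : DuelingAlgorithm(numArms),
          wins_(numArms, std::vector<int>(numArms, 0)),
          rng_(seed) {}

    std::pair<std::size_t, std::size_t> selectPair() override {
        std::uniform_int_distribution<std::size_t> first(0, numArms_ - 1);
        std::uniform_int_distribution<std::size_t> offset(1, numArms_ - 1);
        std::size_t i = first(rng_);
        std::size_t j = (i + offset(rng_)) % numArms_;  // offset >= 1, so j != i
        return {i, j};
    }

    void update(std::size_t i, std::size_t j, bool iWon) override {
        assert(i < numArms_ && j < numArms_);
        if (iWon) {
            ++wins_[i][j];
        } else {
            ++wins_[j][i];
        }
    }

    int wins(std::size_t i, std::size_t j) const { return wins_[i][j]; }

private:
    std::vector<std::vector<int>> wins_;  // wins_[i][j]: times arm i beat arm j
    std::mt19937 rng_;
};
```

With this kind of layout, the simulation driver can hold a pointer or reference to the base class and run any algorithm through the same two calls per round.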

Main References

[1] Huasen Wu and Xin Liu. "Double Thompson Sampling for Dueling Bandits". In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2016.
[2] M. Zoghi, S. A. Whiteson, M. de Rijke, and R. Munos. "Relative confidence sampling for efficient on-line ranker evaluation". In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pages 73–82. ACM, 2014.
[3] M. Zoghi, S. Whiteson, R. Munos, and M. de Rijke. "Relative upper confidence bound for the k-armed dueling bandit problem". In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 10–18, 2014.
[4] J. Komiyama, J. Honda, H. Kashima, and H. Nakagawa. "Regret lower bound and optimal algorithm in dueling bandit problem". In Proceedings of the Conference on Learning Theory, 2015.
[5] M. Zoghi, Z. S. Karnin, S. Whiteson, and M. de Rijke. "Copeland dueling bandits". In Advances in Neural Information Processing Systems, pages 307–315, 2015.
[6] J. Komiyama, J. Honda, and H. Nakagawa. "Copeland dueling bandit problem: Regret lower bound, optimal algorithm, and computationally efficient algorithm". In Proceedings of the 33rd International Conference on Machine Learning (ICML-16), 2016.

Acknowledgement

Thanks to Masrour Zoghi (University of Amsterdam) and Dr. Junpei Komiyama (University of Tokyo) for helpful discussions about the simulations!