Open htcml opened 4 years ago
I can implement one of the exponentially weighted methods, such as EXP3 or a variant. Other options are simple modifications of existing algorithms, such as sliding-window UCB, or any of the adversarial strategies. I think the main part of the implementation will be the environment.
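To make the idea concrete, here is a rough sketch of sliding-window UCB together with a toy non-stationary environment. All names here (`SwitchingBernoulliEnv`, `SlidingWindowUCB`, `run`) and the parameter values are hypothetical and not taken from this repository; the environment simply swaps the arm means at a fixed step, which is only one of many ways to model drift.

```python
import math
import random
from collections import deque


class SwitchingBernoulliEnv:
    """Hypothetical environment: Bernoulli arm means change once, at `switch_at`."""

    def __init__(self, means_before, means_after, switch_at, seed=0):
        self.means_before = means_before
        self.means_after = means_after
        self.switch_at = switch_at
        self.t = 0
        self.rng = random.Random(seed)

    def pull(self, arm):
        means = self.means_before if self.t < self.switch_at else self.means_after
        self.t += 1
        return 1.0 if self.rng.random() < means[arm] else 0.0


class SlidingWindowUCB:
    """UCB computed only over the most recent `window` (arm, reward) pairs."""

    def __init__(self, n_arms, window, c=2.0):
        self.n_arms = n_arms
        self.c = c  # exploration constant (an assumed default)
        self.history = deque(maxlen=window)  # old observations fall out automatically

    def select_arm(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # Play any arm that has no observation inside the current window.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        horizon = len(self.history)  # equals min(t, window)
        scores = [
            sums[a] / counts[a] + math.sqrt(self.c * math.log(horizon) / counts[a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, reward):
        self.history.append((arm, reward))


def run(steps=4000, switch_at=2000, window=300):
    """Run the agent on a two-armed problem whose best arm flips at `switch_at`."""
    env = SwitchingBernoulliEnv([0.9, 0.1], [0.1, 0.9], switch_at)
    agent = SlidingWindowUCB(n_arms=2, window=window)
    pulls = []
    for _ in range(steps):
        arm = agent.select_arm()
        agent.update(arm, env.pull(arm))
        pulls.append(arm)
    return pulls
```

Calling `run()` returns the sequence of chosen arms, so one can check that the agent favors arm 0 before the switch and arm 1 some time after it. The window size trades off adaptation speed against estimation noise, so it would need tuning per environment.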
Just pointing this out for your reference: Daniel Russo outlines his approach on p. 43, Section 6.3 (non-stationary systems), of this tutorial: https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf
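For a sense of what a Thompson sampling variant for drifting rewards could look like, here is a minimal sketch for Bernoulli arms. It decays each arm's Beta parameters toward the prior at every step, so old evidence is gradually forgotten. This is my own illustration in that spirit, not a transcription of the tutorial's algorithm; the class name `DecayingBetaTS` and the `gamma` default are assumptions.

```python
import random


class DecayingBetaTS:
    """Beta-Bernoulli Thompson sampling with posterior decay toward the prior."""

    def __init__(self, n_arms, gamma=0.01, prior_alpha=1.0, prior_beta=1.0, seed=0):
        self.gamma = gamma  # forgetting rate in [0, 1]; 0 recovers standard TS
        self.prior_alpha = prior_alpha
        self.prior_beta = prior_beta
        self.alpha = [prior_alpha] * n_arms
        self.beta = [prior_beta] * n_arms
        self.rng = random.Random(seed)

    def select_arm(self):
        # Sample a plausible mean for each arm and play the largest sample.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        g = self.gamma
        # Shrink every arm's parameters toward the prior, including unplayed arms,
        # so their uncertainty grows back over time.
        for k in range(len(self.alpha)):
            self.alpha[k] = (1 - g) * self.alpha[k] + g * self.prior_alpha
            self.beta[k] = (1 - g) * self.beta[k] + g * self.prior_beta
        # Then add the new observation (reward in {0, 1}) for the played arm.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
```

Because unplayed arms also drift back toward the prior, the agent keeps re-exploring them occasionally, which is what lets it notice when a previously bad arm becomes good.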
EXP3 is currently implemented in the ordinary multi-armed bandit.
Can you add MAB algorithms that can handle time-varying signals? Are non-stationary MAB algorithms meant for this purpose?