aimclub / GOLEM

Graph Optimiser for Learning and Evolution of Models
https://thegolem.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Make reward for bandits more stable #147

Closed maypink closed 1 year ago

maypink commented 1 year ago

Currently the reward for bandits is unstable: at the very beginning of optimization the rewards given for actions are much larger than at the end. The reward should therefore be calculated using a sliding window and other tricks, as in this article.
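
A rough sketch of what a sliding-window reward could look like, to make the idea concrete. The linked article is not reproduced here, so this is only one possible reading of "sliding window", not the article's method. The class `SlidingWindowRewards` and its parameters are hypothetical and not part of GOLEM's API. Each raw reward is min-max scaled against the last `window_size` raw rewards, so large early rewards stop dominating once they leave the window, and each arm's value is estimated only from its own recent scaled rewards.

```python
from collections import deque
from statistics import mean


class SlidingWindowRewards:
    """Hypothetical helper (not GOLEM API): windowed reward scaling for bandit arms."""

    def __init__(self, n_arms, window_size=5):
        # Recent raw rewards across all arms, used as the scaling reference.
        self.recent = deque(maxlen=window_size)
        # One bounded history of scaled rewards per arm; old entries drop out.
        self.history = [deque(maxlen=window_size) for _ in range(n_arms)]

    def update(self, arm, raw_reward):
        """Scale raw_reward to [0, 1] relative to the window and store it."""
        self.recent.append(raw_reward)
        lo, hi = min(self.recent), max(self.recent)
        # With a flat window there is no scale information; use the midpoint.
        scaled = 0.5 if hi == lo else (raw_reward - lo) / (hi - lo)
        self.history[arm].append(scaled)
        return scaled

    def estimate(self, arm):
        """Mean of the arm's recent scaled rewards (0.0 if the arm is unseen)."""
        return mean(self.history[arm]) if self.history[arm] else 0.0


tracker = SlidingWindowRewards(n_arms=2, window_size=3)
# Early rewards are large, later ones are small, as described in the issue.
for arm, raw in [(0, 100.0), (1, 50.0), (0, 1.0), (1, 2.0), (0, 3.0)]:
    print(arm, raw, round(tracker.update(arm, raw), 3))
```

In this example the last raw reward of 3.0 is scaled against the window `[1.0, 2.0, 3.0]` rather than against the early value of 100.0, so it counts as a strong reward for its stage of the optimization. The window size is a tuning choice: a short window adapts quickly but makes the scaled rewards noisier.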