Chapter 5 bandits strategies

CamDavidsonPilon / Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

MIT License

26.55k stars, 7.85k forks

I think there are two problems with the strategies adopted in chapter 5:

1) The text suggests that strategies 4 and 5 are different and will be applied. Which of them does the strategy max_mean refer to? As far as I can tell, the other strategies applied (upper_credible_choice, bayesian_bandit_choice, ucb_bayes, random_choice) are neither 4 nor 5.

2) The strategy "max_mean" always chooses bandit 0. All bandits are initialized with 0 wins, so every bandit starts with the same win rate and argmax returns index 0 on the tie. Bandit 0 keeps being chosen until it wins, and once its win rate is above 0 while the others are still at 0, argmax keeps returning 0, so no other bandit is ever tried.
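The second point can be checked with a small simulation. This is only a sketch: it assumes that max_mean picks `argmax(wins / (trials + 1))`, which may not match the chapter's code exactly, and the names `greedy_max_mean` and `simulate` as well as the win probabilities are made up for illustration. It relies on the documented behaviour of `np.argmax`, which returns the first index when several entries share the maximum.

```python
import numpy as np


def greedy_max_mean(wins, trials):
    # Assumed form of the max_mean rule: pick the arm with the best
    # observed win rate. On ties, np.argmax returns the first index.
    return int(np.argmax(wins / (trials + 1)))


def simulate(probs, n_pulls=1000, seed=0):
    # Hypothetical helper: play n_pulls rounds with the greedy rule
    # and record which arm was chosen each time.
    rng = np.random.default_rng(seed)
    wins = np.zeros(len(probs))
    trials = np.zeros(len(probs))
    choices = []
    for _ in range(n_pulls):
        arm = greedy_max_mean(wins, trials)
        reward = rng.random() < probs[arm]  # Bernoulli draw for this arm
        wins[arm] += reward
        trials[arm] += 1
        choices.append(arm)
    return np.array(choices)


# Arm 2 is by far the best arm, yet it is never pulled.
choices = simulate([0.15, 0.2, 0.9])
pulls_per_arm = np.bincount(choices, minlength=3)
print(pulls_per_arm)  # [1000    0    0]
```

Under these assumptions the result does not depend on the seed: arms 1 and 2 keep a win rate of exactly 0, and arm 0 is either tied with them (and wins the tie) or strictly ahead.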


Chapter 5 bandits strategies #398