Help on ten_armed_testbed.py

ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction

MIT License


Closed: ai4pharma closed this issue 4 years ago

ai4pharma commented 4 years ago

Why is np.random.choice still used on lines 66 and 74? The action should be a single, definite index from argmax. Thanks.

ShangtongZhang commented 4 years ago

There could be a tie.
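
A minimal sketch of why a tie matters, using a made-up `q_estimation` array (the values and the names `first_best`, `tied_actions`, and `random_best` are illustrative, not taken from the repository). `np.argmax` returns the first maximal index on a tie, so a purely greedy pick would never try the other tied arms:

```python
import numpy as np

# Hypothetical estimates: arms 1 and 3 are tied for the best value.
q_estimation = np.array([0.2, 0.9, 0.5, 0.9])

# argmax is deterministic: on a tie it returns the first maximal index.
first_best = int(np.argmax(q_estimation))

# Collect every index that attains the maximum, then pick one at random.
q_best = np.max(q_estimation)
tied_actions = np.where(q_estimation == q_best)[0]

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
random_best = int(rng.choice(tied_actions))

print(first_best, tied_actions, random_best)
```

Ties are most likely early in a run, when many arms still share the same estimate.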

ai4pharma commented 4 years ago

As shown below, line 74 always chooses the first element via [0] when there is a tie. The input to np.random.choice is an int instead of an array.

np.random.choice(np.where(self.q_estimation == q_best)[0])

ShangtongZhang commented 4 years ago

You can set a breakpoint and run the code to see what happens.
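
For readers without a debugger handy, a short sketch of what such a check might show. With a single condition argument, `np.where` returns a tuple holding one index array per dimension, so `[0]` selects the whole index array for a 1-D input rather than its first element. The all-zero `q_estimation` is an assumption standing in for the agent's initial state, and `result`, `indices`, and `counts` are illustrative names:

```python
import numpy as np

np.random.seed(0)  # seeded only to make the sketch repeatable

# Assumed starting point: ten arms whose estimates are all tied at zero.
q_estimation = np.zeros(10)
q_best = np.max(q_estimation)

result = np.where(q_estimation == q_best)  # tuple with one array per dimension
indices = result[0]                        # the full index array, not a single int

print(type(result).__name__, len(result))  # container type and its length
print(indices)                             # every tied index

# Sampling repeatedly spreads the choices across all tied arms.
draws = [np.random.choice(indices) for _ in range(1000)]
counts = np.bincount(draws, minlength=10)
print(counts)
```

Under these assumptions, the argument passed to `np.random.choice` is an array of all tied indices, so the expression breaks ties at random instead of always returning the first one.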