evg-tyurin / alpha-nagibator

Implementation of self-play based reinforcement learning for Checkers based on the AlphaGo Zero methods.

Randomness in Arena #4

Open · bhansconnect opened this issue 5 years ago

bhansconnect commented 5 years ago

Quick questions about how networks are compared in the Arena.

If the temperature is set to 0 and no Dirichlet noise is added during Arena games, what stops the neural networks from playing the same game over and over again? Where does the randomness come from that makes each game unique?
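For context on the Dirichlet noise mentioned in the question: AlphaGo Zero mixes noise into the root priors as P(s,a) = (1 - ε)p_a + εη_a, with η ~ Dir(α) and ε = 0.25 in the paper. The minimal sketch below assumes a hypothetical `root_priors` helper with an `add_noise` flag that would be switched off in Arena games; α = 0.3 is an illustrative value, not one taken from this repository. With the flag off, the priors depend only on the network output, so they are identical every time the same position is reached.

```python
import numpy as np

def root_priors(policy, valid_mask, add_noise, epsilon=0.25, alpha=0.3, rng=None):
    """Return the prior distribution used at the MCTS root (hypothetical helper).

    policy: raw network policy over all actions (1-D array).
    valid_mask: 1 for legal moves, 0 otherwise.
    add_noise: assumed True during self-play and False in Arena games.
    """
    rng = np.random.default_rng() if rng is None else rng
    priors = np.asarray(policy, dtype=float) * valid_mask
    priors = priors / priors.sum()          # renormalise over legal moves
    if not add_noise:
        return priors                       # deterministic given the network output
    legal = np.flatnonzero(valid_mask)
    noise = rng.dirichlet([alpha] * len(legal))
    mixed = priors.copy()
    # Mix noise only into legal moves; the result still sums to 1.
    mixed[legal] = (1 - epsilon) * priors[legal] + epsilon * noise
    return mixed
```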

Is the game only random when MCTS gives two moves the same visit count? I feel like that would be pretty rare. I am just trying to understand how you compare networks and keep the games random/unique. Any comments would be greatly appreciated.
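With temperature 0, the move is the most visited root action. A common way to implement this, sketched here as an assumption (`select_action` is a hypothetical name, not necessarily how MCTS.py does it), breaks ties among equally visited moves uniformly at random, so in that case ties are the only random element in move choice. With temperature > 0, moves are sampled in proportion to N(s,a)^(1/temp).

```python
import numpy as np

def select_action(visit_counts, temp, rng=None):
    """Pick a move index from MCTS root visit counts (hypothetical helper).

    temp == 0: greedy; ties between equally visited moves are broken
    uniformly at random.
    temp > 0: sample proportionally to N(s, a) ** (1 / temp).
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(visit_counts, dtype=float)
    if temp == 0:
        best = np.flatnonzero(counts == counts.max())  # all maximally visited moves
        return int(rng.choice(best))
    scaled = counts ** (1.0 / temp)
    probs = scaled / scaled.sum()
    return int(rng.choice(len(counts), p=probs))
```

If no two moves ever tie, this rule is fully deterministic, which matches the intuition in the question.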

evg-tyurin commented 5 years ago

Yes, randomness occurs in two different places in the code. When I run competitions in Arena, I observe anywhere from zero to 15-20 identical games, or even more, depending on conditions I haven't pinned down. I suspect more identical games occur when the network is overfitted.
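One way to quantify the identical games reported here, assuming each Arena game is recorded as a list of moves (the repository may not log games this way), is to hash the move sequences and count repeats. `count_identical_games` is a hypothetical helper:

```python
from collections import Counter

def count_identical_games(games):
    """Count games whose full move sequence repeats another game.

    games: list of move sequences (moves must be hashable), one per game.
    A group of k identical games contributes k - 1 duplicates.
    """
    counts = Counter(tuple(moves) for moves in games)
    return sum(n - 1 for n in counts.values() if n > 1)
```

For example, three identical games plus one different game give 2 duplicates. If Arena alternates which network moves first (an assumption here), include the starting player in the key so that games with swapped colours are not merged.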

bhansconnect commented 5 years ago

Can you point out those two locations for me?

evg-tyurin commented 5 years ago

https://github.com/evg-tyurin/alpha-nagibator/blob/48b2ebd3ca272f388c13277297edbb60d98eb64b/MCTS.py#L75

https://github.com/evg-tyurin/alpha-nagibator/blob/48b2ebd3ca272f388c13277297edbb60d98eb64b/MCTS.py#L225