eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms
MIT License

`Temperature` variable in Monte Carlo Tree Search #83

Closed AndreasKaratzas closed 2 years ago

AndreasKaratzas commented 2 years ago

What is the physical meaning of the temperature variable? I know that it is the exploration parameter. My question is about its initialization. In other words, what does 200 mean in the default .json configuration file? I understand that it means the agent still has much to explore, but what is the rule for quantifying it? Is there a theoretical maximum value? Any input beyond my question would be helpful to better understand the physical purpose of the temperature hyperparameter. Thanks :)

eleurent commented 2 years ago

Indeed, it is a parameter (often denoted c) which controls the tradeoff between exploration (sampling an action that hasn't been played a lot before) and exploitation (sampling the action with the highest current empirical value estimate). There is probably a theoretical value that guarantees asymptotic convergence (just like there is one for the UCB algorithm), but such values are typically quite conservative, and in practice the parameter is often tuned manually to get better performance. But let me try to give you an intuition: we sample the action that has the highest `value_estimate + temperature * prior_probability / visits`. The second term is an exploration bonus: it is large for actions with few visits, shrinks as visits accumulate, and is scaled by the temperature.
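To make the selection rule above concrete, here is a minimal Python sketch. It is not the rl-agents implementation: the function names, the dictionary layout for the per-action statistics, and the `+ 1` in the denominator (added only so that unvisited actions do not cause a division by zero) are all assumptions made for illustration.

```python
def selection_score(value_estimate, prior_probability, visits, temperature):
    """Score from the rule above: value estimate plus an exploration bonus.

    Assumption: 1 is added to `visits` so unvisited actions do not
    trigger a division by zero.
    """
    return value_estimate + temperature * prior_probability / (visits + 1)


def select_action(children, temperature):
    """Return the action whose statistics give the highest score.

    `children` is a hypothetical mapping: action -> dict with the keys
    "value", "prior" and "visits".
    """
    return max(
        children,
        key=lambda action: selection_score(
            children[action]["value"],
            children[action]["prior"],
            children[action]["visits"],
            temperature,
        ),
    )


# Two actions: "left" looks better so far, "right" has never been tried.
children = {
    "left": {"value": 1.0, "prior": 0.5, "visits": 9},
    "right": {"value": 0.5, "prior": 0.5, "visits": 0},
}

greedy_choice = select_action(children, temperature=0.0)       # pure exploitation
exploring_choice = select_action(children, temperature=200.0)  # bonus dominates
```

With a temperature of 0 only the value estimates matter, so the sketch picks "left". With a temperature of 200 the bonus for the unvisited "right" action outweighs the value gap. Since the temperature is added to the value estimate, it is expressed in the same units, so whether 200 counts as large depends on the range of the values in the problem at hand.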

Does that make sense?

AndreasKaratzas commented 2 years ago

Yes, perfectly :) Thank you for the thorough explanation!