DeNA / HandyRL

HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
MIT License

feature: compute rho, c by joint probability #322

Open YuriCat opened 2 years ago

YuriCat commented 2 years ago

There is no clear answer as to how rho and c should be defined.

However, in a game like rock-paper-scissors where the best move depends on the opponent's move, it makes no sense to compare only the probability of one's own move.
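A minimal sketch of the idea, assuming rho and c are the clipped importance-sampling ratios between the target and behavior policies (as in V-trace-style off-policy correction), and that "joint probability" means the product of every player's ratio for the actions chosen at the same step. The function `importance_weights` and the parameters `clip_rho`, `clip_c`, and `joint` are hypothetical names for illustration, not HandyRL's actual API.

```python
import math


def importance_weights(log_pi_target, log_pi_behavior,
                       clip_rho=1.0, clip_c=1.0, joint=True):
    """Return (rhos, cs) for one simultaneous-move step.

    log_pi_target / log_pi_behavior: per-player log-probabilities of the
    actions that were actually selected (hypothetical input layout).
    """
    # Per-player log ratio: log pi(a_i) - log mu(a_i)
    log_ratios = [t - b for t, b in zip(log_pi_target, log_pi_behavior)]

    if joint:
        # Joint ratio: product over players, i.e. sum in log space.
        # Every player then shares the same ratio for this step.
        total = sum(log_ratios)
        log_ratios = [total] * len(log_ratios)

    ratios = [math.exp(r) for r in log_ratios]
    rhos = [min(r, clip_rho) for r in ratios]  # clipped rho
    cs = [min(r, clip_c) for r in ratios]      # clipped c
    return rhos, cs


# Two players: the target policy doubles player 0's action probability
# and halves player 1's, relative to the behavior policy.
target = [math.log(0.5), math.log(0.2)]
behavior = [math.log(0.25), math.log(0.4)]

per_player = importance_weights(target, behavior, joint=False)
joint = importance_weights(target, behavior, joint=True)
```

In this example the unclipped per-player ratios are 2.0 and 0.5, while the joint ratio is their product, about 1.0. With per-player ratios, player 1's weight is reduced even though the joint outcome is about as likely under both policies; with the joint ratio, both players receive the same weight.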