HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
MIT License
feature: apply omask for two-player value averaging for solo-play #340
I would like to avoid strange behavior of target value averaging in situations where, with a certain probability, the generated matches are not self-play.
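To illustrate the idea, here is a minimal plain-Python sketch; it is not the code from this PR. It assumes that `omask` holds 1.0 for a player whose value estimate is valid at a given step and 0.0 otherwise (for example, a seat that is not played by the model being trained), and that the game is two-player zero-sum. The function name `masked_two_player_average` and the `eps` parameter are hypothetical.

```python
def masked_two_player_average(values, omask, eps=1e-8):
    """Average each player's value with the negated opponent value,
    counting only the entries that omask marks as valid.

    values: [v0, v1], value estimates for player 0 and player 1 at one step.
    omask:  [m0, m1], 1.0 where the estimate is valid, 0.0 otherwise (assumed meaning).
    """
    averaged = []
    for player in (0, 1):
        opponent = 1 - player
        # In a zero-sum game, the negated opponent value is a second
        # estimate of this player's own value.
        numerator = values[player] * omask[player] - values[opponent] * omask[opponent]
        # Divide by the number of valid estimates; eps avoids division by zero
        # when neither estimate is valid.
        denominator = omask[player] + omask[opponent] + eps
        averaged.append(numerator / denominator)
    return averaged


# Self-play step: both estimates are valid, so they are averaged.
both = masked_two_player_average([0.6, -0.2], [1.0, 1.0])

# Non-self-play step: only player 0's estimate is valid, so it is used as-is
# instead of being mixed with an invalid opponent value.
solo = masked_two_player_average([0.6, -0.2], [1.0, 0.0])
```

Without the mask, the non-self-play step would average 0.6 with the invalid opponent output, which is presumably the kind of strange behavior described above.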