feature: apply omask for two-player value averaging for solo-play

DeNA / HandyRL

HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.

MIT License

282 stars, 42 forks

feature: apply omask for two-player value averaging for solo-play #340

Closed. YuriCat closed this 1 year ago

YuriCat commented 1 year ago

I would like to avoid strange behavior in the target-value-averaging setting in situations such as when matches that are not self-play are generated with a certain probability.
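To illustrate the idea, here is a minimal sketch of masked two-player value averaging for a single time step. It is not HandyRL's actual code: the function name `masked_two_player_average` is hypothetical, and it assumes that `omask` is 1 for a player whose value estimate comes from the model being trained and 0 otherwise (for example, an opponent that is not a self-play copy). In a zero-sum two-player game, each player's value can also be estimated as the negation of the other player's value, so the sketch averages only the estimates that the mask marks as valid.

```python
def masked_two_player_average(values, omask):
    """Average two-player zero-sum value estimates, using only masked-in players.

    values: (v0, v1), per-player value estimates (hypothetical layout).
    omask:  (m0, m1), 1 if that player's estimate is valid, else 0 (assumption).
    Returns [avg0, avg1], the averaged estimates from each player's viewpoint.
    """
    v0, v1 = values
    m0, m1 = omask
    count = m0 + m1
    if count == 0:
        # No valid estimate at this step; fall back to a neutral value.
        return [0.0, 0.0]
    # From player 0's viewpoint, player 1's estimate contributes as -v1.
    avg0 = (v0 * m0 + (-v1) * m1) / count
    # Zero-sum symmetry: player 1's averaged value is the negation.
    return [avg0, -avg0]


# Self-play: both estimates are valid, so they are averaged.
both = masked_two_player_average((0.5, -0.25), (1, 1))

# Non-self-play match: player 1's estimate is masked out, so player 0's
# own estimate is kept instead of being mixed with an unreliable value.
solo = masked_two_player_average((0.5, 0.75), (1, 0))

# Without the mask, the unreliable estimate would drag the target down.
unmasked = masked_two_player_average((0.5, 0.75), (1, 1))
```

In the non-self-play case, the masked result keeps player 0's own estimate, whereas the unmasked average mixes in a value that the model never produced for that seat. That is one plausible reading of the "strange behavior" mentioned above.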

ikki407 commented 1 year ago

I see. This looks good for the actual situation.