HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
MIT License
(Idea) feature: proportional accept rate during all phases #324
So far, the acceptance rate for sampling from the replay buffer has been linear in `maximum_episodes`, which means the earliest episodes are selected many times before the buffer is filled.
Even if the diversity within each batch decreases a little, it would be better to use a weight proportional to the current number of episodes, so that the earliest episodes are less likely to be over-selected.
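A minimal sketch of the idea (not HandyRL's actual implementation; `MAXIMUM_EPISODES`, `sample_episode`, and the rejection scheme are hypothetical names for illustration): a sampling attempt is accepted with probability `len(buffer) / MAXIMUM_EPISODES`, so while the buffer is still mostly empty, fewer draws succeed and the earliest episodes are not replayed as often.

```python
import random
from collections import Counter

MAXIMUM_EPISODES = 100  # hypothetical buffer capacity


def sample_episode(buffer, proportional=True):
    """Try to draw one episode from the buffer.

    With proportional=True, the draw is accepted with probability
    len(buffer) / MAXIMUM_EPISODES, so sampling is throttled while the
    buffer is small.  Returns None when the draw is rejected.
    """
    if not buffer:
        return None
    if proportional and random.random() >= len(buffer) / MAXIMUM_EPISODES:
        return None  # rejected this round; wait for more episodes
    return random.choice(buffer)


# Toy simulation: episodes arrive one by one; between arrivals we make a
# fixed number of sampling attempts and count replays per episode.  Under
# the proportional scheme, early episodes accumulate fewer replays than
# under the current always-accept behaviour.
counts = Counter()
buffer = []
for ep in range(MAXIMUM_EPISODES):
    buffer.append(ep)
    for _ in range(10):  # 10 sampling attempts per arriving episode
        picked = sample_episode(buffer, proportional=True)
        if picked is not None:
            counts[picked] += 1
```

The trade-off noted above applies: rejecting draws early on reduces throughput and batch diversity slightly while the buffer fills, in exchange for a flatter replay distribution over episodes.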