HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
MIT License
(Idea) feature: proportional accept rate during all phases #324
So far, the acceptance rate for sampling from the replay buffer has been linear in `maximum_episodes`, which means the earliest episodes are selected many times before the buffer is filled.
Even if the diversity within each batch decreases a little, it would be better to use a weight proportional to the current number of episodes, so that the earliest episodes are less likely to be over-selected.
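A minimal sketch of the idea (not HandyRL's actual implementation; `MAXIMUM_EPISODES`, `sample_episode`, and the rejection scheme are hypothetical names for illustration): a sampling attempt is accepted with probability `len(buffer) / MAXIMUM_EPISODES`, so while the buffer is still mostly empty, fewer draws succeed and the earliest episodes are not replayed as often.

```python
import random
from collections import Counter

MAXIMUM_EPISODES = 100  # hypothetical buffer capacity


def sample_episode(buffer, proportional=True):
    """Try to draw one episode from the buffer.

    With proportional=True, the draw is accepted with probability
    len(buffer) / MAXIMUM_EPISODES, so sampling is throttled while the
    buffer is small.  Returns None when the draw is rejected.
    """
    if not buffer:
        return None
    if proportional and random.random() >= len(buffer) / MAXIMUM_EPISODES:
        return None  # rejected this round; wait for more episodes
    return random.choice(buffer)


# Toy simulation: episodes arrive one by one; between arrivals we make a
# fixed number of sampling attempts and count replays per episode.  Under
# the proportional scheme, early episodes accumulate fewer replays than
# under the current always-accept behaviour.
counts = Counter()
buffer = []
for ep in range(MAXIMUM_EPISODES):
    buffer.append(ep)
    for _ in range(10):  # 10 sampling attempts per arriving episode
        picked = sample_episode(buffer, proportional=True)
        if picked is not None:
            counts[picked] += 1
```

The trade-off noted above applies: rejecting draws early on reduces throughput and batch diversity slightly while the buffer fills, in exchange for a flatter replay distribution over episodes.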