Closed · kmakeev closed this 5 years ago
Hi @kasimte,
I've checked your PR. It looks like you added a decaying version of epsilon for the off-policy algorithms, and I have some suggestions: you actually don't need to pass an episode parameter to the `choose_action` function; just using `self.episode` is enough, because `self.episode` is updated whenever the `learn` function is invoked.
I'm not sure whether we should keep exploration in inference mode or use pure exploitation. I think we should select the action with complete certainty (i.e., greedily) in inference mode.
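A minimal sketch of both points, with illustrative names (the `evaluation` flag and the agent's attributes here are assumptions, not the repo's actual API):

```python
import numpy as np

class Agent:
    def __init__(self, n_actions, init_eps=1.0, min_eps=0.01, eps_decay=0.99):
        self.n_actions = n_actions
        self.init_eps = init_eps
        self.min_eps = min_eps
        self.eps_decay = eps_decay
        self.episode = 0  # tracked internally, so callers never pass it

    @property
    def epsilon(self):
        # Exponential decay driven by the internally tracked episode count.
        return max(self.min_eps, self.init_eps * self.eps_decay ** self.episode)

    def choose_action(self, q_values, evaluation=False):
        if evaluation:
            # Inference mode: pure exploitation, no randomness.
            return int(np.argmax(q_values))
        # Training mode: epsilon-greedy, with epsilon read from self.episode.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(q_values))

    def learn(self, batch, episode):
        # Refreshing self.episode here keeps choose_action's epsilon in sync.
        self.episode = episode
        # ... gradient update on `batch` elided ...
```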
Thank you for your great help.
I'm going to merge this and modify it a little bit later (maybe tonight).
Added "Exploration and exploitation compromise", test for him. Used in dqn model. if accepted, can be applied to all models and used in evaluation.
Example:
```
--gym -a dqn -g -n train_using_gym --gym-env Acrobot-v1 --render-episode 10 --max-step 500 --gym-agents 4
```
Output:
```
Episode: 128 | step: 159 | last_done_step 159 | rewards: [-121. -115. -110. -158.]
Episode: 129 | step: 144 | last_done_step 144 | rewards: [-103. -134. -82. -143.]
Evaluate episode: 129 evaluate number: 100 | average step: 90 | average reward: -89.33 | SOLVED: True
Episode: 130 | step: 110 | last_done_step 110 | rewards: [ -91. -104. -109. -104.]

Process finished with exit code 0
```