StepNeverStop / RLs

Reinforcement Learning Algorithms Based on PyTorch
https://stepneverstop.github.io
Apache License 2.0

Added "Exploration and exploitation compromise" #11

Closed · kmakeev closed this 5 years ago

kmakeev commented 5 years ago

Added "Exploration and exploitation compromise", test for him. Used in dqn model. if accepted, can be applied to all models and used in evaluation.

Example: --gym -a dqn -g -n train_using_gym --gym-env Acrobot-v1 --render-episode 10 --max-step 500 --gym-agents 4

Episode: 128 | step: 159 | last_done_step 159 | rewards: [-121. -115. -110. -158.]

Episode: 129 | step: 144 | last_done_step 144 | rewards: [-103. -134. -82. -143.] Evaluate episode: 129 evaluate number: 100 | average step: 90 | average reward: -89.33 | SOLVED: True

Episode: 130 | step: 110 | last_done_step 110 | rewards: [ -91. -104. -109. -104.]

Process finished with exit code 0
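
For context, here is a minimal sketch of what an episode-decayed epsilon-greedy selection for DQN can look like; all names below (q_net, eps_start, and so on) are illustrative assumptions, not the PR's actual identifiers:

```python
import numpy as np
import torch

def choose_action(q_net, state, episode, n_actions,
                  eps_start=1.0, eps_min=0.01, decay=0.99):
    """Trade exploration against exploitation with a per-episode decay."""
    eps = max(eps_min, eps_start * decay ** episode)  # anneal epsilon
    if np.random.rand() < eps:
        return np.random.randint(n_actions)           # explore: random action
    with torch.no_grad():
        q = q_net(torch.as_tensor(state, dtype=torch.float32))
    return int(q.argmax().item())                     # exploit: greedy action
```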

StepNeverStop commented 5 years ago

Hi, @kmakeev

I've checked your PR; it seems you added a decaying version of epsilon for the off-policy algorithms. I have some suggestions:

  1. Actually, you don't need to pass an episode parameter to the choose_action function; just using self.episode is fine, because self.episode is updated whenever the learn function is invoked.

  2. I'm not sure whether we should add exploration in inference mode, or use pure exploitation. I think we should select the action with complete certainty in inference mode. A rough sketch of both suggestions follows below.
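
Something along these lines (a minimal sketch; the Agent class, attribute names, and decay schedule are assumptions for illustration, not the actual RLs code):

```python
import numpy as np

class Agent:
    """Hypothetical skeleton; the real class in RLs is shaped differently."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.episode = 0  # advanced inside learn(), never passed by callers

    def choose_action(self, state, evaluate=False):
        # Suggestion 2: inference mode means pure exploitation.
        if evaluate:
            return self._greedy(state)
        # Suggestion 1: the decay schedule reads self.episode directly.
        eps = max(0.01, 0.99 ** self.episode)
        if np.random.rand() < eps:
            return np.random.randint(self.n_actions)  # explore
        return self._greedy(state)                    # exploit

    def _greedy(self, state):
        return 0  # stand-in for argmax over the network's Q-values

    def learn(self):
        self.episode += 1  # updated here, so choose_action stays in sync
        # ... gradient update would go here ...
```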

Thank you for your great help.

I'm gonna merge this and modify it a little bit later (maybe tonight).