Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Why didn't you reset the noise of target net? #2

Closed: Halbmond closed this issue 6 years ago

Kaixhin commented 6 years ago

Sorry, I may have missed something elsewhere, but isn't the noise already reset at the start of every episode here? https://github.com/Kaixhin/Rainbow/blob/master/main.py#L85

Halbmond commented 6 years ago

According to page 12 of this paper (https://arxiv.org/pdf/1706.10295.pdf):

  1. The noise of policy_net is reset here: https://github.com/Kaixhin/Rainbow/blob/master/agent.py#L34, but the noise of target_net should also be reset.
  2. The noise should be reset before every gradient step, not at the start of every episode.

Kaixhin commented 6 years ago

Thanks for spotting this. I'd worked on the A3C version previously, so I missed that the DQN version differs in more than just the factorised noise. Should be fixed now.
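
For readers following along, here is a minimal sketch of the behaviour requested in this issue: fresh noise is drawn for both the online network and the target network before each gradient step. It is not the repository's code. `NoisyLinear`, `learn_step`, and `sigma0` are hypothetical names, the layer is a bias-free toy written in plain Python, and the loss and optimiser update are left out.

```python
import math
import random


def scale_noise(x):
    # Factorised-noise transform: sign(x) * sqrt(|x|).
    return math.copysign(math.sqrt(abs(x)), x)


class NoisyLinear:
    """Toy noisy layer (hypothetical): weight = mu + sigma * eps, no bias."""

    def __init__(self, in_features, out_features, sigma0=0.5, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.in_features = in_features
        self.out_features = out_features
        bound = 1.0 / math.sqrt(in_features)
        self.weight_mu = [
            [self.rng.uniform(-bound, bound) for _ in range(in_features)]
            for _ in range(out_features)
        ]
        self.weight_sigma = [
            [sigma0 / math.sqrt(in_features)] * in_features
            for _ in range(out_features)
        ]
        self.reset_noise()

    def reset_noise(self):
        # Draw one noise value per input and per output unit, then take
        # their outer product to get the per-weight noise.
        eps_in = [scale_noise(self.rng.gauss(0, 1)) for _ in range(self.in_features)]
        eps_out = [scale_noise(self.rng.gauss(0, 1)) for _ in range(self.out_features)]
        self.weight_eps = [[eo * ei for ei in eps_in] for eo in eps_out]

    def forward(self, x):
        return [
            sum((mu + s * e) * xi for mu, s, e, xi in zip(mu_row, s_row, e_row, x))
            for mu_row, s_row, e_row in zip(
                self.weight_mu, self.weight_sigma, self.weight_eps
            )
        ]


def learn_step(online, target, state, next_state):
    # Resample noise in BOTH networks before every gradient step,
    # rather than once at the start of an episode.
    online.reset_noise()
    target.reset_noise()
    q = online.forward(state)
    q_next = target.forward(next_state)
    # The loss computation and optimiser update would follow here.
    return q, q_next
```

In an agent that keeps two networks, the same idea amounts to calling the noise-reset method on each of them at the top of the learning routine.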