MushroomRL / mushroom-rl

Python library for Reinforcement Learning.
MIT License

how to reproduce DQN nature paper? #115

Closed davidenitti closed 1 year ago

davidenitti commented 1 year ago

I'm trying to reproduce the Breakout results of DQN with MushroomRL, but I get much lower average rewards: I expect to reach at least 300, as in the Nature DQN paper, but I only reach about 175 after 100 epochs. I started from the example code, where the parameters are already close to the Nature paper; I only increased the replay memory to 1M and tested different optimizers, with no luck. Do you have any idea what I'm missing?
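For reference, a rough summary of the Nature DQN hyperparameters being targeted here (values are from the paper; the dictionary and its key names are only illustrative and are not MushroomRL's configuration format):

```python
# Key hyperparameters from the Nature DQN paper (Mnih et al., 2015).
# Illustrative summary only; not the library's config format.
nature_dqn_hyperparams = {
    'max_replay_size': 1_000_000,        # transitions kept in replay memory
    'initial_replay_size': 50_000,       # random steps before learning starts
    'batch_size': 32,
    'target_update_frequency': 10_000,   # parameter updates between target syncs
    'discount_factor': 0.99,
    'learning_rate': 0.00025,            # RMSProp
    'initial_exploration': 1.0,          # epsilon-greedy schedule
    'final_exploration': 0.1,
    'final_exploration_frame': 1_000_000,
    'frame_skip': 4,
    'history_length': 4,                 # stacked frames per state
}
```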

carloderamo commented 1 year ago

The most sensitive aspect of DQN experiments is the optimizer. Our implementation uses PyTorch, while the original implementation of DQN uses an old version of TensorFlow. Small differences in the implementation of the optimizer can result in big differences in performance. Another thing to look at is the environment: are you using the deterministic or the stochastic version?
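For what it's worth, one common attempt at matching the paper's RMSProp settings in PyTorch looks roughly like the sketch below. The learning rate and the 0.95 / 0.01 values come from the Nature paper, but how they map onto PyTorch's `alpha`, `eps` and `centered` arguments is an assumption, since the DeepMind RMSProp variant is not identical to PyTorch's:

```python
import torch.optim as optim

# Approximate PyTorch counterpart of the Nature paper's RMSProp settings.
# The mapping of the paper's "squared gradient momentum" and "min squared
# gradient" onto alpha/eps/centered is an approximation, not an exact match.
def make_optimizer(network):
    return optim.RMSprop(
        network.parameters(),
        lr=0.00025,      # learning rate from the paper
        alpha=0.95,      # squared-gradient moving-average coefficient
        eps=0.01,        # added to the denominator for numerical stability
        centered=True,   # also track the mean gradient, as DeepMind's variant does
    )
```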

davidenitti commented 1 year ago

I checked a few pages and it seems the env I should use to reproduce the paper is BreakoutNoFrameskip-v4, not BreakoutDeterministic-v4. Is this correct? Indeed, the reward is much higher with BreakoutNoFrameskip-v4.

carloderamo commented 1 year ago

This is correct. The non-deterministic version is the one used in the Nature paper.
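For anyone landing here, a minimal sketch of constructing the environment for a paper-style run, assuming MushroomRL's `Atari` wrapper and the argument names used in its DQN example at the time (check the constructor in your installed version):

```python
from mushroom_rl.environments import Atari

# Nature-style Atari preprocessing: 84x84 frames, 4-frame history,
# episodes end on life loss during training, random no-ops at reset.
# Argument names follow the library's DQN example and may differ across versions.
env = Atari('BreakoutNoFrameskip-v4', width=84, height=84,
            ends_at_life=True, history_length=4, max_no_op_actions=30)
```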

davidenitti commented 1 year ago

@carloderamo is there any other trick to consider? I know the Nature paper clips the reward and uses other tricks. Is this already implemented somewhere? If so, where? Thanks!

carloderamo commented 1 year ago

All the other tricks are implemented. I cannot think of anything else that could be missing.

davidenitti commented 1 year ago

Thanks, just one last question: is the reward clipping implemented in the agent or in the environment?

carloderamo commented 1 year ago

It is in the DQN agent. There is a flag to enable it.
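A sketch of what this amounts to, assuming the flag is the `clip_reward` argument of the DQN constructor (verify the exact name in your installed version); the clipping itself just squashes every reward into [-1, 1] before it is used in the TD target:

```python
import numpy as np

# Reward clipping as described in the Nature paper: every reward is
# mapped into [-1, 1] before being used to compute the learning target.
def clip_reward(reward):
    return np.clip(reward, -1.0, 1.0)

# Assumed usage with MushroomRL's DQN agent; the flag name is taken from
# the library's DQN constructor and may differ across versions.
# agent = DQN(mdp.info, policy, approximator,
#             approximator_params=approximator_params,
#             batch_size=32, target_update_frequency=10_000,
#             initial_replay_size=50_000, max_replay_size=1_000_000,
#             clip_reward=True)
```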