Kautenja / playing-mario-with-deep-reinforcement-learning

An implementation of (Double/Dueling) Deep-Q Learning to play Super Mario Bros.

DQN: Optimizer #10

Closed Kautenja closed 6 years ago

Kautenja commented 6 years ago

RMSprop with the default DeepMind parameters is complete garbage. After 5,000,000 frames it raised the average score per episode to only -15 on Pong. For reference, Adam converges to nearly perfect games (average score of +16) in the same number of frames. Long story short, either the RMSprop implementation in Keras differs from the one DeepMind used, or Adam is just plain better. No further exploration will be done with RMSprop.

EDIT: fix some spelling, grammar, etc.
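
For reference, a minimal sketch (not the repo's exact code) of the two optimizer setups being compared, assuming the Keras 2.x API. The RMSprop hyperparameters follow a common mapping of the Nature DQN settings (lr=2.5e-4, rho=0.95, epsilon=0.01); note that Keras' RMSprop has no momentum term, which is one concrete way it can differ from the RMSProp variant DeepMind describe.

```python
from keras.optimizers import RMSprop, Adam

# common Keras mapping of the Nature DQN RMSProp hyperparameters
rmsprop_deepmind = RMSprop(lr=2.5e-4, rho=0.95, epsilon=0.01)
# Adam with its Keras default lr=1e-3; the learning rate gets tuned later in this thread
adam_baseline = Adam()

# either optimizer is then handed to the Q-network's compile step, e.g.
# model.compile(loss='mse', optimizer=adam_baseline)
```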

Kautenja commented 6 years ago

Nadam and Adam produce similar results, though Nadam seems to take slightly longer per run. Adam will be used from here on out. The notebooks are now searching for a solid learning rate to lock in for the remaining experiments.

EDIT: Nadam just achieved a new high average score of 18.1. Rethinking this decision with more notebooks.
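
A hypothetical sketch of what such a side-by-side comparison could look like in Keras; `build_model`, `train_on_pong`, and the MSE loss are stand-ins for the repo's actual Q-network and training loop, and the learning rate is a placeholder since the sweep was still in progress at this point.

```python
from keras.optimizers import Adam, Nadam

lr = 1e-4  # placeholder learning rate; the sweep had not settled yet
for name, opt in [('adam', Adam(lr=lr)), ('nadam', Nadam(lr=lr))]:
    model = build_model()                   # stand-in for the repo's Q-network constructor
    model.compile(loss='mse', optimizer=opt)
    train_on_pong(model, frames=5000000)    # stand-in for the repo's training loop
    model.save('pong-{}.h5'.format(name))
```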

Kautenja commented 6 years ago

High learning rates (e.g. 1e-4, 1e-3, 2e-3) seem to cause exploding gradients in the early stages of training. Something more stable like 2e-5 might be the best learning rate.

Kautenja commented 6 years ago

Further experiments confirm that Adam with a learning rate of 1e-4 produces unstable results. A learning rate of 2e-5 will be used from here on out.
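
A minimal sketch of the setting settled on here, assuming the Keras 2.x API; `model` below stands in for the repo's Q-network and the MSE loss is only illustrative.

```python
from keras.optimizers import Adam

optimizer = Adam(lr=2e-5)  # 1e-4 and above destabilized training in these experiments
# model.compile(loss='mse', optimizer=optimizer)
```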