Using xavier initialization on advantage/value weights improves model performance

awjuliani / DeepRL-Agents

A set of Deep Reinforcement Learning Agents implemented in Tensorflow.

MIT License

2.23k stars, 825 forks

Using xavier initialization on advantage/value weights improves model performance #32

Closed by wmitsuda 7 years ago

wmitsuda commented 7 years ago

I just found out in my tests that changing the weight initialization from random_normal to Xavier initialization speeds up training considerably.

Using only CPU, the original code takes about 3.5K episodes to reach a reward of ~22, which is around the maximum reward I was able to obtain when reproducing the code.

With Xavier initialization, the code converges to the same result by around episode 1K, taking < 30 minutes on my MacBook Pro using only CPU.
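
For readers who want to see why the scale differs so much, here is a minimal sketch in plain Python (not the code from the pull request). It compares a unit-variance normal initializer with the Xavier (Glorot) uniform rule, which draws from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The layer sizes (256 inputs, 4 actions) and all function names are assumptions made up for illustration; in TensorFlow 1.x a comparable initializer is available as tf.contrib.layers.xavier_initializer(), though the thread does not say which call the change actually uses.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out matrix from U(-limit, limit),
    where limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def unit_normal(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out matrix from N(0, 1), the scale a plain
    normal initializer gives when no stddev is specified."""
    return [[rng.gauss(0.0, 1.0) for _ in range(fan_out)]
            for _ in range(fan_in)]


def sample_std(matrix):
    """Population standard deviation over all entries of the matrix."""
    values = [w for row in matrix for w in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((w - mean) ** 2 for w in values) / len(values))


rng = random.Random(0)

# Hypothetical shapes: 256 input features feeding an advantage head
# (one output per action) and a value head (a single output).
FAN_IN, N_ACTIONS = 256, 4
advantage_w = xavier_uniform(FAN_IN, N_ACTIONS, rng)
value_w = xavier_uniform(FAN_IN, 1, rng)
baseline_w = unit_normal(FAN_IN, N_ACTIONS, rng)

print("xavier advantage std:", sample_std(advantage_w))
print("xavier value std:    ", sample_std(value_w))
print("unit normal std:     ", sample_std(baseline_w))
```

With these assumed shapes the Xavier weights come out roughly an order of magnitude smaller than the unit-normal ones, so the initial advantage and value outputs start much closer to zero. That is one plausible reason for the faster convergence reported above, not a verified explanation.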

awjuliani commented 7 years ago

Great find. Thanks for sending the pull request!