slowbull closed this issue 7 years ago
Well, to start, we have different input sizes. His is 42x42 and mine is 80x80. His model is an exact replica of the universe starter agent. That model is good but obviously very fine-tuned for Pong specifically. I'm using a 4-layer conv2d model with 32 filters of size 5 × 5, 32 filters of size 5 × 5, 64 filters of size 4 × 4, and 32 filters of size 3 × 3, all single-stride with max pooling after each. I'm also using a 512-unit LSTM cell as opposed to a 256-unit last cell, and I have an RMSprop shared optimizer implemented. My model is obviously larger so it's slower to train, but it's more robust with much higher final performance, as it's designed for the tough gym v0 environments.
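For what it's worth, the output size of that conv stack can be sanity-checked with a little arithmetic. The thread only gives filter counts and kernel sizes, so the padding of 1 on every layer and 2x2 pooling below are my assumptions; under those, an 80x80 input flattens to 512 features feeding the 512-unit LSTM:

```python
def conv_out(size, kernel, stride=1, padding=1):
    """Spatial size after a square conv layer (padding=1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 80                            # 80x80 preprocessed frame
for kernel in (5, 5, 4, 3):          # the four conv layers described above
    size = conv_out(size, kernel)    # single-stride conv
    size = size // 2                 # assumed 2x2 max pooling after each conv
print(size)                          # -> 4

flat_features = 32 * size * size     # 32 filters in the last layer
print(flat_features)                 # -> 512, the input to the LSTM cell
```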
Thanks! In your experiments, does the RMSprop shared optimizer work better than Adam?
They are actually quite different, considering both are used with the A3C LSTM obviously.
I fine-tuned the Adam more, so I've been using that to train, but with some tinkering RMSprop should give similar results, from the few times I played with it. The Adam epsilon default was a must-change. Big improvement from just that.
Thanks for your quick reply!
They both show the benefit of being more robust and a steadying factor for learning compared to non-shared optimizers.
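For context, "shared" here means the optimizer's running statistics live in memory visible to all worker processes, so every worker's gradients update one common set of moments instead of each worker keeping its own. A toy stdlib-only illustration of the idea (not the actual repo code; the decay constant and single-element buffer are made up for the demo):

```python
from multiprocessing import Array, Process

def worker(sq_avg, grads, decay=0.99):
    """Fold one worker's squared gradients into the shared RMSprop-style buffer."""
    for g in grads:
        with sq_avg.get_lock():
            sq_avg[0] = decay * sq_avg[0] + (1 - decay) * g * g

def run_workers(n_workers=4, n_grads=100):
    sq_avg = Array('d', [0.0])       # one statistics buffer in shared memory
    procs = [Process(target=worker, args=(sq_avg, [0.1] * n_grads))
             for _ in range(n_workers)]
    for p in procs: p.start()
    for p in procs: p.join()
    return sq_avg[0]                 # reflects gradients from every worker

if __name__ == '__main__':
    print(run_workers())
```

With non-shared statistics, each worker would warm up its own moment estimates from scratch; sharing them gives every worker the benefit of all gradients seen so far, which is the steadying effect described above.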
@dgriff777 @ppwwyyxx Why did increasing Adam epsilon from 1e-8 to 1e-3 help? The purpose of epsilon is to prevent division by zero by adding it to the denominator. 1e-8 is already large enough to prevent division by zero (I think), so changing it to 1e-3 would just add more arbitrary bias.
The default epsilon for Adam is often not the best choice in my experience. As to why it works better in this case, several things could be the cause, but it's hyperparameter searching, which always has a fuzzy factor.
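One way to see why a larger epsilon can help (a toy single-parameter sketch; bias correction omitted, and the example gradient values are made up): when the second-moment estimate is tiny, epsilon dominates the denominator and damps the step, so it acts less like a numerical guard and more like a cap on the step size in flat or noisy-gradient regions.

```python
import math

def adam_update(m_hat, v_hat, lr=1e-4, eps=1e-8):
    """Magnitude of a single-parameter Adam step (bias correction omitted)."""
    return lr * m_hat / (math.sqrt(v_hat) + eps)

# Tiny, noisy gradients: m_hat = 1e-4, v_hat = 1e-8, so sqrt(v_hat) = 1e-4.
default_eps = adam_update(1e-4, 1e-8, eps=1e-8)  # ~lr: near-maximal step
larger_eps = adam_update(1e-4, 1e-8, eps=1e-3)   # ~lr/11: heavily damped
print(default_eps, larger_eps)
```

So eps=1e-3 effectively shrinks updates driven by small, noisy gradients, which may be why it steadies A3C training here; that said, this is my reading, not a claim about what dgriff777 observed.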
How long does it take to train Pong-v0? I used 16 threads, and after 7 hours the episode reward is about 10, far slower and worse than the original network.
Well, as I said before, the universe starter agent / ikostrikov/pytorch-a3c is highly optimized for Pong. Also, that model uses 42x42 input while mine is 80x80, which means more data to crunch, and mine is also a larger, more robust model so that it can perform well on all Atari games, not just Pong, which is also quite simple. For Pong-v0 it's gonna take about 6-7hrs to start scoring 21pts, as opposed to the other model, which is around 2hrs I believe, but my model has a better overall performance limit.
That's for 32 threads. I have not trained it on 16 threads, but a rough estimate would be around 10hrs at most for 16 threads, I believe.
In contrast though, on Breakout-v0 it's scoring over 400 in 4-5hrs on 32 threads, which is far faster than the other model.
After about 8 hours, I got the expected results on Pong-v0. Thanks!
You're welcome :)
As far as I can see, the model hyperparameters are different. Thanks.