dgriff777 / rl_a3c_pytorch

A3C LSTM Atari with PyTorch plus A3G design
Apache License 2.0
563 stars 119 forks

Can you please list the differences between your code and ikostrikov/pytorch-a3c? #2

Closed slowbull closed 7 years ago

slowbull commented 7 years ago

As far as I can see, model hyperparameters are different. Thanks.

dgriff777 commented 7 years ago

Well, to start, we have different input sizes: his is 42x42 and mine is 80x80. His model is an exact replica of the universe starter agent. That model is good but obviously very fine-tuned for Pong specifically. I'm using a 4-layer conv2d model with 32 filters of size 5×5, 32 filters of size 5×5, 64 filters of size 4×4, and 32 filters of size 3×3, with single strides for all and max pooling on each. I'm also using a 512-unit LSTM cell as opposed to a 256-unit cell. I also have a shared RMSprop optimizer implemented. My model is obviously larger, so it's slower to train, but it's more robust with much higher final performance, as it's designed for the tough gym v0 environments.
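The description above can be sketched roughly like this in PyTorch. This is a minimal sketch, not the actual repo code: the padding values, the flattened size, and the head layers are my assumptions chosen to make the shapes work out for an 80x80 single-channel input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class A3CLstm(nn.Module):
    """Illustrative sketch of the architecture described above.

    Four stride-1 conv layers (32/32/64/32 filters of size 5/5/4/3),
    each followed by 2x2 max pooling, feeding a 512-unit LSTM cell.
    Padding values are assumptions, not the repo's actual choices.
    """
    def __init__(self, num_actions):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 5, stride=1, padding=2)   # 80 -> 80, pool -> 40
        self.conv2 = nn.Conv2d(32, 32, 5, stride=1, padding=1)  # 40 -> 38, pool -> 19
        self.conv3 = nn.Conv2d(32, 64, 4, stride=1, padding=1)  # 19 -> 18, pool -> 9
        self.conv4 = nn.Conv2d(64, 32, 3, stride=1, padding=1)  # 9 -> 9,  pool -> 4
        # 512-unit LSTM cell instead of the starter agent's 256
        self.lstm = nn.LSTMCell(32 * 4 * 4, 512)
        self.critic = nn.Linear(512, 1)            # value head
        self.actor = nn.Linear(512, num_actions)   # policy head

    def forward(self, x, hx, cx):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = F.max_pool2d(F.relu(self.conv3(x)), 2)
        x = F.max_pool2d(F.relu(self.conv4(x)), 2)
        x = x.view(x.size(0), -1)
        hx, cx = self.lstm(x, (hx, cx))
        return self.critic(hx), self.actor(hx), (hx, cx)
```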

slowbull commented 7 years ago

Thanks! In your experiments, does the shared RMSprop optimizer work better than Adam?

dgriff777 commented 7 years ago

They are actually quite different, even though both are obviously A3C LSTM models.

dgriff777 commented 7 years ago

I fine-tuned the Adam version more, so I've been using that to train, but with some tinkering on RMSprop it should give similar results, from the few times I played with it. The Adam epsilon default was a must-change; big improvement from just that.
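The shared-optimizer pattern being discussed can be sketched like this: move Adam's per-parameter statistics into shared memory so every A3C worker process updates the same running averages. This is a hypothetical sketch, not the repo's actual implementation; the class name and the raised `eps=1e-3` default (vs. PyTorch's `1e-8`) follow the discussion above.

```python
import torch

class SharedAdam(torch.optim.Adam):
    """Sketch: Adam whose state tensors live in shared memory.

    State is initialized eagerly (normally Adam creates it lazily on the
    first step) so it can be moved to shared memory before workers fork.
    """
    def __init__(self, params, lr=1e-4, eps=1e-3):  # eps raised from the 1e-8 default
        super().__init__(params, lr=lr, eps=eps)
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                state['step'] = torch.zeros(1)
                state['exp_avg'] = torch.zeros_like(p.data)     # first moment
                state['exp_avg_sq'] = torch.zeros_like(p.data)  # second moment

    def share_memory(self):
        # Called once in the parent process, before spawning workers.
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                state['step'].share_memory_()
                state['exp_avg'].share_memory_()
                state['exp_avg_sq'].share_memory_()
```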

slowbull commented 7 years ago

Thanks for your quick reply!

dgriff777 commented 7 years ago

Both show the benefit of being more robust and acting as a steadying factor for learning, compared to the non-shared versions.

ethancaballero commented 7 years ago

@dgriff777 @ppwwyyxx Why did increasing Adam epsilon from 1e-8 to 1e-3 help? The purpose of epsilon is to prevent division by zero by adding it to the denominator. 1e-8 is already large enough to prevent division by zero (I think), so changing it to 1e-3 would just add more arbitrary bias.

dgriff777 commented 7 years ago

The default epsilon for Adam is often not the best choice, in my experience. As to why it works better in this case, several things could be the cause, but it's hyperparameter searching, which always has a fuzzy factor.
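One way to see why a large epsilon can matter beyond avoiding division by zero: Adam's per-parameter step is roughly lr · m̂ / (√v̂ + eps), so when the second-moment estimate √v̂ is tiny, a large eps dominates the denominator and shrinks the step, damping noisy updates. A toy calculation (illustrative numbers, my assumptions):

```python
import math

# Adam step magnitude: lr * m_hat / (sqrt(v_hat) + eps).
# When sqrt(v_hat) is small, eps=1e-3 dominates the denominator and
# shrinks the step far more than eps=1e-8 would.
lr, m_hat = 1e-4, 1e-4
for v_hat in (1e-8, 1e-4, 1e-2):
    step_small_eps = lr * m_hat / (math.sqrt(v_hat) + 1e-8)
    step_large_eps = lr * m_hat / (math.sqrt(v_hat) + 1e-3)
    print(f"v_hat={v_hat:g}: eps=1e-8 -> {step_small_eps:.2e}, "
          f"eps=1e-3 -> {step_large_eps:.2e}")
```

With v̂ = 1e-8 the step is roughly an order of magnitude smaller under eps=1e-3, while for large v̂ the two are nearly identical.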

slowbull commented 7 years ago

How long does it take to train Pong-v0? I used 16 threads, and after 7 hours the episode reward is about 10, far slower and worse than the original network.

dgriff777 commented 7 years ago

Well, as I said before, the universe starter agent / ikostrikov/pytorch-a3c is highly optimized for Pong. Also, that model uses 42x42 input while mine is 80x80, which means more data to crunch, and mine is also a larger, more robust model, so that it can perform well on all Atari games, not just Pong, which is also quite simple. For Pong-v0 it's going to take about 6-7hrs to start scoring 21pts, as opposed to the other model, which is around 2hrs I believe, but my model has a higher overall performance ceiling.

dgriff777 commented 7 years ago

That's for 32 threads. I have not trained it on 16 threads, but a rough estimate would be around 10hrs at most for 16 threads, I believe.

dgriff777 commented 7 years ago

In contrast, though, on Breakout-v0 it's scoring over 400 in 4-5hrs on 32 threads, which is far faster than the other model.

slowbull commented 7 years ago

After about 8 hours, I got expected results on Pong-v0. Thanks!

dgriff777 commented 7 years ago

You're welcome :)