NVlabs / GA3C

Hybrid CPU/GPU implementation of the A3C algorithm for deep reinforcement learning.
BSD 3-Clause "New" or "Revised" License

Why is pytorch-a3c implementation so much faster? #15

Closed lolz0r closed 7 years ago

lolz0r commented 7 years ago

https://github.com/ikostrikov/pytorch-a3c has a CPU-only implementation that can converge on PongDeterministic-v3 within 15 minutes, while the GPU-powered GA3C appears to take 2-3 hours to achieve the same result. Why is that?

Based on my (limited) comparison, they are using Adam instead of RMSProp, and PongDeterministic-v3 instead of PongDeterministic-v0.
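
For reference, a minimal sketch of what that optimizer swap looks like in the TensorFlow 1.x API (the framework GA3C is built on); the hyperparameter values here are placeholders for illustration, not the settings from either repository:

```python
import tensorflow as tf  # TensorFlow 1.x API, as used by GA3C

# RMSProp, the optimizer GA3C uses; decay/momentum/epsilon values are illustrative.
rmsprop = tf.train.RMSPropOptimizer(
    learning_rate=1e-4, decay=0.99, momentum=0.0, epsilon=0.1)

# Adam, the optimizer pytorch-a3c uses; learning rate is illustrative.
adam = tf.train.AdamOptimizer(learning_rate=1e-4)

# Either optimizer would then be applied to the combined policy + value loss:
# train_op = optimizer.minimize(total_loss)
```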

Maybe there is so much overhead pushing data to the GPU that only large models would see a true speedup?
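
One quick way to sanity-check that hypothesis is to time the host-to-device copy against the forward pass for an A3C-sized batch. A rough PyTorch sketch, where the batch shape and the tiny conv net are made up for illustration and are not the models from either repo:

```python
import time
import torch
import torch.nn as nn

device = torch.device('cuda')

# Small conv net roughly in the spirit of the A3C Atari model (illustrative only).
model = nn.Sequential(
    nn.Conv2d(4, 16, 8, stride=4), nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
).to(device)

batch = torch.randn(32, 4, 84, 84)  # A3C-style batch of stacked frames, on the CPU

def timed(fn):
    # Synchronize around the call so we measure the actual GPU work.
    torch.cuda.synchronize()
    t0 = time.time()
    out = fn()
    torch.cuda.synchronize()
    return out, (time.time() - t0) * 1e3

batch_gpu, copy_ms = timed(lambda: batch.to(device))
_, fwd_ms = timed(lambda: model(batch_gpu))
print(f"host->device copy: {copy_ms:.2f} ms, forward pass: {fwd_ms:.2f} ms")
```

If the copy time is comparable to (or larger than) the forward-pass time for small models, the transfer overhead would indeed eat most of the potential GPU speedup.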

4SkyNet commented 7 years ago

@lolz0r GA3C uses the original A3C algorithm as its baseline, with some additional tweaks to RMSProp and the loss (see the code). The PyTorch implementation uses A3C with GAE, a different neural network architecture, and a different optimizer. But yes, the second architecture fits the GPU well > see one more implementation here: https://github.com/4SkyNet/tensorpack
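
To make the A3C vs. A3C-GAE distinction concrete, here is a hedged sketch of the two advantage estimates; the `rewards`, `values`, and `bootstrap` inputs and the discount settings are hypothetical, not code from either repository:

```python
import numpy as np

def nstep_advantages(rewards, values, bootstrap, gamma=0.99):
    """Plain A3C advantage: discounted n-step return minus the value baseline."""
    returns = []
    R = bootstrap                      # V(s_T) from the critic
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns = np.array(returns[::-1])
    return returns - np.array(values)

def gae_advantages(rewards, values, bootstrap, gamma=0.99, lam=0.95):
    """GAE advantage: exponentially weighted sum of one-step TD errors."""
    values_ext = np.append(values, bootstrap)
    adv, last = np.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```

With `lam=1.0` the GAE estimate reduces to the plain n-step advantage, so the difference between the two training setups is controlled by that single extra hyperparameter.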

lolz0r commented 7 years ago

@4SkyNet Thanks for the information. Let's find Sarah Connor!

4SkyNet commented 7 years ago

@lolz0r on my way :) PS: As I understand it, the NVIDIA folks show a way to use the video card more efficiently on a reinforcement learning benchmark (the A3C algorithm). RL is not as GPU-efficient as some classic ML tasks. You are free to change this algorithm to the PyTorch one (since it's a bit more advanced), add an LSTM, or whatever else you want.