Shmuma / ptan

PyTorch Agent Net: reinforcement learning toolkit for pytorch
MIT License

Slower speed than book indicates #23

Open NikEyX opened 5 years ago

NikEyX commented 5 years ago

I'm using PyTorch 1.0.1 with CUDA 10.1 and have 8 cores. I have the same 1080 Ti as you. However, when running your examples I get only half the speed (or less) that you indicate in the book, even with all other programs closed.

Do you have any idea what's going on? Is there any particular reason why the performance could be so much worse? I made sure the examples are definitely using CUDA (otherwise it would be 100x slower for some of the problems).
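For reference, a minimal sanity check (not from the book's code) for confirming that PyTorch actually sees the GPU and places tensors on it:

```python
import torch

# Confirm CUDA is visible and tensors actually land on the GPU
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    print("Tensor on:", x.device)  # expected: cuda:0
```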

Shmuma commented 5 years ago

Hi!

The book samples were tested (and developed) with PyTorch 0.4. Porting to the latest PyTorch is currently in progress and is planned for the 2nd edition (sometime this fall).

I don't think the latest PyTorch is the reason for the slowness, but it's still worth checking.

Other options to get started troubleshooting the issue:

  1. Run a generic PyTorch benchmark to rule out hardware/driver issues. Plenty of reference numbers are available, for instance https://github.com/ryujaehun/pytorch-gpu-benchmark

  2. Check the versions of other packages, like gym or opencv-python. In my experience, the bottleneck is frequently on the CPU postprocessing side (a rough sketch of both checks follows this list).
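Neither check is spelled out in the repo; here is a rough sketch of both, assuming a CUDA build of PyTorch plus gym and opencv-python installed. Note the `torch.cuda.synchronize()` calls: CUDA kernels launch asynchronously, so timing without them measures nothing.

```python
import time
import torch
import gym
import cv2

# Check 2: versions of the packages on the CPU preprocessing path
print("torch:", torch.__version__, "| gym:", gym.__version__, "| opencv:", cv2.__version__)

# Check 1: a crude GPU benchmark -- time a batch of large matmuls
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
torch.cuda.synchronize()
start = time.time()
for _ in range(100):
    a @ b
torch.cuda.synchronize()
print("100 matmuls: %.3f s" % (time.time() - start))
```

If the matmul timing matches published numbers for a 1080 Ti, the GPU side is fine and the slowdown is likely in the environment/preprocessing loop.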

Shmuma commented 5 years ago

You can try the examples from the second edition: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On-Second-Edition

That repo has plots from my benchmarks (the work is still in progress, but anyway). In general I see a minor slowdown (5-10%) with PyTorch 1.1.0 on CUDA 10.0 compared to PyTorch 0.4.1 on CUDA 8.0 on the same 1080 Ti. In fact, almost no porting was required to make the examples run, so I have no idea why you're getting half the speed.

From my personal experience:

NikEyX commented 5 years ago

Ah, that's awesome, thanks! I'm gonna check it out in detail. Lots of good tips.

Btw, something tangentially related: running Seaquest with your simple DQN and 100k frames eventually works out pretty well, reaching a 5-6k score after a day. Doing the same with your prioritized replay code never takes off. I wonder what the reason for that may be? It looks like prioritized replay doesn't work well at all on this problem, which contrasts with what the paper argues.
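For context, the prioritization being discussed is the proportional scheme from the Schaul et al. paper; below is a minimal sketch (not ptan's implementation) showing the alpha/beta hyperparameters that results like this tend to be sensitive to:

```python
import numpy as np

# Proportional prioritized sampling (Schaul et al., 2015), minimal sketch.
# priorities: per-transition |TD error| + small epsilon (float array).
# alpha shapes the sampling distribution (alpha=0 -> uniform);
# beta corrects the induced bias via importance-sampling weights.
def sample(priorities, batch_size, alpha=0.6, beta=0.4):
    probs = priorities ** alpha
    probs /= probs.sum()
    idx = np.random.choice(len(priorities), batch_size, p=probs)
    # IS weights, normalized by the max for numerical stability
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```

Setting alpha toward 0 recovers uniform sampling, which is a quick way to test whether the prioritization itself is what hurts on Seaquest.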