Slower speed than book indicates

Shmuma / ptan

PyTorch Agent Net: reinforcement learning toolkit for pytorch

MIT License


Slower speed than book indicates #23

Open NikEyX opened 5 years ago

NikEyX commented 5 years ago

I'm using PyTorch 1.0.1 with CUDA 10.1 on an 8-core machine, with the same GTX 1080 Ti that you have. However, when running your examples I get half the speed (or less) that the book reports, even with all other programs closed.

Do you have any idea what's going on? Is there any particular reason the performance could be so much worse? I made sure the examples are definitely using CUDA (otherwise some of the problems would run about 100x slower).
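To compare against the book's numbers like-for-like, it helps to measure throughput the same way on every run. The sketch below assumes speed is reported in frames per second. `FPSMeter` is a hypothetical helper, not part of ptan or the book's code.

```python
import time


class FPSMeter:
    """Hypothetical helper: frames per second over a reporting window."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock            # injectable clock, handy for testing
        self.start_time = clock()
        self.start_frame = 0

    def fps(self, frame_idx):
        """Return frames/s since the previous call, then start a new window."""
        now = self.clock()
        elapsed = now - self.start_time
        frames = frame_idx - self.start_frame
        self.start_time, self.start_frame = now, frame_idx
        return frames / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    meter = FPSMeter()
    frame_idx = 0
    for _ in range(3):
        time.sleep(0.1)               # stand-in for environment steps + training
        frame_idx += 1000
        print(f"speed {meter.fps(frame_idx):.1f} f/s")
```

Calling `fps()` at a fixed interval in the training loop gives numbers that can be compared directly between machines or library versions.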

Shmuma commented 5 years ago

Hi!

The book samples were developed and tested using PyTorch 0.4. Porting to the latest PyTorch is ongoing and is planned for the second edition (sometime this fall).

I don't think the latest PyTorch is the cause of the slowdown, but it's still worth checking.
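A quick way to record which versions are in play before comparing numbers is the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch; the package list is an assumption, and OpenCV may be installed under a different distribution name, such as `opencv-python-headless`. If PyTorch imports, `torch.__version__` and `torch.version.cuda` additionally show which CUDA version the build targets.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names=("torch", "gym", "opencv-python", "numpy")):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None       # not installed under this name
    return result


if __name__ == "__main__":
    for name, ver in package_versions().items():
        print(f"{name:15s} {ver or 'not installed'}")
```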

Other things to check when troubleshooting the issue:

Run a generic PyTorch benchmark to check for hardware or driver issues. There are plenty of published numbers to compare against, for instance https://github.com/ryujaehun/pytorch-gpu-benchmark
Check the versions of other packages, such as gym or python-opencv. In my experience, the bottleneck is frequently on the CPU post-processing side.
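To see whether CPU-side post-processing is the bottleneck, the standard library profiler can rank functions by cumulative time. In this sketch `preprocess` is a hypothetical stand-in for observation processing (downscaling and normalizing a fake 210x160 frame); in practice you would profile the real training loop for a fixed number of frames.

```python
import cProfile
import io
import pstats


def preprocess(frame):
    """Hypothetical stand-in for CPU-side observation post-processing."""
    # Downscale by taking every 2nd row/column and normalize to [0, 1].
    return [[px / 255.0 for px in row[::2]] for row in frame[::2]]


def profile_top(func, *args, limit=5):
    """Run func under cProfile; return its result and the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    frame = [[i % 256 for i in range(160)] for _ in range(210)]  # fake 210x160 frame
    _, report = profile_top(lambda: [preprocess(frame) for _ in range(100)])
    print(report)
```

If functions like this dominate the report while the GPU sits partly idle, the slowdown is more likely on the CPU side than in PyTorch itself.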

Shmuma commented 5 years ago

You can try examples from the second edition: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On-Second-Edition

This repo has plots from my benchmarks (it's a work in progress, but still). In general, I see a minor slowdown (5-10%) with PyTorch 1.1.0 on CUDA 10.0 compared to PyTorch 0.4.1 on CUDA 8.0, on the same 1080 Ti. Almost no porting was required to get the examples running, so I have no idea why you get half the speed.
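To tell a 5-10% slowdown apart from a 2x one reliably, it helps to warm up first and take the median of several timed runs. This is a minimal, library-agnostic sketch; `step` is any hypothetical callable (one environment step, one training batch, and so on). When `step` launches CUDA work, note that kernels run asynchronously, so it should call `torch.cuda.synchronize()` before returning, or the timings will be misleading.

```python
import statistics
import time


def median_throughput(step, n_steps=1000, repeats=5, warmup=100):
    """Median steps/s of `step` over several timed runs after a warm-up phase."""
    for _ in range(warmup):
        step()                        # warm-up: caches, lazy initialization, etc.
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(n_steps):
            step()
        rates.append(n_steps / (time.perf_counter() - start))
    return statistics.median(rates)


def relative_slowdown(baseline_rate, new_rate):
    """Fraction of throughput lost: 0.05 means 5% slower, 0.5 means half speed."""
    return 1.0 - new_rate / baseline_rate
```

With this, the slowdown described above corresponds to `relative_slowdown` values between 0.05 and 0.10, while the reported problem corresponds to roughly 0.5.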

From my personal experience:

Running nvidia-smi in the background can cost you 10-15% of performance. I have a habit of running it under watch -n1 -d, which hurts performance (possibly due to some locking in the driver).
Running something CPU-heavy at the same time can hurt performance significantly, as the Atari examples rely on both the GPU and the CPU.
Running the program over a slow connection can also slow it down (I sometimes connect to my server over a GPRS network, and the performance numbers can look really weird).
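Since the Atari examples compete for the CPU, a quick load check before starting a run can flag background processes. This is a sketch for Unix-like systems only (`os.getloadavg()` is not available on Windows), and the 0.5-per-core threshold is an arbitrary assumption.

```python
import os


def cpu_is_busy(threshold=0.5):
    """Return True if the 1-minute load average exceeds threshold x CPU count.

    The default threshold is an assumption; tune it for your machine.
    os.getloadavg() is only available on Unix-like systems.
    """
    load_1min, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    return load_1min > threshold * cpus


if __name__ == "__main__":
    if cpu_is_busy():
        print("Warning: other processes are using a lot of CPU; "
              "FPS numbers may be lower than expected.")
```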

NikEyX commented 5 years ago

Ah, that's awesome, thanks! I'm going to check it out in detail. Lots of good tips.

BTW, something tangentially related: running Seaquest with your simple DQN and 100k frames eventually works out pretty well, reaching a score of 5-6k after a day. Doing the same with your prioritized replay code never takes off. I wonder what the reason could be? It looks like prioritized replay doesn't work well at all on this problem, which contradicts what the prioritized replay paper argues.
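For reference when debugging, proportional prioritized replay samples transition i with probability P(i) = p_i^α / Σ_k p_k^α and corrects the resulting bias with importance-sampling weights w_i = (N·P(i))^(-β), normalized here by the largest weight in the batch. Below is a minimal list-based sketch with O(N) sampling. It is not the book's or ptan's code, and the class name and the α=0.6, β=0.4 defaults are assumptions.

```python
import random


class ProportionalReplay:
    """Hypothetical minimal proportional prioritized replay buffer."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha            # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps                # keeps every priority strictly positive
        self.buffer = []
        self.priorities = []
        self.pos = 0                  # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current max priority so they are sampled at least once.
        max_prio = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_prio)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.buffer)
        indices = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return [self.buffer[i] for i in indices], indices, weights

    def update_priorities(self, indices, td_errors):
        # Priority is the absolute TD error of the most recent update.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

The loss for each sampled transition is multiplied by its weight. In the original paper β is annealed toward 1 over training, so the α and β schedules are one thing worth checking when prioritized replay fails to take off.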