NikEyX opened this issue 5 years ago
Hi!
The book samples were developed and tested using PyTorch 0.4. Porting to the latest PyTorch is currently in progress and is planned for the 2nd edition (sometime this fall).
I don't think the newer PyTorch version is the cause of the slowdown, but it's still worth checking.
Other things to try when troubleshooting:
Run a generic PyTorch benchmark to check for hardware/driver issues. There are plenty of reference numbers available, for instance https://github.com/ryujaehun/pytorch-gpu-benchmark
Check the versions of other packages, such as gym or python-opencv. In my experience, the bottleneck is frequently on the CPU post-processing side.
Try the examples from the second edition: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On-Second-Edition
That repo contains plots from my benchmarks (still a work in progress). In general, I see a minor slowdown (5-10%) with PyTorch 1.1.0 on CUDA 10.0 compared to PyTorch 0.4.1 on CUDA 8.0, on the same 1080 Ti. But almost no porting was required to get the examples running, so I have no idea why you get half the speed.
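To check whether the time goes to the GPU or to CPU-side work (environment stepping, frame preprocessing), one option is to time each stage of the training loop separately. Below is a minimal stdlib-only sketch; the `StageTimer` class and the stage names are hypothetical, not part of the book's code. When timing PyTorch CUDA work, you would pass `torch.cuda.synchronize` as the `sync` hook, because CUDA kernels run asynchronously and an unsynchronized timer under-counts GPU time.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage of a loop (hypothetical helper)."""

    def __init__(self, sync=None):
        # Optional callable run before each clock read, e.g. torch.cuda.synchronize,
        # so asynchronous GPU work is attributed to the stage that launched it.
        self.sync = sync
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        if self.sync is not None:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            self.totals[name] += time.perf_counter() - start

    def shares(self):
        """Return each stage's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: t / total for name, t in self.totals.items()}


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(5):
        with timer.stage("env"):      # stand-in for env.step() + frame preprocessing
            time.sleep(0.002)
        with timer.stage("network"):  # stand-in for the forward/backward pass
            time.sleep(0.001)
    for name, share in sorted(timer.shares().items()):
        print(f"{name}: {share:.0%}")
```

If the "env" share dominates, a faster GPU or a different PyTorch version will not help much, which matches the observation above that the bottleneck is often on the CPU side.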
From my personal experience:
Avoid keeping `watch -n1 -d` running during training; it harms performance (there might be some locks in the drivers).

Ah, that's awesome, thanks! I'm going to check it out in detail. Lots of good tips.
By the way, something tangentially related: running Seaquest with your simple DQN and 100k frames eventually works out pretty well and reached a score of 5-6k after a day. Doing the same with your prioritized replay code never takes off. What might be the reason for that? It looks like prioritized replay doesn't work well at all on this problem, which contradicts what the paper argues.
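For comparison with the uniform-replay run, here is a minimal sketch of proportional prioritized replay as described in the PER paper (Schaul et al.): transition i is sampled with probability P(i) = p_i^α / Σ_k p_k^α, and importance-sampling weights w_i = (N·P(i))^(-β), normalized so the largest is 1, correct the resulting bias. This is a simplified O(N) illustration, not the book's implementation; the class name and the default `alpha`, `beta` and `eps` values are assumptions. It mainly makes visible the extra knobs that uniform replay does not have.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay with O(N) sampling (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps      # keeps zero-error transitions sampleable
        self.buffer = []
        self.priorities = []
        self.pos = 0

    def push(self, transition):
        # New transitions get the current max priority so each is sampled at least once.
        max_prio = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_prio)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.buffer)
        indices = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the batch maximum.
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return [self.buffer[i] for i in indices], indices, weights

    def update_priorities(self, indices, td_errors):
        # Priority is the absolute TD error plus a small constant.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DQN training step, the returned weights would multiply the per-sample losses, and the new absolute TD errors would be fed back through `update_priorities`.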
I'm using PyTorch 1.0.1 with CUDA 10.1 on an 8-core machine, with the same 1080 Ti as you. However, when running your examples I get only half the speed (or less) of what you report in the book, even with all other programs closed.
Do you have any idea what's going on? Is there a particular reason the performance could be so much worse? I made sure the examples are definitely using CUDA (otherwise some of the problems would be about 100x slower).