carpedm20 / deep-rl-tensorflow

TensorFlow implementation of Deep Reinforcement Learning papers
MIT License

About tqdm and its constantly decreasing iterative speed #31

Closed fredchenjialin closed 7 years ago

fredchenjialin commented 7 years ago

Hello, I'm very confused about the iteration speed. For the first few minutes after the program starts, the it/s value is quite high and it runs very fast, which is what I'd love to see. But as the run continues, the it/s value keeps decreasing. After about 30 minutes it drops from roughly 10000 to 900, and it keeps going down.

Is this a problem with the GPU setup or with tqdm?

The graphics cards I'm using are two Nvidia K40s.

(Screenshots: iteration speed shortly after starting, and again 30 minutes later.)

(Screenshot: nvidia-smi output.)

carpedm20 commented 7 years ago

Hmm, I still don't know the reason for that, but one thing I noticed from your screenshot is that you are using both K40s for a single experiment. The pre-defined models in this repo are pretty small (200-400 MB), so you can fit multiple (more than 10) experiments on a single K40. I pushed a6a836edd8cc9cba41e84eaa54dce20c3b24a5b2 to add allow_soft_placement, with a default value of True.
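
For reference, a minimal sketch of the kind of session configuration this option enables (illustrative only, not necessarily the exact code in the commit):

```python
import tensorflow as tf

# Let TensorFlow fall back to another device when an op cannot be placed
# on the requested one, instead of raising a placement error.
config = tf.ConfigProto(allow_soft_placement=True)

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... build and train the model as usual ...
```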

carpedm20 commented 7 years ago

Did you try training without tqdm? I know tqdm sometimes causes problems in multi-threaded settings, but this code doesn't use any such fancy techniques, so I assume tqdm isn't the problem.

ppwwyyxx commented 7 years ago

In a typical DQN, training starts with random exploration, but later it gradually starts using the network to choose actions. So some slow-down is normal (although 10x is definitely too much).

carpedm20 commented 7 years ago

@ppwwyyxx I thought about that, but the step count in both images is quite large. I also found that the log line avg_r: 0.001234 ... is only printed once random exploration has finished, which means both screenshots were taken during the training phase.

edit: Never mind, avg_r seems to always be printed.

ppwwyyxx commented 7 years ago

In the training phase you're decreasing epsilon, I guess? That also gradually slows things down.

carpedm20 commented 7 years ago

Oh.. yes, that's right. That explains the slowdown. As the step count increases in DQN, the agent predicts the action with the network more and more often instead of choosing a random action.
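
To make the effect concrete, here is a minimal epsilon-greedy sketch with a linearly annealed epsilon (the names `q_network`, `ep_start`, `ep_end`, and `ep_end_t` are illustrative, not necessarily the repo's variables). As epsilon shrinks, the cheap random branch is taken less often and the expensive network forward pass is taken more often, so the measured it/s drops:

```python
import random

def select_action(q_network, state, step, ep_start=1.0, ep_end=0.1, ep_end_t=1_000_000):
    # Linearly anneal epsilon from ep_start down to ep_end over ep_end_t steps.
    epsilon = ep_end + max(0.0, (ep_start - ep_end) * (ep_end_t - step) / ep_end_t)
    if random.random() < epsilon:
        # Cheap branch: uniform random action, no network call.
        return random.randrange(q_network.num_actions)
    # Expensive branch: one forward pass through the Q-network for this step.
    return q_network.predict_action(state)
```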

fredchenjialin commented 7 years ago

Thank you @carpedm20 @ppwwyyxx. About "you can fit multiple (more than 10) experiments in a single K40": how do I do this? Or how can I use my GPU more efficiently? Can I increase batch_size or memory_size in main.py?

flags.DEFINE_integer('batch_size', 32, 'The size of batch for minibatch training')
flags.DEFINE_integer('memory_size', 100, 'The size of experience memory (*= scale)')

carpedm20 commented 7 years ago

All you need to do is pull the latest code and run it. See the changes in a6a836edd8cc9cba41e84eaa54dce20c3b24a5b2 for details.
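
If you want to pack several experiments onto one K40, a common approach is to cap the GPU memory each process is allowed to claim. This is a minimal sketch using standard TensorFlow options, not necessarily what the commit itself does:

```python
import tensorflow as tf

# Each process claims at most ~10% of the card's memory instead of all of it,
# so roughly 10 independent experiments can share a single K40.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.per_process_gpu_memory_fraction = 0.1

sess = tf.Session(config=config)
```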

fredchenjialin commented 7 years ago

ok, thank you. @carpedm20

ppwwyyxx commented 7 years ago

Isn't this still a bug? Even if it's always using the network to predict, the extra work is just one forward pass per step. Training happens every 4 steps, so that's 32 forward/backward passes every 4 steps. The prediction shouldn't make training 10x slower..
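
As a rough back-of-envelope check of that argument (purely illustrative, assuming a forward pass and a backward pass cost about the same):

```python
# Amortized per environment step:
train_passes_per_step = (32 + 32) / 4   # 32 forward + 32 backward every 4 steps = 16 passes/step
predict_passes_per_step = 1             # one extra forward pass when acting greedily

overhead = predict_passes_per_step / train_passes_per_step
print(f"extra compute from greedy action selection: ~{overhead:.0%}")  # ~6%, nowhere near 10x
```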