devsisters / DQN-tensorflow

Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning
MIT License

Slower than deep_q_rl #24

Open LinZichuan opened 7 years ago

LinZichuan commented 7 years ago

Hi, I found that this implementation is slower than deep_q_rl, which is implemented in Theano. Is it because this repo uses OpenAI Gym rather than ROM files? Or is it a performance difference between Tensorflow and Theano? Or some other detail?

deep_q_rl runs 100-200 steps per second during learning, but DQN-tensorflow only runs 70-90 steps per second. This makes training slow, so it cannot finish 200M steps in 10 days as in the DQN Nature paper.
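For reference, a rough back-of-the-envelope check of those numbers (assuming "200M" means 200 million environment steps and the quoted rates are steps per second):

```python
# Rough arithmetic on the numbers quoted above (assumptions: 200M environment
# steps as the target, rates measured in steps per second).
total_steps = 200_000_000
seconds_per_day = 24 * 3600

required_rate = total_steps / (10 * seconds_per_day)   # ~231 steps/s needed for 10 days
days_at_80 = total_steps / 80 / seconds_per_day        # ~29 days at 80 steps/s
days_at_200 = total_steps / 200 / seconds_per_day      # ~11.6 days at 200 steps/s
print(required_rate, days_at_80, days_at_200)
```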

ppwwyyxx commented 7 years ago

Looks like it's currently not using the GPU efficiently; see #21.

mthrok commented 7 years ago

I ran some experiments a while ago, and observed the same thing.

In my experiments (I did not use this repo), everything other than the underlying NN library was the same, and the mini-batch was fed to the GPU in the same manner (a mini-batch is sent every time the network is updated), and yet Theano was faster than Tensorflow.

According to #2919 and #3377, Tensorflow's session.run method does more than just feed data to the GPU, so I guess that is adding overhead and making training slower than Theano.

ppwwyyxx commented 7 years ago

@mthrok Those two issues are saying that feed_dict is slow (not that session.run is slow). It's actually good practice to avoid using feed_dict inside training loops to reduce the overhead compared to other frameworks.
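As a rough illustration of that advice, here is a minimal sketch of feeding the graph from an input pipeline instead of feed_dict. This is not this repo's code: the shapes, the sample_batches generator, and the tiny stand-in network are placeholders, and it assumes a TF1-era tf.data pipeline is available.

```python
import numpy as np
import tensorflow as tf  # TF1-style graph API, as used in this repo

def sample_batches():
    # Stand-in for sampling minibatches from the replay memory (hypothetical).
    while True:
        yield np.zeros([32, 84, 84, 4], np.float32)

# Instead of a tf.placeholder fed with feed_dict on every sess.run call,
# build an input pipeline so batches are prefetched outside the Python loop.
dataset = tf.data.Dataset.from_generator(
    sample_batches, tf.float32, tf.TensorShape([32, 84, 84, 4])).prefetch(1)
states = dataset.make_one_shot_iterator().get_next()

# Tiny stand-in network and loss; the real DQN graph would go here.
w = tf.Variable(tf.zeros([84 * 84 * 4, 1]))
q = tf.matmul(tf.reshape(states, [-1, 84 * 84 * 4]), w)
loss = tf.reduce_mean(tf.square(q))
train_op = tf.train.RMSPropOptimizer(0.00025).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(train_op)  # no feed_dict: data comes from the pipeline
```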

Lan1991Xu commented 7 years ago

Has anyone solved the slow training problem? In my case, training takes almost 600 hours.

quhezheng commented 7 years ago

@ppwwyyxx So nice to see you here, Tensorpack author. Why does this repo's performance differ so much from your samples in tensorpack? I don't see a major difference, except that this repo collects experience replay in the same thread as training. Does that matter? Or is it because you used the ROM directly?

ppwwyyxx commented 7 years ago

I don't know why. Maybe the use of feed_dict is the major reason. Using a separate thread improves speed, but not significantly in my case. Using the ROM directly should make no difference to speed.
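For context, the "separate thread" variant being discussed is roughly the following pattern. This is a hypothetical sketch with a random placeholder policy, not tensorpack's or this repo's actual code, and it assumes the old 4-tuple gym step API.

```python
import threading
from collections import deque

import gym

# Shared replay buffer filled by a background thread while the main thread trains.
replay = deque(maxlen=100_000)
lock = threading.Lock()

def collector():
    env = gym.make("Breakout-v0")
    obs = env.reset()
    while True:
        action = env.action_space.sample()            # placeholder policy
        next_obs, reward, done, _ = env.step(action)  # old 4-tuple gym API
        with lock:
            replay.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

threading.Thread(target=collector, daemon=True).start()

# Training loop (stub): sample minibatches from `replay` under the lock and
# run the network update, so environment stepping never blocks training.
```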

quhezheng commented 7 years ago

@ppwwyyxx I failed to describe the issue clearly. The issue I ran into with this repo is that the best training reward is 30, nowhere near the sample in your repo. I compared the code but don't see a major difference. I changed your sample by replacing the ROM directly with Gym, with no code change to the network or training; unfortunately, the output from your code is then as bad as this repo: the best reward is simply 50 and it makes no further progress after a million steps.

So I guess the Gym environment itself has bugs, but the ROM is free of the issue. I thought you were aware of this issue, so you used the ROM instead of Gym.

ppwwyyxx commented 7 years ago

It's not a bug. Gym environments (**-v0) are just a harder setting because they have more randomness. You can use other Gym settings; e.g. BreakoutDeterministic-v4 is closest to a naive Atari wrapper. However, even with Breakout-v0, you should still be able to see better performance than 50.
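For anyone comparing settings, the variants can be instantiated directly as below. The stated defaults (sticky actions and a randomized frame skip for -v0, a fixed frame skip and no sticky actions for Deterministic-v4) are my recollection of the old gym Atari registrations, so double-check them against your installed gym version.

```python
import gym

# "-v0" Atari envs add sticky actions and a randomized frame skip, i.e. more
# randomness; this is the "harder setting" mentioned above.
env_hard = gym.make("Breakout-v0")

# "Deterministic-v4" uses a fixed frame skip and no sticky actions, closest to
# a naive ALE/ROM wrapper as used in the original DQN setup.
env_easy = gym.make("BreakoutDeterministic-v4")

obs = env_easy.reset()
obs, reward, done, info = env_easy.step(env_easy.action_space.sample())
```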