ch3njust1n opened this issue 7 years ago
Any luck so far solving the issue?
I am having the same problem with my GTX 1080. Its performance degrades after an hour or so: it starts at 250 it/s with an estimated time to finish of around 45 hours, then drops to 75 it/s with an estimated time of around 170 hours.
`it/s` dropping is normal, since the agent learns to survive, so each game tends to take longer. I don't know if it's normal for a Titan X to run at only 7% load.
Isn't it supposed to finish training in 24~30 hours? It did with a 980 Ti. However, that doesn't seem to be the case with the Titan X and the 1080, even though they outperform it.
Any suggestion about what could be causing such behavior?
@serialx Could you please share with us your setup and the time it took to finish training?
@infin8Recursion No luck so far.
I've looked into this issue now, and there may be a bug among the recent commits. I'll dig into it and post an update here.
Is this issue solved yet?
In my case it also takes a very long time. GPU utilisation is about 50%, but training would need around 500 hours to complete 50,000,000 steps, which is almost a month.
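A quick sanity check on those numbers, using only the figures quoted above:

```python
# Back-of-the-envelope: what throughput does a 500-hour ETA imply?
total_steps = 50_000_000               # step budget quoted above
eta_hours = 500                        # reported estimated training time
print(f"{total_steps / (eta_hours * 3600):.1f} steps/s")  # ~27.8 steps/s
```

That's roughly 28 steps/s, far below the ~250 it/s reported at the start of training earlier in this thread.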
any update on this?
We don't have an explicit schedule for fixing this bug, but I recommend trying other great DQN implementations in TensorFlow, like https://github.com/dennybritz/reinforcement-learning or https://github.com/carpedm20/deep-rl-tensorflow
Same problem here. I'm trying to use the repository https://github.com/carpedm20/deep-rl-tensorflow instead.
Same problem for me. On my GTX 1070, this repo runs at ~90 iter/sec. https://github.com/carpedm20/deep-rl-tensorflow is faster, at ~120 iter/sec, but by far the fastest implementation (at least on my hardware) is https://github.com/matthiasplappert/keras-rl, running at ~190 iter/sec. If anyone knows of faster implementations, feel free to link them here. I'm looking for the fastest possible implementation, since I'm running a load of experiments for 200 million steps, and an extra 10 iter/sec can mean finishing an experiment half a day sooner.
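One way to compare repos on a common footing is to time raw environment steps yourself, rather than trust each repo's progress counter. A minimal sketch, assuming the classic 4-tuple `gym` step API with the Atari environments installed; the env id and step budget here are arbitrary choices:

```python
import time
import gym

# Time raw environment throughput as a training-free baseline.
env = gym.make("Breakout-v0")
env.reset()

n_steps = 10000
start = time.time()
for _ in range(n_steps):
    _, _, done, _ = env.step(env.action_space.sample())
    if done:
        env.reset()
elapsed = time.time() - start
print(f"{n_steps / elapsed:.1f} env steps/s")
```

That number is an upper bound with zero training cost; the gap between it and a repo's reported iter/sec shows how much time goes into forward/backward passes.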
@ionelhosu Just wanted to point out that it's very hard to compare the speed of DQN implementations apples-to-apples. Apart from the network and the algorithm (DQN / double DQN, etc.), other things can also differ. The most subtle one is what each "iteration" means. Usually an iteration may include: going forward a **certain number of steps** in the environment, by either **random exploration** or **using a network**, and maybe **sampling a batch** and training on it. The bold parts are all controlled by hyperparameters and are hard to make consistent. Also, due to epsilon annealing in DQN, the speed is not constant across training but gradually slows down, as controlled by hyperparameters.
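To make that concrete, here is a minimal sketch of the loop most DQN implementations share. Every name and value below (`train_frequency`, `batch_size`, `learn_start`, the epsilon schedule, the stub env/network) is hypothetical, chosen only to show which knobs change what one "iteration" costs:

```python
import random
from collections import deque

# Hypothetical hyperparameters -- each one changes what an "iteration" means.
train_frequency = 4          # train once every N environment steps
batch_size = 32              # transitions sampled from replay per update
learn_start = 1000           # pure random exploration before training starts
eps_start, eps_end = 1.0, 0.1
eps_anneal_steps = 10000     # linear epsilon-annealing horizon
num_steps = 20000            # total step budget

replay = deque(maxlen=100000)

def epsilon(step):
    # Linear annealing: as epsilon shrinks, more steps use the (slower)
    # network forward pass instead of a cheap random action.
    frac = min(step / eps_anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def act_random():
    return random.randrange(4)            # stand-in for env.action_space.sample()

def act_network(state):
    return 0                              # stand-in for a Q-network forward pass

def env_step(action):
    # Stand-in environment: returns (next_state, reward, done).
    return random.random(), 0.0, random.random() < 0.01

def train(batch):
    pass                                  # stand-in for a forward + backward pass

state = 0.0
for step in range(num_steps):
    if step < learn_start or random.random() < epsilon(step):
        action = act_random()             # cheap: no network involved
    else:
        action = act_network(state)      # forward pass only
    state, reward, done = env_step(action)
    replay.append((state, action, reward, done))
    if step >= learn_start and step % train_frequency == 0:
        batch = random.sample(replay, min(batch_size, len(replay)))
        train(batch)                      # the expensive part
```

Whether a progress bar counts every environment step or only the updates (one per `train_frequency` steps) as an "iteration" already changes the reported it/s by 4x under these settings.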
I have a Titan X and have been running the Breakout simulation for over two days now, and it's only `7%` through training. `nvidia-smi` is showing that it's only using `4-5%`. The README.md says that it only took 30 hours on a 980. That doesn't seem right. According to `main.py`, it should be using `100%` by default if I don't give the flag. Is anyone else having this issue or is it just me?

Edit: `nvidia-smi -i 0 -q -d MEMORY,UTILIZATION,POWER,CLOCK,COMPUTE` shows that `FB Memory Usage` is `11423 MiB / 12185 MiB`. Does that look correct if using the default GPU setting for Breakout?
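Edit 2: I think the full `FB Memory Usage` is expected and is unrelated to the low compute load. If I'm reading `main.py` right, the flag only controls how much GPU memory TensorFlow pre-allocates, not how busy the GPU is; in TF 1.x terms that maps to something like this sketch:

```python
import tensorflow as tf

# TF 1.x session config: per_process_gpu_memory_fraction caps how much GPU
# memory the process pre-allocates. A fraction of 1.0 reserves nearly all of
# it up front, which is why nvidia-smi shows ~11.4 GiB used even while
# compute utilization sits at 4-5%.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1.0)
config = tf.ConfigProto(gpu_options=gpu_options)
sess = tf.Session(config=config)
```

To watch utilization over time instead of a single snapshot, `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 5` logs both every five seconds.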