Actually, I'd like to know if and how you were able to visualize these graphics. Did you point your tensorboard logdir to a specific directory? Or are you using the images from the README here?
I am using the images from the README. But if you want to combine multiple graphs into one plot, TensorBoard can do that; simply run:
`tensorboard --logdir name1:/path/to/logs/1,name2:/path/to/logs/2`
By the way, the image below shows my combined graphs,
no_action_repeat (red), 4_action_repeat (blue):
Neither of them seems to show any significant improvement after 4 million iterations.
Do you have any idea why?
I remember that some time ago I was trying to figure out exactly which logdir I need to point TensorBoard to when using this repo.
Regarding the performance of your model: I really think that after a few million iterations, you can't squeeze much more performance out of DQN. If you look at the more recent papers, such as Rainbow [1], you'll see that many more complex improvements had to be made to DQN in order for it to show significantly better performance. Distributed implementations also look very promising, such as A3C [2], Distributed Prioritized Experience Replay [3] and the new R2D2 [4].
The action repeat parameter equals 4 because DQN uses stacks of the four most recent frames to compose its state representation. While the new state representation is being composed, the last chosen action is repeated. However, this by itself should not explain why both of your experiments behave so similarly. My guess is that the frames don't change that quickly in Atari, which theoretically runs at 60 FPS, so selecting a new action every frame is not that crucial.
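To make the mechanism concrete, here is a minimal sketch of an action-repeat (frame-skip) wrapper for a gym-style environment. The class name and the reward accumulation are illustrative assumptions, not this repo's actual code:

```python
import gym


class ActionRepeat(gym.Wrapper):
    """Repeat the chosen action for `repeat` consecutive frames (illustrative sketch)."""

    def __init__(self, env, repeat=4):
        super(ActionRepeat, self).__init__(env)
        self.repeat = repeat

    def step(self, action):
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        # Intermediate frames are skipped; only the last observation is returned,
        # so the agent picks a new action only once every `repeat` frames.
        return obs, total_reward, done, info
```

With `repeat=1` the agent decides on every single frame, which, as noted above, may not matter much when consecutive 60 FPS frames are nearly identical.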
One parameter that you could explore is the history length (how many frames compose a state for the neural network). In the DRQN paper [5], DQN was tested with history lengths of 4 and 10 frames, and the results differed. They also tested a history length of 1, but added an LSTM layer to the network so that it could model the history through its hidden state, and that gave better results.
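To make the history-length parameter concrete, here is a rough sketch of how the last `history_length` frames can be stacked into the network input using a deque. The names and the (84, 84) frame shape are assumptions for illustration, not this repo's implementation:

```python
from collections import deque

import numpy as np


class FrameHistory(object):
    """Stack the last `history_length` frames into one state for the Q-network."""

    def __init__(self, history_length=4):
        self.history_length = history_length
        self.frames = deque(maxlen=history_length)

    def reset(self, first_frame):
        # At the start of an episode, fill the buffer with copies of the first frame.
        for _ in range(self.history_length):
            self.frames.append(first_frame)

    def add(self, frame):
        self.frames.append(frame)

    def get_state(self):
        # Shape: (history_length, 84, 84) for the usual preprocessed Atari frames;
        # increasing history_length just adds more input channels to the network.
        return np.stack(list(self.frames), axis=0)
```

The DRQN alternative keeps `history_length = 1` and lets an LSTM layer carry the history in its hidden state instead.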
```python
class M1(DQNConfig):
  backend = 'tf'
  env_type = 'detail'
  action_repeat = 1

class M2(DQNConfig):
  backend = 'tf'
  env_type = 'detail'
  action_repeat = 4
```
I use `python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m2` and `python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m1`.
The "avg_ep_r" in both models reaches 2.1 - 2.3 at around 5 million iterations. But when it comes to even 15 million iterations, the "avg_ep_r" still fluctuates between 2.1 and 2.3.
Just like the result they have shown( I guess that is the result of Action-repeat (frame-skip) of 1, without learning rate decay). I didn't change any parameters.
The strange thing is, even when I use model m2(Action-repeat (frame-skip) of 4), my result is similar to model m1. The "avg_ep_r" fluctuates between 2.1 and 2.3 from around 5 million to 15 million iterations. The max_ep_r fluctuates between 10 and 18 from around 5 million to 15 million iterations.
Do I need to change some parameters to reach the best result they have shown?
Thank you very much.