Why does the training reward differ so much from the testing reward?

ShanHaoYu / Deep-Q-Network-Breakout

This is an implementation of Deep Q Learning (DQN) playing Breakout from OpenAI's gym with Keras.


Why does the training reward differ so much from the testing reward? #4

Open Qiyangcao opened 5 years ago

Qiyangcao commented 5 years ago

The agent should follow the same policy during both training and testing. I checked the code and ran all three deep Q-learning methods, but the training reward per episode is always below 15, while the testing reward can be above 60. I can't really explain that.

MaddyThakker commented 4 years ago

Training mixes exploration and exploitation; testing uses only exploitation. The exploratory (random) actions taken during training lower the per-episode reward.
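To make the difference concrete, here is a minimal sketch of epsilon-greedy action selection, the exploration scheme commonly used with DQN. It assumes this repository follows that scheme, which is not confirmed in this thread; `select_action`, the Q-values, and the epsilon values are all hypothetical and are not taken from the repository's code.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-values for the 4 Breakout actions in a single state.
q_values = [0.1, 0.9, 0.3, 0.2]
greedy = 1  # index of the largest Q-value above

rng = random.Random(0)  # fixed seed so the run is repeatable

# Training-style selection: epsilon > 0, so some actions are random.
train_actions = [select_action(q_values, epsilon=0.5, rng=rng) for _ in range(1000)]
# Testing-style selection: epsilon = 0, so every action is greedy.
test_actions = [select_action(q_values, epsilon=0.0, rng=rng) for _ in range(1000)]

train_greedy_rate = train_actions.count(greedy) / len(train_actions)
test_greedy_rate = test_actions.count(greedy) / len(test_actions)

print("greedy rate in training:", train_greedy_rate)
print("greedy rate in testing:", test_greedy_rate)
```

With any epsilon above zero, a fraction of the training actions ignore the learned Q-values. In Breakout a single badly timed random move can lose a life, so episodes played with exploration tend to score lower than purely greedy test episodes, even though the underlying network is the same.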