The agent should follow the same policy during both training and testing. I checked the code and ran all three deep Q-learning methods, but the training reward per episode is always below 15, while the testing reward can exceed 60. I can't explain this discrepancy.