Closed by woaipichuli 5 years ago
Many factors can cause this, since RL training is fairly unstable. If the training script does not perform well, changing the random seed may help. Alternatively, you can tune the parameters based on the reward log. Generally speaking, the higher the reward, the more the agents have learned.
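As a rough way to read the reward log across seeds, something like the sketch below might help. It is not part of the training script: the function names, the `runs` dictionary, and the reward values are all hypothetical, and it assumes you have copied the per-round total reward for each seed out of the log by hand.

```python
from statistics import mean


def reward_trend(rewards, window=10):
    """Return (early_mean, late_mean) over the first and last `window` rounds."""
    if len(rewards) < 2 * window:
        raise ValueError("need at least 2 * window reward values")
    return mean(rewards[:window]), mean(rewards[-window:])


def is_improving(rewards, window=10, min_gain=0.0):
    """True if the late average reward beats the early average by more than min_gain."""
    early, late = reward_trend(rewards, window)
    return late - early > min_gain


if __name__ == "__main__":
    # Hypothetical rewards per training round, keyed by random seed.
    runs = {
        0: [-5.0, -4.8, -5.1, -4.9, -5.0, -5.2],  # flat: agent is not learning
        1: [-5.0, -4.0, -2.5, -1.0, 0.5, 2.0],    # rising: agent is learning
    }
    for seed, rewards in runs.items():
        early, late = reward_trend(rewards, window=2)
        status = "improving" if is_improving(rewards, window=2) else "flat"
        print(f"seed {seed}: early={early:.2f} late={late:.2f} -> {status}")
```

If the reward stays flat for every seed you try, that points to the parameters rather than an unlucky seed.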
I ran the code (train battle) and watched the video, but the agents appear to have learned nothing. Why does this happen? Do I need to train for longer?