The car cannot perform like the GIF after training converged

eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms

MIT License

578 stars, 152 forks

The car cannot perform like the GIF after training converged #21

Status: Closed (LevineYang closed this issue 5 years ago)

LevineYang commented 5 years ago

Hi Edouard Leurent, this is a great project for RL practitioners. However, I have run into a problem. When I run "python scripts/experiments.py evaluate scripts/configs/HighwayEnv/env.json scripts/configs/HighwayEnv/agents/DQNAgent/baseline.json --train --episodes=2000", the score slowly converges to about 30, like this:

[INFO] Episode 595 score: 30.1
[INFO] Episode 596 score: 29.8
[INFO] Episode 597 score: 30.7
[INFO] Episode 598 score: 29.1
[INFO] Episode 599 score: 30.3
[INFO] Episode 600 score: 29.9
[INFO] Episode 601 score: 30.7
[INFO] Episode 602 score: 30.5
[INFO] Episode 603 score: 30.0
[INFO] Episode 604 score: 29.0

But every time the video recording starts, the vehicle either runs into another car or fails to accelerate and overtake. Is this baseline.json the configuration used for the GIF you added to the highway-env repo, or have I misunderstood something?

I would appreciate any suggestions. Thank you!
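A quick way to check that the score has plateaued is to summarize the logged episode scores. The sketch below is not part of rl-agents: the regular expression simply assumes log lines of the form "[INFO] Episode N score: X" as quoted above, and the helper names parse_scores and summarize are hypothetical.

```python
import re
from statistics import mean, pstdev

# Assumed log format, taken from the excerpt in this issue:
#   "[INFO] Episode 595 score: 30.1"
SCORE_RE = re.compile(r"\[INFO\] Episode (\d+) score: (-?\d+(?:\.\d+)?)")

# The ten log lines quoted in the issue.
LOG = """
[INFO] Episode 595 score: 30.1
[INFO] Episode 596 score: 29.8
[INFO] Episode 597 score: 30.7
[INFO] Episode 598 score: 29.1
[INFO] Episode 599 score: 30.3
[INFO] Episode 600 score: 29.9
[INFO] Episode 601 score: 30.7
[INFO] Episode 602 score: 30.5
[INFO] Episode 603 score: 30.0
[INFO] Episode 604 score: 29.0
"""


def parse_scores(log_text):
    """Return a list of (episode, score) pairs found in the log text."""
    return [(int(ep), float(score)) for ep, score in SCORE_RE.findall(log_text)]


def summarize(scores):
    """Compute basic statistics over the parsed episode scores."""
    values = [score for _, score in scores]
    return {
        "episodes": len(values),
        "mean": mean(values),
        "std": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = summarize(parse_scores(LOG))
print(summary)
```

A stable mean with a small spread only says that the training return has flattened. It does not say that the learned policy avoids crashes, which is what the recorded videos reveal.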

eleurent commented 5 years ago

Hi @LevineYang, sorry, I've been quite busy lately. I think you've understood everything properly. My first remark is that this GIF may be misleading: the DQN I trained at that time (long ago) converged to a policy with a high variance in expected return. I showed a successful episode, but the agent actually crashed in about 20% of the episodes. It was still more successful than what you describe, though. I ran the code again today and was able to reproduce your results. After changing a few hyperparameters, I obtained behaviour similar to what I used to get, so I guess this was mainly due to a bad refactoring of the configuration files.
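Since a single recorded episode can be unrepresentative, it helps to measure the crash rate and the spread of returns over many evaluation episodes. The following is only a sketch under stated assumptions: it uses the Gymnasium-style five-value step API, it assumes the environment reports a boolean "crashed" entry in info, and StubHighwayEnv is a hypothetical stand-in so that the example runs without highway-env installed. It is not the evaluation code used in rl-agents.

```python
import random
from statistics import mean, pstdev


class StubHighwayEnv:
    """Hypothetical stand-in for a driving env (Gymnasium-style 5-tuple step)."""

    def __init__(self, crash_prob=0.2, horizon=40, seed=0):
        self.rng = random.Random(seed)
        self.crash_prob = crash_prob
        self.horizon = horizon
        self.t = 0
        self.crash_step = None

    def reset(self):
        self.t = 0
        # Decide up front whether this episode crashes, and at which step.
        if self.rng.random() < self.crash_prob:
            self.crash_step = self.rng.randint(1, self.horizon)
        else:
            self.crash_step = None
        return 0.0, {}

    def step(self, action):
        self.t += 1
        crashed = self.t == self.crash_step
        reward = 0.0 if crashed else 1.0
        terminated = crashed
        truncated = (not crashed) and self.t >= self.horizon
        # Assumption: the env exposes a boolean "crashed" flag in info.
        return 0.0, reward, terminated, truncated, {"crashed": crashed}


def evaluate(env, policy, episodes):
    """Run the policy for several episodes; report return stats and crash rate."""
    returns, crashes = [], 0
    for _ in range(episodes):
        obs, info = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
        crashes += bool(info.get("crashed", False))
    return {
        "mean_return": mean(returns),
        "std_return": pstdev(returns),
        "crash_rate": crashes / episodes,
    }


# Placeholder policy that always picks action 0; replace with the trained agent.
stats = evaluate(StubHighwayEnv(crash_prob=0.2), lambda obs: 0, episodes=200)
print(stats)
```

With a real environment and the trained agent in place of the stub and the placeholder policy, a crash rate well above zero alongside a reasonable mean return would match the high-variance behaviour described above.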

eleurent commented 5 years ago

Here's an example of a successful episode I obtained with the current version: example_episode.zip

eleurent commented 5 years ago

And here is the corresponding TensorBoard graph of the policy return over 2000 episodes (image: return)

LevineYang commented 5 years ago

Thank you, Edouard! I was able to reproduce the result by following your suggestion!

skynox03 commented 3 years ago

Hi Eleurent,

I was also trying to replicate the GIF and found this issue. However, I am unable to find the "baseline.json" configuration for the DQN agent that is mentioned in this issue. Maybe the repository has evolved. Could you tell me which configuration now corresponds to baseline.json, or to the example video?

eleurent commented 3 years ago

Hi @skynox03, the repository has indeed evolved, and I think that the previous baseline.json is now dqn.json. But you will probably get better results with dueling_ddqn.json or even ego_attention.json.