Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

About training PPO on the intersection and DQN on highway-v0 #585

Open AHPUymhd opened 3 months ago

AHPUymhd commented 3 months ago

Hello dear authors, thanks for your contributions to highway-env. I recently ran into some questions while training agents with stable-baselines3:

1. I trained DQN on highway-v0 for 20,000 steps, but the agent only learned to steer to the far right; it cannot dodge other vehicles, let alone overtake. The code is the official documentation code (attached as screenshots). Is there anything wrong with it?
2. I also trained PPO on the intersection for 400,000 steps, but the learning results were very poor. I don't know what went wrong; can you help me? The code is attached as screenshots as well.
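
For reference, here is a sketch of the DQN training example from the highway-env documentation that the question appears to follow (the exact hyperparameters in the screenshots may differ):

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-env environments)
from stable_baselines3 import DQN

env = gym.make("highway-v0")
model = DQN(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[256, 256]),
    learning_rate=5e-4,
    buffer_size=15_000,
    learning_starts=200,
    batch_size=32,
    gamma=0.8,
    train_freq=1,
    gradient_steps=1,
    target_update_interval=50,
    verbose=1,
    tensorboard_log="highway_dqn/",
)
model.learn(total_timesteps=int(2e4))  # the 20,000 steps mentioned above
model.save("highway_dqn/model")
```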

AHPUymhd commented 3 months ago

@eleurent

eleurent commented 2 months ago

Hi, for the highway-v0 run, I think the problem is that the observation is configured with absolute coordinates (`absolute: True`) instead of relative ones (`absolute: False`). This means that the observed features (e.g. the x position) will diverge quickly, so the learned decisions will not generalise to new positions in the scene.

So I would set the observation config to relative (`absolute: False`) and try again.
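
A minimal sketch of that change, assuming the default Kinematics observation (the `absolute` flag is the important part; keep the rest of your original config as it was):

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-env environments)

env = gym.make("highway-v0")
env.unwrapped.configure({
    "observation": {
        "type": "Kinematics",
        "features": ["presence", "x", "y", "vx", "vy"],
        "absolute": False,  # features expressed relative to the ego vehicle
        "normalize": True,
    }
})
obs, info = env.reset()  # reset so the new observation configuration takes effect
```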

For intersection-v0, however, absolute coordinates are more appropriate since it is always the same locations in the scene that are visited (but relative coordinates may work well too). PPO should definitely be able to learn a medium-quality policy, e.g. one that tries to cross the intersection and sometimes collides. The MLP is a poor model for this task because it cannot easily capture and generalise interactions between vehicles, and I got much better results with Transformer models (see the paper), but an MLP should at least get off the ground and improve a bit over a random policy.
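
For completeness, a rough MLP + PPO baseline on intersection-v0 could look like this with stable-baselines3 (the hyperparameters below are placeholders, not tuned values from the paper):

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-env environments)
from stable_baselines3 import PPO

env = gym.make("intersection-v0")  # the default observation uses absolute coordinates
model = PPO(
    "MlpPolicy",
    env,
    n_steps=512,
    batch_size=64,
    gamma=0.9,
    learning_rate=5e-4,
    verbose=1,
    tensorboard_log="intersection_ppo/",
)
model.learn(total_timesteps=int(4e5))  # the 400,000 steps mentioned above
model.save("intersection_ppo/model")
```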

AHPUymhd commented 2 months ago


Thank you very much for your help; I will follow your suggestions and modify the code. Thank you as well for your replies to the open source community. By the way, I don't understand the `ego_spacing`, `"destination": "o1"`, and `"scaling": 5.5 * 1.3` parameters. Could you please explain them to me? I would appreciate it very much.

eleurent commented 2 months ago

> By the way, I don't understand the `ego_spacing`, `"destination": "o1"`, and `"scaling": 5.5 * 1.3` parameters. Could you please explain them to me? I would appreciate it very much.

https://github.com/Farama-Foundation/HighwayEnv/blob/dc5cdea1b3fb839c80a6f552c71c8c42ddcfad09/highway_env/envs/intersection_env.py#L152

o1 is the west outer location.
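
For illustration, a hedged sketch of setting these keys; the comments give a likely reading of each one, and the linked source is the authoritative reference:

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-env environments)

env = gym.make("intersection-v0")
env.unwrapped.configure({
    # Target exit node in the intersection's road network; "o1" is the west outer location.
    "destination": "o1",
    # Rendering zoom level, roughly how many pixels per metre when drawing the scene (assumption).
    "scaling": 5.5 * 1.3,
})
# "ego_spacing" (a highway-v0 key) most likely controls how far the ego vehicle spawns
# from the vehicle ahead, as a ratio of the default spacing (also an assumption).
obs, info = env.reset()
```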