Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

About training PPO on the intersection and DQN on highway-v0 #585

Open AHPUymhd opened 3 months ago

AHPUymhd commented 3 months ago

Hello dear authors, thanks for your contributions to highway-env. I recently ran into some questions while training agents with stable-baselines3:

1. I trained DQN on highway-v0 for 20,000 steps, but the agent only learned to steer to the far right; it cannot dodge other vehicles, let alone overtake. The code is the official documentation code (attached as screenshots). Is there anything wrong with it?
2. I also trained PPO on the intersection for 400,000 steps, but the learning results were very poor. I don't know what went wrong; can you help me? The code is attached as screenshots as well.
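
For reference, here is a sketch of the DQN training example from the highway-env documentation that the question appears to follow (the exact hyperparameters in the screenshots may differ):

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-env environments)
from stable_baselines3 import DQN

env = gym.make("highway-v0")
model = DQN(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[256, 256]),
    learning_rate=5e-4,
    buffer_size=15_000,
    learning_starts=200,
    batch_size=32,
    gamma=0.8,
    train_freq=1,
    gradient_steps=1,
    target_update_interval=50,
    verbose=1,
    tensorboard_log="highway_dqn/",
)
model.learn(total_timesteps=int(2e4))  # the 20,000 steps mentioned above
model.save("highway_dqn/model")
```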

AHPUymhd commented 3 months ago

@eleurent

eleurent commented 2 months ago

Hi, for the highway-v0 run, I think the problem is that the observation is configured with absolute coordinates (`absolute: True`) instead of relative ones (`absolute: False`). This means that the observed features (e.g. the x position) will diverge quickly, so the learned decisions will not generalise to new positions in the scene.

So I would set the observation config to relative (`absolute: False`) and try again.
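
A minimal sketch of that change, assuming the default Kinematics observation (the `absolute` flag is the important part; keep the rest of your original config as it was):

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-env environments)

env = gym.make("highway-v0")
env.unwrapped.configure({
    "observation": {
        "type": "Kinematics",
        "features": ["presence", "x", "y", "vx", "vy"],
        "absolute": False,  # features expressed relative to the ego vehicle
        "normalize": True,
    }
})
obs, info = env.reset()  # reset so the new observation configuration takes effect
```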

For intersection-v0, however, absolute coordinates are more appropriate since it is always the same locations in the scene that are visited (but relative coordinates may work well too). PPO should definitely be able to learn a medium-quality policy, e.g. one that tries to cross the intersection and sometimes collides. The MLP is a poor model for this task because it cannot easily capture and generalise interactions between vehicles, and I got much better results with Transformer models (see the paper), but an MLP should at least get off the ground and improve a bit over a random policy.
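
For completeness, a rough MLP + PPO baseline on intersection-v0 could look like this with stable-baselines3 (the hyperparameters below are placeholders, not tuned values from the paper):

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-env environments)
from stable_baselines3 import PPO

env = gym.make("intersection-v0")  # the default observation uses absolute coordinates
model = PPO(
    "MlpPolicy",
    env,
    n_steps=512,
    batch_size=64,
    gamma=0.9,
    learning_rate=5e-4,
    verbose=1,
    tensorboard_log="intersection_ppo/",
)
model.learn(total_timesteps=int(4e5))  # the 400,000 steps mentioned above
model.save("intersection_ppo/model")
```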

AHPUymhd commented 2 months ago


Thank you very much for your help; I will follow your suggestions and modify the code. Thank you as well for your replies to the open source community. By the way, I don't understand the `ego_spacing`, `"destination": "o1"`, and `"scaling": 5.5 * 1.3` parameters. Could you please explain them to me? I would appreciate it very much.

eleurent commented 2 months ago

> By the way, I don't understand the `ego_spacing`, `"destination": "o1"`, and `"scaling": 5.5 * 1.3` parameters. Could you please explain them to me? I would appreciate it very much.

https://github.com/Farama-Foundation/HighwayEnv/blob/dc5cdea1b3fb839c80a6f552c71c8c42ddcfad09/highway_env/envs/intersection_env.py#L152

o1 is the west outer location.
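
For illustration, a hedged sketch of setting these keys; the comments give a likely reading of each one, and the linked source is the authoritative reference:

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-env environments)

env = gym.make("intersection-v0")
env.unwrapped.configure({
    # Target exit node in the intersection's road network; "o1" is the west outer location.
    "destination": "o1",
    # Rendering zoom level, roughly how many pixels per metre when drawing the scene (assumption).
    "scaling": 5.5 * 1.3,
})
# "ego_spacing" (a highway-v0 key) most likely controls how far the ego vehicle spawns
# from the vehicle ahead, as a ratio of the default spacing (also an assumption).
obs, info = env.reset()
```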