Open AHPUymhd opened 3 months ago
@eleurent
Hi,
For the highway-v0 run, I think the problem is that the observation is configured with absolute coordinates (`absolute: True`) instead of relative ones (`absolute: False`). This means the observed features (e.g. the x position) will diverge quickly, and the learned decisions will not generalise to new positions in the scene.
So I would set the observation config to relative (`absolute: False`) and try again.
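A minimal sketch of what that observation config could look like (the `Kinematics` observation type and its feature list follow the highway-env docs, but treat the exact keys as version-dependent):

```python
# Hedged sketch: switch the highway-v0 observation to ego-relative
# coordinates so the features stay bounded as the episode progresses.
config = {
    "observation": {
        "type": "Kinematics",  # per-vehicle feature matrix
        "features": ["presence", "x", "y", "vx", "vy"],
        "absolute": False,     # coordinates relative to the ego-vehicle
        "normalize": True,     # scale features to roughly [-1, 1]
    }
}

# Applying it would look something like this (requires highway-env):
# import gymnasium as gym
# import highway_env  # noqa: F401  (registers the environments)
# env = gym.make("highway-v0", config=config)
```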
For intersection-v0 however, absolute coordinates are more appropriate, since it's always the same locations in the scene that are visited (but relative coordinates may work well too). PPO should definitely be able to learn a mediocre policy, e.g. one that tries to cross the intersection and sometimes collides. The MLP is a bad model for this task because it cannot easily understand and generalise interactions between vehicles, and I got much better results with Transformer models (see paper), but an MLP should at least get off the ground and improve a bit over a random policy.
Thank you very much for your help, I will follow your suggestions to modify the code, and thank you for your replies to the open-source community. By the way, I don't understand the `ego_spacing`, `"destination": "o1"`, and `"scaling": 5.5 * 1.3` parameters. Could you please explain them to me? I would appreciate it very much.
`destination` is the name of the node in the road network that the ego-vehicle is driving to. They are defined here: `o1` is the west outer location.
`scaling` is just the zoom level of the camera.
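Put together, those two parameters would appear in the intersection config roughly like this (values taken from the question above; the surrounding structure is illustrative):

```python
# Hedged config fragment for intersection-v0, based on the values
# discussed in this thread.
config = {
    "destination": "o1",   # road-network node the ego-vehicle drives to
                           # ("o1" is the west outer location)
    "scaling": 5.5 * 1.3,  # camera zoom level used for rendering
}
```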
Hello dear authors, thanks for your contributions to highway-env, but I recently had some questions when training an agent with stable-baselines3: 1. I trained DQN for 20,000 steps on highway-v0, but it only learned to steer to the far right, and can't dodge vehicles or even overtake. The code is the official documentation code, as follows:
Is there anything wrong with this code? Please help.
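For reference, here is a sketch of the kind of DQN setup used in the highway-env documentation example (the hyperparameters below are roughly those from the docs; treat them as a starting point, not a guarantee). Only a function is defined, so the file can be read without highway-env or stable-baselines3 installed:

```python
# Hedged sketch of a DQN training run on highway-v0, assuming
# highway-env and stable-baselines3 are installed. Hyperparameters
# roughly follow the highway-env documentation example.
dqn_kwargs = {
    "policy_kwargs": {"net_arch": [256, 256]},
    "learning_rate": 5e-4,
    "buffer_size": 15_000,
    "learning_starts": 200,
    "batch_size": 32,
    "gamma": 0.8,  # short-horizon discount used in the docs example
    "train_freq": 1,
    "gradient_steps": 1,
    "target_update_interval": 50,
}

def train_dqn(total_timesteps=20_000):
    """Train DQN on the fast highway variant; imports are local so
    this module can be imported without the dependencies installed."""
    import gymnasium as gym
    import highway_env  # noqa: F401  (registers the environments)
    from stable_baselines3 import DQN

    env = gym.make("highway-fast-v0")  # faster variant for training
    model = DQN("MlpPolicy", env, verbose=1, **dqn_kwargs)
    model.learn(total_timesteps)
    return model
```

Call `train_dqn()` to start training; 20,000 steps is quite short for this task, so longer runs (and the relative-observation fix above) may help.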
2. Even after training PPO for 400,000 steps at the intersection, the learning results were very bad. I don't know what went wrong; can you help me? Code as follows:
![image](https://github.com/Farama-Foundation/HighwayEnv/assets/72792297/752ffc4b-e361-4b22-84b3-c814663466f8)
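An analogous PPO sketch for intersection-v0 might look like the following (the settings are illustrative guesses, not the maintainers' reference configuration; per the reply above, an MLP policy is expected to plateau at a mediocre policy here):

```python
# Hedged sketch of a PPO training run on intersection-v0, assuming
# highway-env and stable-baselines3 are installed. All hyperparameter
# values are illustrative assumptions.
ppo_kwargs = {
    "policy_kwargs": {"net_arch": [256, 256]},
    "n_steps": 512,
    "batch_size": 64,
    "gamma": 0.9,
    "learning_rate": 5e-4,
}

def train_ppo(total_timesteps=400_000):
    """Train PPO on intersection-v0; imports are local so this
    module can be imported without the dependencies installed."""
    import gymnasium as gym
    import highway_env  # noqa: F401  (registers the environments)
    from stable_baselines3 import PPO

    env = gym.make("intersection-v0")
    model = PPO("MlpPolicy", env, verbose=1, **ppo_kwargs)
    model.learn(total_timesteps)
    return model
```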