Closed — seominseok00 closed this issue 10 months ago
If I train using only these rewards, will it guarantee 100% collision avoidance and perfect performance?
I did not measure collision rate as a separate metric; I was only looking at the mean reward (which mixes collisions and speed). But no, I don't think even the best policy (in terms of mean reward) achieved a 0% collision rate.
This can probably be improved somewhat by tuning the reward weights, training for longer or with larger models, changing the observation or action space, etc.
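For reference, reward weights in highway-env are typically adjusted through the environment config dictionary. A minimal sketch of such an override is below; the key names follow the defaults in `intersection_env.py`, but the exact names and values should be treated as assumptions and checked against your installed version:

```python
# Hedged sketch: override reward weights for the intersection environment.
# Key names assumed from highway-env's IntersectionEnv default config;
# verify them against intersection_env.py in your installed version.
config_override = {
    "collision_reward": -10,   # penalise collisions more heavily (paper uses -5)
    "high_speed_reward": 1,    # reward for driving at the top target speed
    "arrived_reward": 2,       # bonus for reaching the goal
    "duration": 13,            # episode length in seconds
}

# With highway-env installed, this would be applied roughly as:
#   import gymnasium as gym
#   import highway_env
#   env = gym.make("intersection-v0")
#   env.unwrapped.configure(config_override)
```

Increasing the collision penalty relative to the speed reward trades average speed for safety; it shifts the optimum but still does not guarantee a 0% collision rate.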
Additionally, did you set the range of target speeds to [0, 4.5, 9]?
Yes, it's configured here: https://github.com/Farama-Foundation/HighwayEnv/blob/8d9324092064ca955df8c3b27a8f1498e14f8624/highway_env/envs/intersection_env.py#L45
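In highway-env, these target speeds are set through the action part of the environment config (the `DiscreteMetaAction` type). A sketch of that configuration, assuming the defaults in the linked file:

```python
# Hedged sketch: configuring the discrete target speeds for intersection-v0.
# [0, 4.5, 9] m/s matches the default linked above; the surrounding keys
# ("longitudinal", "lateral") are assumed from the env's default config.
action_config = {
    "action": {
        "type": "DiscreteMetaAction",
        "longitudinal": True,              # speed changes enabled
        "lateral": False,                  # lane changes disabled at the intersection
        "target_speeds": [0, 4.5, 9],      # m/s: stop, half speed, full speed
    }
}

# Applied the same way as any other config override:
#   env.unwrapped.configure(action_config)
```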
And how can I prevent the attention from focusing on irrelevant lanes while training the attention network?
The attention patterns emerge only from the reward-maximisation objective; they are not meant to be controlled directly. If you want the attention to adopt a given behaviour, the agent should be rewarded more for doing so.
Got it. Thanks for your response!
In your paper, it's mentioned that a reward of 1 is given when the speed is at the maximum speed, 0 otherwise, and a penalty of -5 is given for collisions. If I train using only these rewards, will it guarantee 100% collision avoidance and perfect performance?
When I try, it doesn't work well when there is a leading vehicle making a right turn.
https://drive.google.com/file/d/1NnqzzsrnCeoIj-0eAq1X4NMpSsKSIbBR/view?usp=sharing
I trained it for 4000 episodes with ego_attention_2h, as mentioned in the paper.
Additionally, did you set the range of target speeds to [0, 4.5, 9]? And how can I prevent the attention from focusing on irrelevant lanes while training the attention network?