eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms
MIT License

What reward function did you use in IntersectionEnv? #108

Closed · seominseok00 closed this issue 10 months ago

seominseok00 commented 11 months ago

[screenshot: reward definition from the paper]

In your paper, it's mentioned that a reward of 1 is given when the vehicle drives at the maximum speed and 0 otherwise, with a penalty of -5 for collisions. If I train using only these rewards, will it guarantee 100% collision avoidance and perfect performance?
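In code, my reading of that reward is something like the sketch below (attribute names like `vehicle.speed` / `vehicle.crashed` follow highway-env's vehicle API, and treating the collision penalty as additive is my assumption):

```python
# My reading of the paper's reward, as a sketch; this is not the
# actual IntersectionEnv._reward implementation.
COLLISION_REWARD = -5
HIGH_SPEED_REWARD = 1

def reward(vehicle, max_speed=9.0):
    # +1 only when driving at the maximum speed, 0 otherwise.
    r = HIGH_SPEED_REWARD if vehicle.speed >= max_speed else 0.0
    # Assumed additive -5 penalty on collision.
    if vehicle.crashed:
        r += COLLISION_REWARD
    return r
```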

When I try this, however, it doesn't work well when there is a leading vehicle making a right turn.

https://drive.google.com/file/d/1NnqzzsrnCeoIj-0eAq1X4NMpSsKSIbBR/view?usp=sharing

I trained it for 4000 episodes with ego_attention_2h, as mentioned in the paper.
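For reference, the command was along these lines, using the configs shipped under rl-agents/scripts (paths may differ in your checkout): `python3 experiments.py evaluate configs/IntersectionEnv/env.json configs/IntersectionEnv/agents/DQNAgent/ego_attention_2h.json --train --episodes=4000`.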

Additionally, did you set the range of target speeds to [0, 4.5, 9]? And how can I prevent attention from focusing on irrelevant lanes while training the attention networks?

[screenshot: attention weights highlighting an irrelevant lane]

eleurent commented 10 months ago

If I train using only these rewards, will it guarantee 100% collision avoidance and perfect performance?

I did not measure collision rate as a separate metric; I was just looking at mean reward (which mixes collisions and speed). But no, I think that even the best policy (in terms of mean reward) did not achieve a 0% collision rate.
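If you want to measure it yourself, a sketch along these lines should work (it assumes highway-env reports a `crashed` flag in the `info` dict, which I believe it does; verify against your installed version):

```python
import gymnasium as gym
import highway_env  # registers the environments

def collision_rate(env, policy, episodes=100):
    """Estimate the fraction of episodes ending in a collision.

    Relies on highway-env exposing a 'crashed' flag in the info
    dict; verify this against your installed version.
    """
    crashes = 0
    for _ in range(episodes):
        obs, info = env.reset()
        done = False
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            done = terminated or truncated
        crashes += bool(info.get("crashed", False))
    return crashes / episodes

# Example: random policy on the intersection environment.
env = gym.make("intersection-v0")
print(collision_rate(env, lambda obs: env.action_space.sample(), episodes=10))
```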

This can probably be tuned or improved a bit by things like changing the reward weights, training for longer or with larger models, or changing the observation or action space.
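For instance, here is a sketch of adjusting the reward trade-off through the environment config; the key names are assumptions based on highway-env's config style, so check `default_config()` in your installed version:

```python
import gymnasium as gym
import highway_env  # registers the environments

env = gym.make("intersection-v0")
# Key names below are assumptions based on highway-env's config style;
# check env.unwrapped.default_config() for your installed version.
env.unwrapped.configure({
    "collision_reward": -10,   # e.g. penalize collisions more heavily
    "high_speed_reward": 0.5,  # e.g. weaken the incentive to go fast
})
env.reset()  # changes take effect on reset
```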

Additionally, did you set the range of target speeds to [0, 4.5, 9]?

Yes, it's configured here: https://github.com/Farama-Foundation/HighwayEnv/blob/8d9324092064ca955df8c3b27a8f1498e14f8624/highway_env/envs/intersection_env.py#L45
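A quick way to check it on your side (a minimal sketch; the `[0, 4.5, 9]` value is the one confirmed above, but verify against your installed highway-env version):

```python
import gymnasium as gym
import highway_env  # registers the intersection-v0 environment

env = gym.make("intersection-v0")
# Should print [0, 4.5, 9] with the default configuration.
print(env.unwrapped.config["action"]["target_speeds"])
```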

And how can I prevent attention from focusing on irrelevant lanes while training the attention networks?

The attention patterns emerge only from the reward-maximisation objective; they're not really meant to be controlled directly. If you want the attention to adopt a given behaviour, the agent should be rewarded more for doing so.
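For example, a generic reward-shaping wrapper is one way to do that. This is just a sketch; the shaping function itself is a placeholder you'd have to define from whatever behaviour you want the attention to reflect:

```python
import gymnasium as gym

class ShapedRewardWrapper(gym.Wrapper):
    """Adds a custom shaping bonus to the environment reward.

    `shaping_fn` is a placeholder: it should map (obs, info) to a
    bonus for behaviour you want the learned attention to reflect,
    e.g. reacting to the vehicle actually on a conflicting path.
    """

    def __init__(self, env, shaping_fn, weight=0.1):
        super().__init__(env)
        self.shaping_fn = shaping_fn
        self.weight = weight

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        reward += self.weight * self.shaping_fn(obs, info)
        return obs, reward, terminated, truncated, info
```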

seominseok00 commented 10 months ago

Got it. Thanks for your response!