Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

Training results are not the same after 100k steps #460

farazfarid closed 1 week ago

farazfarid commented 1 year ago

How do I get the agent to drive like the one on the Getting Started page? I used the same code and trained for around 100k steps. Training showed an ep_len_mean and ep_rew_mean of around 25-30, but when I let it run, it mostly crashed into the first vehicle on the highway.


eleurent commented 1 year ago

Hi, I just ran the sb3_highway_dqn.py script at head (20k steps), and here are the logs I get:

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 9.75     |
|    ep_rew_mean      | 6.78     |
|    exploration_rate | 0.981    |
| time/               |          |
|    episodes         | 4        |
|    fps              | 13       |
|    time_elapsed     | 2        |
|    total_timesteps  | 39       |
----------------------------------

...

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 25.2     |
|    ep_rew_mean      | 19.7     |
|    exploration_rate | 0.05     |
| time/               |          |
|    episodes         | 1056     |
|    fps              | 12       |
|    time_elapsed     | 1566     |
|    total_timesteps  | 19894    |
| train/              |          |
|    learning_rate    | 0.0005   |
|    loss             | 0.126    |
|    n_updates        | 19693    |
----------------------------------

So the mean reward improved from about 6.8 to 19.7, and the mean episode length from about 9.8 to 25.2.

Here are 10 (non-cherry-picked) test episodes:

https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/258e214e-8d2b-440a-b380-79aaa54db762

https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/14ad0a05-c5ff-4d18-9f56-d894d77ec2f1

https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/fb0119dc-13fb-403b-b4d1-6f0e9c7f87b1

https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/35b7dff7-e03a-42c1-bc9e-91d196f9dc4b

https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/4ac6063f-2ed4-4c08-be55-9aaa81d348de

https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/19e838c0-4bc8-454c-bac0-090b7a305fd5

https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/cc83377d-b223-4f00-a5f1-16b9f8d7c204

While they are not perfect by any means, I think they show some situational awareness. At least the vehicle doesn't just crash into the first vehicle on the highway like in your case, so I'm not sure what is going on. If you see similar metrics (reward, episode length) during training, maybe you are not loading the checkpoint correctly at test time?
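
To rule out a loading problem, a minimal test-time sketch could look like the following. It is not the exact script: the checkpoint path `highway_dqn/model` and the environment id `highway-fast-v0` are assumptions and should match whatever was used for `model.save()` and training, and `run_episodes` / `evaluate_saved_dqn` are hypothetical helper names. The two things worth checking are that the model is loaded from the saved file (not freshly constructed) and that `predict` is called with `deterministic=True`, so that no exploration noise is applied.

```python
from typing import Any, Callable, List, Tuple


def run_episodes(
    env: Any, predict: Callable[[Any], Any], episodes: int = 10
) -> List[Tuple[int, float]]:
    """Roll out `predict` on a Gymnasium-style env.

    Returns one (episode_length, episode_return) pair per episode.
    """
    results = []
    for _ in range(episodes):
        obs, info = env.reset()
        done = truncated = False
        length, total = 0, 0.0
        while not (done or truncated):
            action = predict(obs)
            obs, reward, done, truncated, info = env.step(action)
            length += 1
            total += float(reward)
        results.append((length, total))
    return results


def evaluate_saved_dqn(path: str = "highway_dqn/model", episodes: int = 10):
    """Load a saved SB3 DQN checkpoint and evaluate it greedily.

    Both the path and the env id below are assumptions; adjust them
    to match your own training run.
    """
    # Imported lazily so run_episodes can be reused without these packages.
    import gymnasium as gym
    import highway_env  # noqa: F401  (importing registers the highway envs)
    from stable_baselines3 import DQN

    env = gym.make("highway-fast-v0")
    # Load the trained weights from disk instead of creating a new DQN(...).
    model = DQN.load(path, env=env)
    # deterministic=True picks the greedy action, without exploration.
    return run_episodes(
        env, lambda obs: model.predict(obs, deterministic=True)[0], episodes
    )
```

If the lengths and returns from `evaluate_saved_dqn()` are far below the `ep_len_mean` and `ep_rew_mean` logged at the end of training, the checkpoint being loaded is probably not the one that was trained.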

Edit: maybe there was a slight regression compared to the Getting Started version. All 5 runs there reach a mean reward of roughly 25, while my last training run only reached 20, and the behaviours look qualitatively a bit more conservative than in the Getting Started video. It's probably worth running a few more experiments to check whether this regression is reproducible.
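
One rough way to check reproducibility is to train several runs with different seeds and compare the final `ep_rew_mean` values against the older runs. The sketch below only does the comparison step; `summarize_runs` and `looks_like_regression` are hypothetical helpers, and the two-standard-deviation threshold is an arbitrary assumption, not a proper statistical test.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_runs(final_rewards: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of final ep_rew_mean values."""
    if len(final_rewards) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(final_rewards), stdev(final_rewards)


def looks_like_regression(
    baseline: Sequence[float], candidate: Sequence[float], n_std: float = 2.0
) -> bool:
    """Crude check: is the candidate mean more than `n_std` baseline
    standard deviations below the baseline mean?"""
    b_mean, b_std = summarize_runs(baseline)
    c_mean, _ = summarize_runs(candidate)
    return c_mean < b_mean - n_std * b_std
```

Here `baseline` would hold the final `ep_rew_mean` of each older run and `candidate` the same metric from runs at head, each trained with a different seed. With only a handful of runs this is a sanity check rather than evidence.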