Closed farazfarid closed 1 week ago
Hi, I just ran the sb3_highway_dqn.py script at head (20k steps), and here are the logs I get:
----------------------------------
| rollout/ | |
| ep_len_mean | 9.75 |
| ep_rew_mean | 6.78 |
| exploration_rate | 0.981 |
| time/ | |
| episodes | 4 |
| fps | 13 |
| time_elapsed | 2 |
| total_timesteps | 39 |
----------------------------------
...
----------------------------------
| rollout/ | |
| ep_len_mean | 25.2 |
| ep_rew_mean | 19.7 |
| exploration_rate | 0.05 |
| time/ | |
| episodes | 1056 |
| fps | 12 |
| time_elapsed | 1566 |
| total_timesteps | 19894 |
| train/ | |
| learning_rate | 0.0005 |
| loss | 0.126 |
| n_updates | 19693 |
----------------------------------
So the mean reward improved from 6.78 to 19.7, and the mean episode length from 9.75 to 25.2.
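As a rough sanity check on these numbers, the sketch below recomputes two derived quantities from the first and last log blocks above: the overall steps per episode (`total_timesteps / episodes`) and the reward per step (`ep_rew_mean / ep_len_mean`). The dictionary layout and the helper names are only illustrative. It assumes that `ep_len_mean` and `ep_rew_mean` are rolling averages over recent episodes (I believe Stable-Baselines3 uses the last 100 episodes by default), which would explain why the final overall average of about 18.8 steps per episode is lower than the logged 25.2.

```python
# Values copied from the first and last log blocks above.
first = {"ep_len_mean": 9.75, "ep_rew_mean": 6.78, "episodes": 4, "total_timesteps": 39}
last = {"ep_len_mean": 25.2, "ep_rew_mean": 19.7, "episodes": 1056, "total_timesteps": 19894}


def steps_per_episode(log):
    """Average episode length over the whole run so far (not a rolling window)."""
    return log["total_timesteps"] / log["episodes"]


def reward_per_step(log):
    """Mean reward collected per environment step, from the logged rolling means."""
    return log["ep_rew_mean"] / log["ep_len_mean"]


for name, log in (("first", first), ("last", last)):
    print(
        f"{name}: {steps_per_episode(log):.2f} steps/episode overall, "
        f"{reward_per_step(log):.3f} reward/step"
    )
```

With only 4 episodes logged, the overall average (39 / 4 = 9.75) matches `ep_len_mean` exactly. By the end, the reward per step has only risen modestly, so most of the gain in `ep_rew_mean` seems to come from surviving longer.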
Here are 10 (non-cherry-picked) test episodes:
https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/258e214e-8d2b-440a-b380-79aaa54db762
https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/14ad0a05-c5ff-4d18-9f56-d894d77ec2f1
https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/fb0119dc-13fb-403b-b4d1-6f0e9c7f87b1
https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/35b7dff7-e03a-42c1-bc9e-91d196f9dc4b
https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/4ac6063f-2ed4-4c08-be55-9aaa81d348de
https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/19e838c0-4bc8-454c-bac0-090b7a305fd5
https://github.com/Farama-Foundation/HighwayEnv/assets/1706935/cc83377d-b223-4f00-a5f1-16b9f8d7c204
While they are not perfect by any means, I think they show some situational awareness: at least the vehicle doesn't just crash into the first vehicle on the highway like in your case, so I'm not sure what is going on. If you see similar metrics (reward, episode length) while training, maybe you are not loading the checkpoint correctly at test time?
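One way to test the checkpoint-loading hypothesis is to reload the saved model and measure the same two metrics (mean reward, mean episode length) over a few test episodes, then compare them with the training logs. The following is a minimal sketch, not the actual script: `run_test_episodes` and `evaluate_saved_dqn` are hypothetical helper names, the environment id `highway-fast-v0` and the model path are assumptions that must match what was used for training, and importing `highway_env` is assumed to register the environments.

```python
from statistics import mean


def run_test_episodes(env, predict, n_episodes=10):
    """Roll out `predict` on a Gymnasium-style env.

    Returns (mean episode reward, mean episode length).
    """
    returns, lengths = [], []
    for _ in range(n_episodes):
        obs, _info = env.reset()
        done, total, steps = False, 0.0, 0
        while not done:
            action = predict(obs)
            obs, reward, terminated, truncated, _info = env.step(action)
            total += float(reward)
            steps += 1
            done = terminated or truncated
        returns.append(total)
        lengths.append(steps)
    return mean(returns), mean(lengths)


def evaluate_saved_dqn(model_path, env_id="highway-fast-v0", n_episodes=10):
    """Load a saved DQN checkpoint and evaluate it greedily (hypothetical helper)."""
    # Local imports keep run_test_episodes usable without these packages installed.
    import gymnasium as gym
    import highway_env  # noqa: F401  (assumed to register the highway envs)
    from stable_baselines3 import DQN

    env = gym.make(env_id)
    # Assumption: model_path is the same path that was passed to model.save().
    model = DQN.load(model_path, env=env)

    def greedy(obs):
        # deterministic=True disables the epsilon-greedy exploration at test time.
        action, _state = model.predict(obs, deterministic=True)
        return int(action)

    return run_test_episodes(env, greedy, n_episodes)
```

Calling `evaluate_saved_dqn(...)` with the path given to `model.save()` should return values in the same range as the final `ep_rew_mean` and `ep_len_mean`. If they are much lower, the test script is probably running a freshly initialised or different model rather than the trained checkpoint.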
Edit: maybe there was a slight regression compared to the Getting Started version: all 5 runs there get roughly 25 mean reward, while my last training only reached 20, and the behaviours qualitatively look a bit more conservative than in the Getting Started video. It's probably worth running a few more experiments to check whether this regression is reproducible.
How do I get the agent to drive like on the Getting Started page? I used the same code and trained for around 100k steps. Training showed an ep_len_mean and ep_rew_mean of around 25-30, but when I let the trained agent run, it mostly crashed into the first vehicle on the highway.