Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

About sb3_highway_dqn #532

Open HodaHo opened 11 months ago

HodaHo commented 11 months ago

Dear author, thank you for your great work. I have two questions about the sb3_highway_dqn example; please guide me:

1. How can I extract the graphs related to reinforcement learning (I mean the three graphs of length, reward, and exploration rate vs. episodes shown in the example)? What is the Python code for this?

2. I plan to train the ego vehicle to perform a lane change on the highway. Should I encode this goal in the reward, or specify it as a destination lane index, as in the intersection example? I know how to index lanes, but I don't understand exactly what changes are needed to make the vehicle move from its starting lane to the final (target) lane.

Best regards

eleurent commented 11 months ago
  1. Not sure which example you are referring to, but this depends on the reinforcement learning library you are using. For example, with stable-baselines you can look at TensorBoard (see the sketch after this list).

  2. Yes, if you want the agent to follow some behaviour, you need to reward it for it (i.e. moving to the correct lane), so the reward function has to account for this. The agent also needs to be able to predict the reward from its observation, so make sure the observation contains sufficient information (e.g. the position of the vehicle, and the position of the target lane if it is not always the same).
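A minimal sketch of the TensorBoard route mentioned in point 1, assuming stable-baselines3 and a recent highway-env release where `import highway_env` registers the environments; the directory and file names are arbitrary choices, not part of the original example script:

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway environments in recent releases)
from stable_baselines3 import DQN

env = gym.make("highway-v0")
model = DQN(
    "MlpPolicy",
    env,
    verbose=1,
    tensorboard_log="./dqn_highway_tensorboard/",  # training curves are written here
)
model.learn(total_timesteps=20_000)
model.save("dqn_highway")  # arbitrary file name

# Then, in a shell:
#   tensorboard --logdir ./dqn_highway_tensorboard/
# In recent SB3 versions, the scalars "rollout/ep_len_mean",
# "rollout/ep_rew_mean" and "rollout/exploration_rate" correspond to the
# episode length, reward and exploration-rate curves.
```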

HodaHo commented 10 months ago

Thank you for your reply,

  1. I'd like to plot length, reward, and exploration rate vs. episodes using the rl-agents library; please guide me on the Python code.

  2. Why is this not considered in the intersection problem (there is no reward for the "o" destination)?

eleurent commented 10 months ago
  1. As of now you can visualise training progress (length, reward) on TensorBoard. If you want to log additional information, you will have to edit evaluation.py to do so (after the call to env.step, for instance), and then you can make your own plots based on the logged data (see the sketch below for the plotting step).
  2. Sorry, I don't think I understand your question, can you elaborate?
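A minimal sketch of the plotting step only, assuming you have already edited evaluation.py to append one row per episode (episode index, length, total reward, exploration rate) to a CSV file; the file name and column names are illustrative assumptions, not something rl-agents produces by default:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical per-episode log written from your modified evaluation.py
data = pd.read_csv("episode_stats.csv")

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, column in zip(axes, ["length", "reward", "exploration_rate"]):
    ax.plot(data["episode"], data[column])
    ax.set_xlabel("episode")
    ax.set_ylabel(column)
fig.tight_layout()
plt.show()
```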
HodaHo commented 10 months ago

(Regarding question 2) Following up on your answer to question 2 above: you explained that in order to reach the desired lane, we must include it in the reward function. But why, in the intersection example, is moving towards the destination o1 not included in the reward function?

eleurent commented 9 months ago

It is included through the speed reward: having a higher speed means we make more progress towards the destination (by default, the lateral control is done automatically, so the vehicle always follows the desired path, and the agent only controls the speed to avoid collisions).
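For the highway lane-change case discussed earlier, if you do want to reward reaching a particular lane explicitly, here is a minimal sketch of one way to do it by subclassing the environment and adding a term to its reward. The class name, environment id, target lane value and 0.5 weight are illustrative assumptions; the attributes used (`self.vehicle.lane_index`) follow the highway-env API but may differ between versions, so check them against the release you use.

```python
import gymnasium as gym
from highway_env.envs.highway_env import HighwayEnv


class LaneChangeHighwayEnv(HighwayEnv):
    TARGET_LANE = 3  # index of the desired final lane (rightmost in the default 4-lane highway)

    def _reward(self, action):
        # Keep the default collision/speed terms, then add a bonus
        # whenever the ego vehicle is driving in the target lane.
        reward = super()._reward(action)
        if self.vehicle.lane_index[2] == self.TARGET_LANE:
            reward += 0.5
        return reward


# Register and use it like any other environment (the id is an arbitrary choice).
gym.register(id="lane-change-highway-v0", entry_point=LaneChangeHighwayEnv)
env = gym.make("lane-change-highway-v0")
```

As noted above, make sure the observation also lets the agent tell which lane it is in (the default Kinematics observation includes the vehicle's lateral position), otherwise it cannot learn to predict this extra reward term.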