LucasAlegre / sumo-rl

Reinforcement Learning environments for Traffic Signal Control with SUMO. Compatible with Gymnasium, PettingZoo, and popular RL libraries.
https://lucasalegre.github.io/sumo-rl
MIT License
746 stars, 201 forks

Clarification on TLS phases #135

Closed by smarianimore 1 year ago

smarianimore commented 1 year ago

Dear @LucasAlegre, everybody, I'm having a hard time understanding exactly how the library handles a few parameters, which I need in order to debug why I'm getting poor performance with Q-learning.

In particular (I'll add more context later if needed, bear with me):

Briefly, the context is that I want to compare Q-learning with fixed phases (as default in SUMO, for instance), but I'm having a hard time debugging RL and understanding how exactly green phases work...

As a last bit of info, the SUMO .xml config file also has a parameter that sets the duration of the green phase. How does it fit into the picture above?

LucasAlegre commented 1 year ago

Hi!

Dear @LucasAlegre, everybody, I'm having a hard time understanding exactly how the library handles a few parameters, which I need in order to debug why I'm getting poor performance with Q-learning.

In particular (I'll add more context later if needed, bear with me):

  • what exactly is delta_time? I understood it is the time interval at which learning traffic lights (TL henceforth) get the chance to take an action (i.e. change the green phase configuration of the traffic lights); that is, agents act in the environment every delta_time seconds

That is correct.
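To make the timing concrete, here is a minimal sketch of when decisions happen. It is a toy model, not sumo-rl code: the helper `decision_times` is hypothetical, and it assumes the first decision is taken at second 0 and that the episode length is given in seconds.

```python
def decision_times(delta_time: int, num_seconds: int) -> list[int]:
    """Simulation seconds at which the agent picks an action (toy model).

    Assumption: the first action is chosen at second 0, and a new one
    every delta_time seconds until the episode ends.
    """
    return list(range(0, num_seconds, delta_time))


times = decision_times(delta_time=5, num_seconds=30)
print(times)  # [0, 5, 10, 15, 20, 25]
```

Between two consecutive entries the simulation simply runs; the agent cannot intervene.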

  • what exactly is min_green? I understood it is both a boolean and a discrete variable (?) stating whether a minimum number of seconds has passed since the start of the last green phase configuration that was applied. How is it used in learning, though?

The agent is not allowed to change phase if the current phase has been active for less than min_green seconds. If it tries to change the phase anyway, the action has no effect. Because this information is important to the agent, its observations include a binary variable that indicates whether min_green seconds have elapsed or not.
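The rule described above can be sketched with a small toy model. This is not sumo-rl's implementation: the class `ToySignal` and its fields are hypothetical, yellow phases are ignored, and the values min_green=10 and delta_time=5 are chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToySignal:
    """Toy stand-in for one traffic light (hypothetical, not sumo-rl's class)."""
    min_green: int          # seconds a phase must stay active before it may change
    delta_time: int         # seconds between two agent decisions
    phase: int = 0          # index of the currently active green phase
    time_in_phase: int = 0  # seconds since the last phase change

    def min_green_elapsed(self) -> int:
        """Binary observation feature: 1 once min_green seconds have passed."""
        return int(self.time_in_phase >= self.min_green)

    def step(self, requested_phase: int) -> None:
        """Apply one agent action, then advance the clock by delta_time."""
        if requested_phase != self.phase and self.min_green_elapsed():
            self.phase = requested_phase
            self.time_in_phase = 0
        # Otherwise the request is ignored and the current phase continues.
        self.time_in_phase += self.delta_time


signal = ToySignal(min_green=10, delta_time=5)
log = []
for requested in [1, 1, 1, 0]:
    flag = signal.min_green_elapsed()  # what the agent observes before acting
    signal.step(requested)
    log.append((flag, signal.phase))

print(log)  # [(0, 0), (0, 0), (1, 1), (0, 1)]
```

Each tuple is (observed flag, phase after the action). The first two requests for phase 1 are ignored because the flag is still 0; the third succeeds, and the immediate request to switch back is ignored again.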

  • what exactly is Phase.duration? I found that this param is hard-coded as 60 (seconds?), with no way to change it through configuration. Does it mean the green phase always lasts 60 seconds? What is its relationship with param min_green and the following 3 other params below?

This is done because TraCI requires a value to be passed when creating a phase. This 60-second value is not used anywhere and has no impact.

  • what exactly are Phase.min_duration and Phase.max_duration? Can the learning process also learn how long the green phase should last between these two extremes? What is their relationship with param min_green?

These variables have no influence. The only constraint is that a phase lasts at least min_green seconds. The agent can learn to keep a phase active for min_green seconds, or min_green + delta_time, or min_green + 2*delta_time, ..., min_green + n*delta_time, ...
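As a quick illustration, the durations listed above can be enumerated directly. The helper `possible_green_durations` is hypothetical and simply takes the formula in this reply at face value; the values 5 and 5 are example settings, not a claim about any particular experiment.

```python
def possible_green_durations(min_green: int, delta_time: int, count: int) -> list[int]:
    """First `count` green durations of the form min_green + n * delta_time."""
    return [min_green + n * delta_time for n in range(count)]


durations = possible_green_durations(min_green=5, delta_time=5, count=4)
print(durations)  # [5, 10, 15, 20]
```

So the agent does not pick a duration from a continuous range; it picks, at each decision point, whether to keep or change the phase, and the duration follows from that.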

Briefly, the context is that I want to compare Q-learning with fixed phases (as default in SUMO, for instance), but I'm having a hard time debugging RL and understanding how exactly green phases work...

As a last bit of info, the SUMO .xml config file also has a parameter that sets the duration of the green phase. How does it fit into the picture above?

SUMO-RL does not use the XML config file, so this is ignored.

smarianimore commented 1 year ago

Many thanks for your kind and thorough explanation.