Closed smarianimore closed 1 year ago
Hi!
Dear @LucasAlegre, everybody, I'm having a hard time understanding exactly how the library handles a few parameters, which I need in order to debug why I'm getting poor performance with Q-learning.
In particular (I'll add more context later if needed, bear with me):
- what exactly is `delta_time`? I understood it is the time interval at which learning traffic lights (TL henceforth) have the chance to do an action (i.e. change the green phase configuration of traffic lights), that is, every `delta_time` agents act in the environment
That is correct.
- what exactly is `min_green`? I understood it is both a boolean and a discrete variable (?) stating whether a minimum number of seconds has passed since the start of the last green phase configuration enacted. How is it used in learning, though?
The agent is not allowed to change phase if the current phase has been active for less than `min_green` seconds. If it tries to change the phase, the action will have no effect. Because this information is important to the agent, the agent's observations include a binary variable that indicates whether `min_green` seconds have elapsed or not.
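To make the `min_green` rule above concrete, here is a minimal sketch. It is not SUMO-RL's actual code: `ToySignal`, `act`, `advance` and `min_green_flag` are hypothetical names invented for illustration, and the sketch ignores yellow/transition time.

```python
from dataclasses import dataclass


@dataclass
class ToySignal:
    """Hypothetical stand-in for a traffic signal; not SUMO-RL's class."""
    min_green: int = 10      # seconds a phase must stay active
    phase: int = 0           # index of the current green phase
    time_in_phase: int = 0   # seconds since the current phase started

    def min_green_flag(self) -> int:
        """Binary observation entry: 1 once min_green seconds have elapsed."""
        return 1 if self.time_in_phase >= self.min_green else 0

    def act(self, requested_phase: int) -> bool:
        """Apply an action; return True only if the phase actually changed."""
        if requested_phase == self.phase:
            return False  # keeping the current phase is always allowed
        if self.time_in_phase < self.min_green:
            return False  # too early: the change request has no effect
        self.phase = requested_phase
        self.time_in_phase = 0
        return True

    def advance(self, seconds: int) -> None:
        """Let simulated time pass without changing the phase."""
        self.time_in_phase += seconds
```

With `min_green=10`, a change request after 5 seconds is ignored and the flag is 0; after 10 seconds the flag becomes 1 and the same request takes effect.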
- what exactly is `Phase.duration`? I found that this param is hard-coded as 60 (seconds?), with no way to change it through configuration. Does it mean the green phase always lasts 60 seconds? What is its relationship with param `min_green` and the 3 other params below?
This is done because TraCI requires a value to be passed when creating a phase. This 60-second value is not used anywhere and has no impact.
- what exactly are `Phase.min_duration` and `Phase.max_duration`? Can the learning process also learn how long the green phase should last between these two extremes? What is their relationship with param `min_green`?
These variables have no influence. The only constraint is that a phase lasts at least `min_green` seconds. The agent can learn to keep a phase active for `min_green` seconds, or `min_green + delta_time`, or `min_green + 2 * delta_time`, ..., `min_green + n * delta_time`, ...
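As a small illustration of the durations listed above, the sketch below enumerates the green-phase lengths of the form `min_green + n * delta_time` up to a cutoff. The function name `reachable_durations` and the `horizon` parameter are assumptions made for this example, not part of SUMO-RL.

```python
from typing import List


def reachable_durations(min_green: int, delta_time: int, horizon: int) -> List[int]:
    """Green durations (seconds) of the form min_green + n * delta_time, n >= 0,
    that do not exceed horizon."""
    if delta_time <= 0:
        raise ValueError("delta_time must be positive")
    return list(range(min_green, horizon + 1, delta_time))


print(reachable_durations(10, 5, 30))  # [10, 15, 20, 25, 30]
```

Under these assumptions the agent cannot pick, say, a 12-second green: the granularity of its choices is set by `delta_time`, and the lower bound by `min_green`.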
Briefly, the context is that I want to compare Q-learning with fixed phases (as default in SUMO, for instance), but I'm having a hard time debugging RL and understanding how exactly green phases work...
As a last bit of info, the SUMO .xml config file also has a param to set the duration of the green phase... how does it fit into the picture above?
SUMO-RL does not use the XML config file, so this is ignored.
many thanks for your kind and thorough explanation