LucasAlegre / sumo-rl

Reinforcement Learning environments for Traffic Signal Control with SUMO. Compatible with Gymnasium, PettingZoo, and popular RL libraries.
https://lucasalegre.github.io/sumo-rl
MIT License

More phases generated by _build_phases() when running ql_2way-single-intersection.py #184

Closed Gavin-Tao closed 9 months ago

Gavin-Tao commented 9 months ago

Thank you very much for open-sourcing this project; it has been a great help to us.

When I run ql_2way-single-intersection.py on the 2way-single-intersection SUMO network, the initial phases are as follows:

(
    Phase(duration=33.0, state='GGrrrrGGrrrr', minDur=33.0, maxDur=33.0),
    Phase(duration=2.0, state='yyrrrryyrrrr', minDur=2.0, maxDur=2.0),
    Phase(duration=6.0, state='rrGrrrrrGrrr', minDur=6.0, maxDur=6.0),
    Phase(duration=2.0, state='rryrrrrryrrr', minDur=2.0, maxDur=2.0),
    Phase(duration=33.0, state='rrrGGrrrrGGr', minDur=33.0, maxDur=33.0),
    Phase(duration=2.0, state='rrryyrrrryyr', minDur=2.0, maxDur=2.0),
    Phase(duration=6.0, state='rrrrrGrrrrrG', minDur=6.0, maxDur=6.0),
    Phase(duration=2.0, state='rrrrryrrrrry', minDur=2.0, maxDur=2.0),
)

The number of initial phases is 8.

When I step through the code up to _build_phases(), I find that there are 16 phases, as follows:

[
    Phase(duration=33.0, state='GGrrrrGGrrrr', minDur=-1, maxDur=-1),
    Phase(duration=6.0, state='rrGrrrrrGrrr', minDur=-1, maxDur=-1),
    Phase(duration=33.0, state='rrrGGrrrrGGr', minDur=-1, maxDur=-1),
    Phase(duration=6.0, state='rrrrrGrrrrrG', minDur=-1, maxDur=-1),
    Phase(duration=2, state='yyrrrryyrrrr', minDur=-1, maxDur=-1),
    Phase(duration=2, state='yyrrrryyrrrr', minDur=-1, maxDur=-1),
    Phase(duration=2, state='yyrrrryyrrrr', minDur=-1, maxDur=-1),
    Phase(duration=2, state='rryrrrrryrrr', minDur=-1, maxDur=-1),
    Phase(duration=2, state='rryrrrrryrrr', minDur=-1, maxDur=-1),
    Phase(duration=2, state='rryrrrrryrrr', minDur=-1, maxDur=-1),
    Phase(duration=2, state='rrryyrrrryyr', minDur=-1, maxDur=-1),
    Phase(duration=2, state='rrryyrrrryyr', minDur=-1, maxDur=-1),
    Phase(duration=2, state='rrryyrrrryyr', minDur=-1, maxDur=-1),
    Phase(duration=2, state='rrrrryrrrrry', minDur=-1, maxDur=-1),
    Phase(duration=2, state='rrrrryrrrrry', minDur=-1, maxDur=-1),
    Phase(duration=2, state='rrrrryrrrrry', minDur=-1, maxDur=-1),
]

Is this correct? If so, why does the code add another 8 phases? I am a little confused here and would really appreciate more details about this function.

LucasAlegre commented 9 months ago

Hi @Gavin-Tao,

The traffic light defined in the .net file has 4 green phases (i.e., phases whose state contains a 'g' or 'G'). Hence, the agent has 4 possible actions, as you can see by inspecting "env.action_space".
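
As a rough illustration (plain Python, not the sumo-rl code itself), the green phases can be picked out of the 8 program phases posted above just by looking at their state strings. The helper name `is_green` is hypothetical, and the "no yellow" check is an assumption about how green phases are told apart from transition phases.

```python
# State strings of the 8 phases posted above for this network.
program_states = [
    "GGrrrrGGrrrr", "yyrrrryyrrrr",
    "rrGrrrrrGrrr", "rryrrrrryrrr",
    "rrrGGrrrrGGr", "rrryyrrrryyr",
    "rrrrrGrrrrrG", "rrrrryrrrrry",
]


def is_green(state: str) -> bool:
    """Hypothetical helper: green means at least one 'g'/'G' and no yellow signal."""
    return ("g" in state or "G" in state) and "y" not in state


green_states = [s for s in program_states if is_green(s)]
num_actions = len(green_states)  # one action per green phase

print(green_states)
print(num_actions)
```

Under these assumptions, each entry of `green_states` corresponds to one action the agent can choose.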

The reason there are 16 phases is that, for each pair of distinct green phases i and j, we need a yellow phase for the transition from phase i to phase j. That gives 4 green phases plus 4 × 3 = 12 yellow phases. So, if the agent is on green phase 0 and will transition to green phase 3, it will first activate the yellow phase that transitions from green phase 0 to green phase 3.
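
A minimal sketch of that construction, in plain Python without SUMO, is shown here. It is an illustration under stated assumptions rather than the actual _build_phases() code: the names `yellow_state`, `all_states`, and `yellow_index` are hypothetical, and the rule "a signal turns yellow if it is green in phase i and red in phase j" is an assumption that happens to reproduce the 16 states listed in the question.

```python
from itertools import permutations

# The 4 green phase states of this network, in order.
green_states = ["GGrrrrGGrrrr", "rrGrrrrrGrrr", "rrrGGrrrrGGr", "rrrrrGrrrrrG"]


def yellow_state(src: str, dst: str) -> str:
    """Assumed rule: signals that are green in src and red in dst turn yellow."""
    return "".join(
        "y" if a in "gG" and b == "r" else a
        for a, b in zip(src, dst)
    )


# Green phases come first, then one yellow phase per ordered pair (i, j), i != j.
all_states = list(green_states)
yellow_index = {}  # (i, j) -> position of the yellow phase in all_states
for i, j in permutations(range(len(green_states)), 2):
    yellow_index[(i, j)] = len(all_states)
    all_states.append(yellow_state(green_states[i], green_states[j]))

print(len(all_states))
print(yellow_index[(0, 3)], all_states[yellow_index[(0, 3)]])
```

Because the green phases of this network never share a green signal, the three yellow phases leaving a given green phase have identical state strings, which is why each yellow state appears three times in the list above.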

Gavin-Tao commented 9 months ago

Hi @LucasAlegre,

Thanks very much for your prompt reply and useful information.

Have a nice day!

Best wishes, Gavin