Indiscriminate Yellow Steps

Pi-Star-Lab / RESCO

Reinforcement Learning Benchmarks for Traffic Signal Control (RESCO)


Indiscriminate Yellow Steps #12

Closed mschrader15 closed 10 months ago

mschrader15 commented 2 years ago

I am walking through the implementation, and it seems that MultiSignal.step enforces a peculiar constraint: if the actor resolution is shorter than the signals' yellow time, then no actor can make observations or act while any one actor is in transition mode.

Could we move the yellow state internal to the traffic signal itself and simply prevent state changes until after the transition to green is made? Would also be nice to enforce some minimum green time.
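As a rough sketch of that idea, assuming a hypothetical per-signal class (SignalSketch, request, and tick are illustrative names, not RESCO's actual API, and the timing values are placeholders), the yellow interval and a minimum green could live inside the signal so that other actors are never blocked:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalSketch:
    """Hypothetical per-signal state machine; not RESCO's Signal class."""
    yellow_time: int = 3   # seconds, assumed value
    min_green: int = 5     # seconds, assumed value
    phase: int = 0         # index of the currently served green phase
    state: str = "green"   # either "green" or "yellow"
    time_in_state: int = 0
    pending_phase: Optional[int] = None

    def request(self, new_phase: int) -> bool:
        """Ask for a phase change; return True only if it is accepted."""
        if self.state == "yellow":
            return False  # mid-transition: ignore new requests
        if new_phase == self.phase:
            return False  # nothing to change
        if self.time_in_state < self.min_green:
            return False  # minimum green not yet served
        self.pending_phase = new_phase
        self.state = "yellow"
        self.time_in_state = 0
        return True

    def tick(self) -> None:
        """Advance the signal by one simulated second."""
        self.time_in_state += 1
        if self.state == "yellow" and self.time_in_state >= self.yellow_time:
            # Yellow has been served; switch to the requested green phase.
            self.phase = self.pending_phase
            self.pending_phase = None
            self.state = "green"
            self.time_in_state = 0
```

With this shape, the environment could call tick on every signal each simulation second, and a rejected request would only affect the signal that made it.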

Another option is to let SUMO handle the transition timing by using detector overrides, which is functionality they have added. This would be a big change to the project structure, but it would ultimately model the traffic signal's internal behavior more realistically.

https://github.com/Pi-Star-Lab/RESCO/blob/dc773abca1885d961a1aada800d67ac74781b1ba/resco_benchmark/multi_signal.py#L181-L182

jault commented 2 years ago

Ideally the interface could be adjusted to support PettingZoo's. Right now all agents need to choose actions at the same time, as this was the assumption of the TSC focused MARL algorithms. Minimum green is enforced currently by having the action interval large enough to encompass both a minimum green and yellow duration period.
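To illustrate the arithmetic, here is a small sketch. The function names and the example numbers are assumptions made for illustration and are not taken from the RESCO configuration:

```python
def effective_green(action_interval: int, yellow_time: int) -> int:
    """Green seconds left in one action interval after the yellow is served."""
    return action_interval - yellow_time


def interval_is_safe(action_interval: int, yellow_time: int, min_green: int) -> bool:
    """True if one action interval covers the yellow plus the minimum green."""
    return effective_green(action_interval, yellow_time) >= min_green


# Assumed example values, in seconds.
print(interval_is_safe(action_interval=10, yellow_time=3, min_green=5))
print(interval_is_safe(action_interval=5, yellow_time=3, min_green=5))
```

The first call satisfies the constraint (7 seconds of green against a 5 second minimum); the second does not, since only 2 seconds of green remain.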

Getting to the PettingZoo interface would need yellow times pushed into the lights as you say, as well as other changes. RESCO still needs to support the assumption all agents work in the same interval too.

I'm not sure about the detector overrides - is there some documentation available?

mschrader15 commented 2 years ago

Okay, perhaps I'm trying to take this project in a direction it's not intended to go... We deal with control algorithms that manipulate NEMA or dual-ring controllers, meaning the control algos are constrained to the timing and phase transitions that DOTs allow. One example of the increased complexity is that signals have different yellow and min green times depending on the road speed limit.

TMI, but we want to baseline these "more realistic" controllers against the RL & maxpressure methods in this project. I'm leaning towards a separate implementation, where we only rely on SUMO files in this project, and the actor classes are implemented in a completely different fashion/repo. It's probably not worth manipulating the source for us, instead just taking the results for comparison.

Detector Overrides implementation discussion is here and also here

jault commented 2 years ago

Actually, supporting that is desired. It's one of the primary differences between the PettingZoo and Gym multi-agent interfaces. Most MA algorithms don't allow for agents with heterogeneous action intervals, which is an interesting aspect to study considering that many real problems don't have synchronized activity, as you point out is the case in TSC.

I just meant to say that supporting the Gym interface is still needed. I think that could be accomplished by simply overriding the individual signals to have the same green/yellow timings when needed. The default behavior should be that all signals operate according to the safety requirements they use in the real world. Future MARL algorithms must account for this reality.
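One way that override might look, as a sketch only: synchronize_timings and the dictionary layout are hypothetical, and taking the maximum of each timing is an assumed (conservative) choice rather than anything RESCO prescribes.

```python
from typing import Dict


def synchronize_timings(timings: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Give every signal the same yellow and minimum green (hypothetical helper).

    Uses the longest value found across signals, so no signal ends up with
    a shorter yellow or minimum green than its own requirement.
    """
    yellow = max(t["yellow"] for t in timings.values())
    min_green = max(t["min_green"] for t in timings.values())
    return {sid: {"yellow": yellow, "min_green": min_green} for sid in timings}


# Assumed per-signal timings in seconds, e.g. derived from speed limits.
per_signal = {
    "A": {"yellow": 3, "min_green": 5},
    "B": {"yellow": 4, "min_green": 8},
}
print(synchronize_timings(per_signal))
```

The per-signal values would remain the default, and the synchronized copy would only be used when an algorithm needs the Gym-style shared interval.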

Thanks for the links I'll check them out.

mschrader15 commented 2 years ago

Would you be interested in hopping on a Zoom call sometime in the near future to discuss? I'm guessing some of our research is aligned, and maybe we can benefit from more direct collab!?

jault commented 2 years ago

Sure, I'll send you an email to work out a time.