LucasAlegre / sumo-rl

Reinforcement Learning environments for Traffic Signal Control with SUMO. Compatible with Gymnasium, PettingZoo, and popular RL libraries.
https://lucasalegre.github.io/sumo-rl
MIT License
731 stars, 197 forks

Continuous action control #201

Closed by Aegis1863 5 months ago

Aegis1863 commented 6 months ago

Discrete actions work very well with PPO, but on-policy algorithms still suffer from low sample efficiency, and it is hard to find an off-policy method suited to discrete actions (there is only DQN, and its performance is not good).

Is there any plan to provide a continuous-action control scheme, for example one where the agent chooses how long a given phase is held? The environment would run the phase for that duration until it ends, then return the next state, and the agent would make its next decision.

LucasAlegre commented 6 months ago

The continuous action scheme you proposed would be a bit problematic since the agent would receive rewards on an irregular time schedule depending on the action selected, while we would like the agent to receive rewards every N seconds so we can measure the effect of the actions.
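To illustrate the timing concern, here is a minimal sketch in plain Python that does not use SUMO or sumo-rl. The function names, the 5-second interval, and the example durations are all hypothetical, chosen only to compare when rewards would arrive under each scheme.

```python
from itertools import accumulate


def decision_times_fixed(interval_s, n_decisions):
    """Times (s) at which rewards arrive when every step lasts interval_s."""
    return [interval_s * k for k in range(1, n_decisions + 1)]


def decision_times_variable(durations_s):
    """Times (s) at which rewards arrive when each action picks its own
    phase duration (the continuous-action scheme proposed above)."""
    return list(accumulate(durations_s))


def gaps(times):
    """Elapsed time between consecutive rewards, starting from t = 0."""
    return [b - a for a, b in zip([0] + times, times)]


# Hypothetical numbers, for illustration only.
fixed = decision_times_fixed(5, 4)
variable = decision_times_variable([5, 30, 8, 12])

print(gaps(fixed))     # every reward covers the same 5 s window
print(gaps(variable))  # each reward covers a different window length
```

With a fixed interval every reward summarizes the same amount of simulated time, so rewards are directly comparable. With variable durations a reward might cover 5 s or 30 s, which makes it harder to attribute the outcome to the action.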

While I agree that vanilla DQN usually does not perform well, there are many good DQN variants that make it more stable. I would try Double DQN, Rainbow, MaxMin DQN, etc.
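As a rough illustration of how one of these variants differs from vanilla DQN, the sketch below compares the two bootstrap targets in plain Python. It is not taken from sumo-rl or any RL library; the function names and Q-values are made up for the example. Double DQN selects the next action with the online network and evaluates it with the target network, which tends to reduce the overestimation caused by taking a max over noisy estimates.

```python
def vanilla_dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Vanilla DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for a next state with three discrete actions.
q_online = [1.0, 2.0, 0.5]   # online network prefers action 1
q_target = [3.0, 1.5, 0.2]   # target network overestimates action 0

vanilla = vanilla_dqn_target(1.0, q_target, gamma=0.5)
double = double_dqn_target(1.0, q_online, q_target, gamma=0.5)
print(vanilla, double)
```

In this made-up case the vanilla target bootstraps from the inflated value of action 0, while the Double DQN target uses the target network's value for the action the online network actually prefers.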