LucasAlegre / sumo-rl

Reinforcement Learning environments for Traffic Signal Control with SUMO. Compatible with Gymnasium, PettingZoo, and popular RL libraries.
https://lucasalegre.github.io/sumo-rl
MIT License

Observation space for 4x4 #216

Closed: deltag0 closed this issue 1 month ago

deltag0 commented 1 month ago

Maybe I didn't understand the definition of the observation space well: `obs = [phase_one_hot, min_green, lane_1_density, ..., lane_n_density, lane_1_queue, ..., lane_n_queue]`

For the 4x4 environment, the length of the state array was only 11, but there are 16 lanes (this comes from my understanding of the 2-way intersection, where I took roads separated by a dotted line to have 2 lanes each, but please correct me if I'm wrong). If that's the case, how come the state is only of length 11?

On a different note (I hope it's okay to ask multiple questions in one post), I'm also trying to experiment with more intersections, so that a single model can adapt to intersections not seen in training. I'm still learning a lot, but from what I've found, I would need a model that processes each lane uniformly. Would I need to change the `ObservationFunction` for this?

I'm learning a lot, and am really enjoying using sumo-rl for this, thanks a lot for your time!

LucasAlegre commented 1 month ago

The state is computed separately for each intersection, so 11 is the length of the state of a single intersection, not of the whole network.
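Concretely, the default observation is `[phase_one_hot, min_green, densities, queues]`, so its length is `num_green_phases + 1 + 2 * num_lanes`. A quick sanity check (assuming 2 green phases and 4 incoming lanes per signal in the 4x4 net):

```python
def default_obs_length(num_green_phases: int, num_lanes: int) -> int:
    # phase one-hot + min_green flag + one density and one queue value per lane
    return num_green_phases + 1 + 2 * num_lanes

# 4x4 grid: assuming 2 green phases and 4 incoming lanes per traffic signal
print(default_obs_length(num_green_phases=2, num_lanes=4))  # -> 11
```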

I am not sure I understood the second question. Do you mean intersections with a different number of lanes or a different topology? In that case, you would have to define an observation function that is agnostic to the type of intersection.
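For example (just a sketch, not part of sumo-rl): one simple way to make the observation agnostic is to zero-pad the per-lane features up to a fixed maximum lane count, so every intersection produces a vector of the same size:

```python
import numpy as np

def pad_lane_features(densities, queues, max_lanes):
    """Zero-pad per-lane features to a fixed size so intersections
    with different lane counts share one observation shape."""
    obs = np.zeros(2 * max_lanes, dtype=np.float32)
    n = len(densities)
    obs[:n] = densities                      # densities in the first half
    obs[max_lanes:max_lanes + n] = queues    # queues in the second half
    return obs

# A 2-lane intersection padded to max_lanes=4 gives a length-8 vector,
# the same shape a 4-lane intersection would produce.
print(pad_lane_features([0.5, 0.2], [0.1, 0.0], max_lanes=4))
```

The trade-off is that the model has to learn to ignore the padded entries; alternatives like per-lane (shared-weight) encoders avoid that but need more plumbing.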

I'm glad sumo-rl is being useful!

deltag0 commented 1 month ago

Oh ok, so if the state is separate for each intersection, how do I get the state and choose an action for each intersection? Right now, I'm just doing what I usually do: `state, reward, terminated, _, info = env.step(action)`.
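From the PettingZoo docs, I think the multi-agent (parallel) API returns observations as a dict keyed by agent id, so I'd guess I should build a matching dict of actions, something like this (toy observations and placeholder policy, just to show the shape):

```python
# Toy stand-ins: in the PettingZoo parallel API, observations and
# actions are dicts keyed by agent id (here, made-up traffic-signal ids).
observations = {"ts_0": [0.1, 0.3], "ts_1": [0.7, 0.2]}

def greedy_policy(obs):
    # Placeholder policy: pick the index of the largest feature
    return max(range(len(obs)), key=lambda i: obs[i])

actions = {agent_id: greedy_policy(obs) for agent_id, obs in observations.items()}
print(actions)  # -> {'ts_0': 1, 'ts_1': 0}
```

and then, if sumo-rl's PettingZoo wrapper follows the standard parallel API, step with `observations, rewards, terminations, truncations, infos = env.step(actions)` instead of the single-tuple Gymnasium call. Is that right?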

My second question was about making a model that could evaluate different environments. So if I used the 2-way environment it would work, and if I used the 4x4 it would also work. Basically, a model that can evaluate environments it hasn't seen in training.