LucasAlegre / sumo-rl

Reinforcement Learning environments for Traffic Signal Control with SUMO. Compatible with Gymnasium, PettingZoo, and popular RL libraries.
https://lucasalegre.github.io/sumo-rl
MIT License

self.sumo.trafficlight.setRedYellowGreenState ERROR #117

Closed: gioannides closed this issue 1 year ago

gioannides commented 2 years ago

Custom networks spit out this error. How can it be solved?

Code:

import argparse
import os
import sys
import pandas as pd

if 'SUMO_HOME' in os.environ:
    tools = os.path.join(os.environ['SUMO_HOME'], 'tools')
    sys.path.append(tools)
else:
    sys.exit("Please declare the environment variable 'SUMO_HOME'")

from sumo_rl import SumoEnvironment
from sumo_rl.agents import QLAgent
from sumo_rl.exploration import EpsilonGreedy

if __name__ == '__main__':

    alpha = 0.1
    gamma = 0.99
    decay = 1
    runs = 200

    env = SumoEnvironment(net_file='./taxiway.net.xml',
                          route_file='./taxiway.rou.xml',
                          use_gui=True,
                          num_seconds=3600,)

    initial_states = env.reset()
    ql_agents = {ts: QLAgent(starting_state=env.encode(initial_states[ts], ts),
                                 state_space=env.observation_space,
                                 action_space=env.action_space,
                                 alpha=alpha,
                                 gamma=gamma,
                                 exploration_strategy=EpsilonGreedy(initial_epsilon=0.05, min_epsilon=0.005, decay=decay)) for ts in env.ts_ids}
    for run in range(1, runs+1):
        if run != 1:
            initial_states = env.reset()
            for ts in initial_states.keys():
                ql_agents[ts].state = env.encode(initial_states[ts], ts)

        infos = []
        done = {'__all__': False}
        while not done['__all__']:
            actions = {ts: ql_agents[ts].act() for ts in ql_agents.keys()}

            s, r, done, info = env.step(action=actions)

            for agent_id in s.keys():
                ql_agents[agent_id].learn(next_state=env.encode(s[agent_id], agent_id), reward=r[agent_id])

        env.save_csv('outputs/taxiway', run)
        env.close()

Traceback (most recent call last):
  File "multi-agent-ql.py", line 51, in <module>
    s, r, done, info = env.step(action=actions)
  File "/home/anaconda3/envs/sumo_rl_pip/lib/python3.7/site-packages/sumo_rl/environment/env.py", line 231, in step
    self._apply_actions(action)
  File "/home/anaconda3/envs/sumo_rl_pip/lib/python3.7/site-packages/sumo_rl/environment/env.py", line 265, in _apply_actions
    self.traffic_signals[ts].set_next_phase(action)
  File "/home/anaconda3/envs/sumo_rl_pip/lib/python3.7/site-packages/sumo_rl/environment/traffic_signal.py", line 130, in set_next_phase
    self.sumo.trafficlight.setRedYellowGreenState(self.id, self.all_phases[self.yellow_dict[(self.green_phase, new_phase)]].state)
KeyError: (0, 1)
LucasAlegre commented 2 years ago

Hi,

It requires that the traffic signal definitions in the network file are of the form [green, yellow, red, ..., green, yellow, red]. Can you show how the traffic signal is defined in your .net.xml file?
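
For reference, a phase sequence in that form looks like the sketch below (a hypothetical tlLogic: the id, durations, and link states are made up, and the state string length must match your number of controlled links):

    <tlLogic id="TL1" type="static" programID="0" offset="0">
        <phase duration="30" state="GGrr"/>  <!-- green -->
        <phase duration="4"  state="yyrr"/>  <!-- yellow -->
        <phase duration="2"  state="rrrr"/>  <!-- all red -->
        <phase duration="30" state="rrGG"/>  <!-- green -->
        <phase duration="4"  state="rryy"/>  <!-- yellow -->
        <phase duration="2"  state="rrrr"/>  <!-- all red -->
    </tlLogic>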

gioannides commented 2 years ago

I generated it from OpenStreetMap data.

Is there a way to enforce your constraints on it?

LucasAlegre commented 2 years ago

Could you share your .net file so I can try to understand why it is not working?

gioannides commented 2 years ago

Yes - https://drive.google.com/file/d/1D5I75aG1s5wjX1FGgFvAq7IRawJD4BJR/view?usp=sharing

LucasAlegre commented 2 years ago

That was not the problem; the Q-learning code you reused assumed all agents have the same action space, which is not the case for your network. You only need to pass the correct observation and action spaces to your agents:

    ql_agents = {ts: QLAgent(starting_state=env.encode(initial_states[ts], ts),
                                 state_space=env.traffic_signals[ts].observation_space,
                                 action_space=env.traffic_signals[ts].action_space,
                                 alpha=alpha,
                                 gamma=gamma,
                                 exploration_strategy=EpsilonGreedy(initial_epsilon=0.05, min_epsilon=0.005, decay=decay)) for ts in env.ts_ids}

However, note that this will probably not work well with tabular Q-learning, as you have big state and action spaces. I suggest making the observation space smaller, or using deep RL.
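
To shrink the observation, recent versions of sumo-rl accept a custom observation function via the observation_class argument of SumoEnvironment (check that your installed version supports it). A minimal sketch, assuming the TrafficSignal attributes green_phase, num_green_phases, lanes, and get_lanes_queue() available in current sumo-rl; SmallObservation is a hypothetical name:

    import numpy as np
    from gymnasium import spaces
    from sumo_rl import SumoEnvironment
    from sumo_rl.environment.observations import ObservationFunction

    class SmallObservation(ObservationFunction):
        """Reduced observation: green-phase one-hot plus per-lane queues."""

        def __call__(self) -> np.ndarray:
            # One-hot encoding of the currently active green phase
            phase_one_hot = [1 if self.ts.green_phase == i else 0
                             for i in range(self.ts.num_green_phases)]
            # Queue occupancy per incoming lane, in [0, 1]
            queue = self.ts.get_lanes_queue()
            return np.array(phase_one_hot + queue, dtype=np.float32)

        def observation_space(self) -> spaces.Box:
            size = self.ts.num_green_phases + len(self.ts.lanes)
            return spaces.Box(low=0.0, high=1.0, shape=(size,), dtype=np.float32)

    env = SumoEnvironment(net_file='./taxiway.net.xml',
                          route_file='./taxiway.rou.xml',
                          observation_class=SmallObservation,
                          num_seconds=3600)

If you keep tabular Q-learning, be aware that env.encode may assume the default observation layout, so you may need your own discretization of this smaller vector.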

gioannides commented 2 years ago

Great, thanks. Can you also tell me how to pass an "additional.xml" that defines vehicles differently to your framework? It seems it always generates the default passenger-type car.

LucasAlegre commented 2 years ago

The vehicles are defined in your .rou.xml file. See https://sumo.dlr.de/docs/Definition_of_Vehicles%2C_Vehicle_Types%2C_and_Routes.html
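
For example, a .rou.xml that defines a custom vehicle type and a vehicle using it could look like the sketch below (the ids, edges, and attribute values are made up; see the linked docs for the full attribute list):

    <routes>
        <vType id="truck" vClass="truck" accel="1.0" decel="4.0" length="10" maxSpeed="25"/>
        <route id="r0" edges="edge1 edge2"/>
        <vehicle id="veh0" type="truck" route="r0" depart="0"/>
    </routes>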