LucasAlegre / sumo-rl

Reinforcement Learning environments for Traffic Signal Control with SUMO. Compatible with Gymnasium, PettingZoo, and popular RL libraries.
https://lucasalegre.github.io/sumo-rl
MIT License

Weirdly bad results with the example scripts #86

Closed · michalgregor closed this 1 year ago

michalgregor commented 2 years ago

Hi, I've been running some of the experiment scripts provided with the package and I am getting weirdly bad results – mostly there is just no visible improvement at all. E.g. when running:

python experiments/dqn_2way-single-intersection.py

and plotting results using

python outputs/plot.py -f outputs/2way-single-intersection/dqn

I get the following results:

[attached plot: training results]

Are you getting the same results? Is this the expected behaviour with this experiment? Is it that the hyperparameters in the script are really bad, that the scenario is challenging, or that something in my setup is not working correctly?

As a suggestion – maybe we should run all the included experiment scripts and store the results somewhere for reference. That would enable users to make sure that everything works as intended before they start running their own experiments.

LucasAlegre commented 2 years ago

Hi,

I just updated the experiments in the last commit. There was a parameter max_depart_delay=0, which was causing vehicles not to be inserted when the lanes were full. I removed it, and I got the following results for the first and second episodes:

[attached plot: episodes 1 and 2]
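For reference, max_depart_delay is forwarded to SUMO's --max-depart-delay option, which drops vehicles that cannot depart within that many seconds of their scheduled insertion time; with 0, any vehicle blocked at depart time is silently discarded. A minimal sketch of the fixed setup, assuming the current SumoEnvironment constructor (file paths are illustrative):

```python
from sumo_rl import SumoEnvironment

# Paths are illustrative; use the net/route files shipped with the repo.
env = SumoEnvironment(
    net_file="nets/2way-single-intersection/single-intersection.net.xml",
    route_file="nets/2way-single-intersection/single-intersection-gen.rou.xml",
    out_csv_name="outputs/2way-single-intersection/dqn",
    use_gui=False,
    num_seconds=100000,
    single_agent=True,
    # max_depart_delay=0,  # the problematic setting: SUMO would skip any
    # vehicle that could not be inserted immediately, so full lanes lost demand
)
```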

and the following results when comparing the second and third episodes:

[attached plot: episodes 2 and 3]

I also checked in the GUI, and these results for the second and third episodes are very good (very few cars waiting).

Thank you for the suggestion! Indeed, the experiments and environments (except the RESCO environments, which have results reported in the paper) need better documentation.

michalgregor commented 2 years ago

Thank you for looking into this so quickly!

Would you perhaps be able to provide instructions on how to reproduce this figure from the README? It looks like a very reasonable learning curve – it might help us set things up for other scenarios/methods.

[figure from the README: learning curve]

We've been looking into the RESCO benchmarks (using their repo, which has all the observation/reward functions and presets they used). Curiously, we also had some trouble reproducing their results, but I intend to look into that soon as well – ideally, I would like to port some of their benchmarks to work on top of SUMO RL (i.e. reimplement the observation spaces, reward functions, etc. that you don't already have), since your package just has incomparably better interfaces – not to mention a better license. 🙂 I will open a pull request if and when that happens. 😄
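If it helps anyone attempting the same port: sumo-rl accepts a custom reward callable through the reward_fn argument, which receives the TrafficSignal instance, so a RESCO-style reward could be plugged in roughly like this (the reward body below is illustrative, not RESCO's actual definition):

```python
from sumo_rl import SumoEnvironment

def resco_style_queue_reward(ts):
    """Illustrative only: negative total normalized queue over the signal's
    incoming lanes. A real port would reimplement RESCO's actual reward."""
    return -sum(ts.get_lanes_queue())

# Passed at construction time (other arguments omitted for brevity):
# env = SumoEnvironment(net_file=..., route_file=..., reward_fn=resco_style_queue_reward)
```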

LucasAlegre commented 2 years ago

If I remember correctly, this figure was generated with the a2c-2way-single-intersection.py script. But I believe the repository has changed a lot since then, so I will have to check it again.

I retrieved only the sumo files from the RESCO benchmark, so the results would indeed be different because of the observation, reward, and metric definitions. Implementing them should not be difficult. This also reminds me that being able to select the observation space without having to change the sumo-rl source would be a very useful feature.
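For readers landing here later: newer versions of sumo-rl added this via an ObservationFunction base class and an observation_class argument on the environment. A minimal sketch, with method names assumed from the current API and an illustrative observation:

```python
import numpy as np
from gymnasium import spaces
from sumo_rl.environment.observations import ObservationFunction

class QueueObservation(ObservationFunction):
    """Illustrative observation: one normalized queue entry per incoming lane."""

    def __call__(self) -> np.ndarray:
        # self.ts is the TrafficSignal this function is attached to
        return np.asarray(self.ts.get_lanes_queue(), dtype=np.float32)

    def observation_space(self) -> spaces.Box:
        return spaces.Box(
            low=0.0, high=1.0, shape=(len(self.ts.lanes),), dtype=np.float32
        )

# env = SumoEnvironment(..., observation_class=QueueObservation)
```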

EvanMath commented 2 years ago

> and the following results when comparing the second and third episodes:

Hey, I would like to ask: how can I plot only specific episodes? And by "episode", do you mean an episode on the simulator, i.e. 100000 steps?

LucasAlegre commented 2 years ago

> Hey, I would like to ask: how can I plot only specific episodes? And by "episode", do you mean an episode on the simulator, i.e. 100000 steps?

The environment outputs a file for every run separately, so you can plot a given run (e.g. run 1) with:

python outputs/plot.py -f outputs/2way-single-intersection/a3c_conn0_run1.csv

By "episode" I mean a run on the simulator until the simulation ends.

EvanMath commented 2 years ago

Do these plots correspond to the training phase? And why don't they have a curve shape that decreases over time? How can we explain these peaks? Do they relate to different hours? I hope my questions make sense.

LucasAlegre commented 2 years ago

> Do these plots correspond to the training phase? And why don't they have a curve shape that decreases over time? How can we explain these peaks? Do they relate to different hours? I hope my questions make sense.

Yes, they are generated during training. They generally do decrease over time; it depends on how fast the agent learns a good policy. There is also still stochastic behavior, due to the vehicles' behaviors and the flow defined in the route file.