Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

Training with DQN #183

Closed rodrigogutierrezm closed 3 years ago

rodrigogutierrezm commented 3 years ago

Hello, thank you for sharing this great work. I am trying to replicate the behaviour shown in the examples (Deep Q-Network). Did you train it with the network provided in rl-agents? I have tried it with 1000 episodes and, when I test it, the agent only moves to the right. Maybe more episodes are needed.

Thank you in advance.

eleurent commented 3 years ago

Hi, yes it was rl-agents' implementation and hyperparams. I believe it was trained on about 5k episodes (I should really make this part of the agent configuration)
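
For reference, a training run of that length with rl-agents would look something like the following (the config paths and the --train/--episodes options are assumptions based on the usual repo layout):

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dueling_ddqn.json --train --episodes=5000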

rodrigogutierrezm commented 3 years ago

Ok, just to clarify: you modified the hyperparams? To replicate your results, some modifications to these params are required, right?

Thank you.

eleurent commented 3 years ago

No, I do not think that I changed the hyperparameters, I mostly refactored the file structure. I will try to run it again and see if I can reproduce the results.

eleurent commented 3 years ago

See also: https://github.com/eleurent/rl-agents/issues/21

rodrigogutierrezm commented 3 years ago

Perfect, thank you.

eleurent commented 3 years ago

So I did a run with the current dueling_ddqn.json config for 1.5k episodes, and got these results:

[chart: episode returns over the 1.5k-episode run]

They seem worse than what I had in May 2019 (though it is hard to check on a single run).

The corresponding behaviors are reasonable, but still have quite a high number of collisions:

https://user-images.githubusercontent.com/1706935/115575465-df618f00-a2c2-11eb-8fb5-ffbe8ce573e0.mp4

https://user-images.githubusercontent.com/1706935/115575489-e4264300-a2c2-11eb-8c60-d7490f752d2a.mp4

https://user-images.githubusercontent.com/1706935/115575522-eb4d5100-a2c2-11eb-8330-b3a1758e1821.mp4

I checked for differences in the configurations, and noticed that:

  1. the dueling architecture has changed from
    • a shared base network with [256, 256] hidden layers and two linear heads (value and advantage), to
    • a shared base network with [256, 128] hidden layers plus an additional [128] hidden layer for each head (value and advantage) (see the sketch after this comment);
  2. the learning rate is not specified in the agent configuration, so the default value is used, and that default has changed from 5e-4 to 1e-3.

I will try again with the previous values, to see if there's a difference.
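
For reference, a rough sketch of the two head layouts from point 1 above (a hypothetical PyTorch reconstruction; only the layer sizes come from the comparison above, everything else is assumed):

import torch.nn as nn

class DuelingOld(nn.Module):
    # old layout: shared [256, 256] base, two linear heads
    def __init__(self, in_size, n_actions):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(in_size, 256), nn.ReLU(),
                                  nn.Linear(256, 256), nn.ReLU())
        self.value = nn.Linear(256, 1)
        self.advantage = nn.Linear(256, n_actions)

    def forward(self, x):
        x = self.base(x)
        v, a = self.value(x), self.advantage(x)
        # standard dueling aggregation: Q = V + A - mean(A)
        return v + a - a.mean(dim=1, keepdim=True)

class DuelingNew(nn.Module):
    # new layout: shared [256, 128] base, plus a [128] hidden layer per head
    def __init__(self, in_size, n_actions):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(in_size, 256), nn.ReLU(),
                                  nn.Linear(256, 128), nn.ReLU())
        self.value = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))
        self.advantage = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, x):
        x = self.base(x)
        v, a = self.value(x), self.advantage(x)
        return v + a - a.mean(dim=1, keepdim=True)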

RobeSafe-UAH commented 3 years ago

Ok, I am still trying to reach those results. Thank you for your help; as soon as I get a good model, I will let you know.

RobeSafe-UAH commented 3 years ago

How do you get the episode/return graph? Thanks.

eleurent commented 3 years ago

Through TensorBoard. If you have it installed, you can run

tensorboard --logdir <rl-agents path>/scripts/out/HighwayEnv/DQNAgent/

This will spawn a web server allowing you to visualize your runs (mostly rewards and network architecture for now, but I should add other metrics, such as average Q-values in the sampled minibatch or initial state).

RobeSafe-UAH commented 3 years ago

Thank you

eleurent commented 3 years ago

I found that there is indeed a regression in performance, but it is due to changes in the environment (highway-env) rather than in the agent (rl-agents). See this chart:

[chart: comparison of episode returns between the old and current versions]

It seems that the environment has become more difficult to solve, though I do not know why. This could be due to changes

  1. in the vehicle dynamics?
  2. in the vehicles' initialization / density?
  3. in the reward function?

It seems that 1. has not really changed, 2. has changed a little bit, and 3. has a minor change.

I will investigate, and maybe even git bisect if I cannot find any meaningful difference in the code.
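
For reference, that bisect workflow would look something like this (commit ids and the evaluation step are placeholders):

git bisect start
git bisect bad                        # current commit: the agent performs worse
git bisect good <known-good-commit>   # e.g. the May 2019 version
# git then checks out intermediate commits; mark each one good/bad after re-running
# a training/evaluation, or automate it with: git bisect run <your-test-script>
git bisect reset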

eleurent commented 3 years ago

I found out why the current version of highway-env is more difficult than it used to be, which explains why the agent tends to get more collisions: the speed limit of the road is set to 20 m/s by default, where 30 m/s would be more appropriate. I will restore this value.
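
For context, a minimal sketch of where that value lives (assuming the road is built with RoadNetwork.straight_road_network, as in highway-env's HighwayEnv._create_road; the exact call may differ between versions):

from highway_env.road.road import Road, RoadNetwork

# build a straight highway whose lanes carry a 30 m/s speed limit instead of 20 m/s
network = RoadNetwork.straight_road_network(lanes=4, speed_limit=30)
road = Road(network=network)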

rodrigogutierrezm commented 3 years ago

Thank you for all the information. I have been able to reproduce the training and the charts. In order to test the results, I run

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10

But the performance of the ego vehicle is not good. I am not sure if I am using the trained model; is there a way to specify the model to be used?

eleurent commented 3 years ago

You simply need to add the --recover option, or --recover-from=path/to/model.tar, to load a trained model before evaluating. The --recover option loads scripts/out/<Env>/<Agent>/saved_models/latest.tar by default (which is updated during training).
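
For example, the evaluation command above would become something like:

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10 --recover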

rodrigogutierrezm commented 3 years ago

Hello, I was able to replicate your results. One last question: when you select an agent such as dueling_ddqn, a type is defined in the model config ("DuelingNetwork"). Where is this type created? Thank you very much.

eleurent commented 3 years ago

Here: https://github.com/eleurent/rl-agents/blob/a290be38351cf29c03779cb6683d831a06b74864/rl_agents/agents/common/models.py#L79
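
For context, a minimal sketch of what that looks like in the agent configuration (the key names are assumptions based on the dueling_ddqn.json naming discussed above):

"model": {
    "type": "DuelingNetwork"
}

The "type" string is looked up by rl-agents' model factory (in the same models.py file linked above) to instantiate the corresponding network class.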

rodrigogutierrezm commented 3 years ago

Thank you for everything.