Hi, yes, it was rl-agents' implementation and hyperparameters. I believe it was trained for about 5k episodes (I should really make this part of the agent configuration).
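For reference, such a training run can be launched with rl-agents' experiments script; this assumes the usual --train flag, the counterpart of the --test flag used later in this thread:

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dueling_ddqn.json --train --episodes=5000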
OK, just to clarify: you modified the hyperparameters? So to replicate your results, some modifications to these parameters are required, right?
Thank you.
No, I do not think that I changed the hyperparameters; I mostly refactored the file structure. I will try to run it again and see if I can reproduce the results.
Perfect, thank you.
So I did a run with the current dueling_ddqn.json config for 1.5k episodes, and got these results:
They seem worse than what I had in May 2019 (though it is hard to check on a single run).
The corresponding behaviors are reasonable, but still have quite a high number of collisions:
https://user-images.githubusercontent.com/1706935/115575465-df618f00-a2c2-11eb-8fb5-ffbe8ce573e0.mp4
https://user-images.githubusercontent.com/1706935/115575489-e4264300-a2c2-11eb-8c60-d7490f752d2a.mp4
https://user-images.githubusercontent.com/1706935/115575522-eb4d5100-a2c2-11eb-8330-b3a1758e1821.mp4
I checked for differences in the configurations, and noticed that:
I will try again with the previous values, to see if there's a difference.
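For context, the agent configuration mentioned above is a plain JSON file. Here is a sketch of what dueling_ddqn.json roughly contains; the field names and values below are quoted from memory and purely illustrative, the authoritative version being configs/HighwayEnv/agents/DQNAgent/dueling_ddqn.json in the repository:

{
    "__class__": "<class 'rl_agents.agents.deep_q_network.pytorch.DQNAgent'>",
    "model": {"type": "DuelingNetwork"},
    "double": true,
    "gamma": 0.8,
    "batch_size": 32,
    "memory_capacity": 15000,
    "target_update": 50,
    "exploration": {"method": "EpsilonGreedy", "tau": 6000, "final_temperature": 0.05}
}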
OK, I am still trying to reach those results. Thank you for your help; as soon as I get a good model, I will let you know.
How do you get the episode/return graph? Thanks
Through TensorBoard. If you have it installed, you can run
tensorboard --logdir <rl-agents path>/scripts/out/HighwayEnv/DQNAgent/
This will spawn a web server allowing you to visualize your runs (mostly rewards and network architecture for now, but I should add other metrics, such as average Q-values in the sampled minibatch or initial state).
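For reference, such extra metrics could be logged with PyTorch's own TensorBoard writer. A minimal sketch, where the log directory, tag name, and dummy values are purely illustrative:

import random
from torch.utils.tensorboard import SummaryWriter

# Illustrative path, chosen to match the tensorboard command above
writer = SummaryWriter(log_dir="scripts/out/HighwayEnv/DQNAgent/demo_run")
for episode in range(100):
    episode_return = random.random()  # stand-in for the return collected during the episode
    writer.add_scalar("episode/return", episode_return, episode)
writer.close()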
Thank you
I found that there is indeed a regression in performance, but it is due to changes in the environment (highway-env) rather than in the agent (rl-agents). See this chart:
It seems that the environment has become more difficult to solve, though I do not know why. This could be due to changes in several parts of the environment; it seems that 1. has not really changed, 2. has changed a little, and 3. has a minor change.
I will investigate, and maybe even git bisect if I cannot find any meaningful difference in the code.
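For the record, git bisect automates that kind of search over commits; a sketch, where the tag of the last known-good revision is hypothetical:

git bisect start
git bisect bad HEAD    # current commit: degraded performance
git bisect good v1.0   # hypothetical last known-good revision
# git then checks out a midpoint commit; re-run the evaluation and mark it:
git bisect good        # or: git bisect bad
git bisect reset       # return to the original branch when done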
I found out why the current version of highway-env is more difficult than it used to be, which explains why the agent tends to get more collisions: the speed limit of the road is set to 20 m/s by default, where 30 m/s would be more appropriate. I will restore this value.
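If you want to tweak such environment parameters yourself, highway-env environments expose a configure() method. A minimal sketch; the keys shown here are examples, and the exact key controlling the road speed limit may vary across versions:

import gym
import highway_env  # noqa: F401 -- importing registers the highway-v0 environment

env = gym.make("highway-v0")
print(env.config)  # inspect the current defaults
env.configure({"vehicles_count": 50, "duration": 40})  # override defaults before reset
obs = env.reset()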
Thank you for all the information. I have been able to reproduce the training and the charts. In order to test the results, I ran
python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10
But the performance of the ego vehicle is not good. I am not sure if I am using the trained model; is there a way to specify the model to be used?
You must simply add the --recover option, or --recover-from=path/to/model.tar, to load a trained model before evaluating. The --recover option loads scripts/out/<Env>/<Agent>/saved_models/latest.tar by default (which is updated during training).
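For example, reusing the evaluation command from earlier in this thread with the default checkpoint:

python3 experiments.py evaluate configs/HighwayEnv/env.json configs/HighwayEnv/agents/DQNAgent/dqn.json --test --episodes=10 --recover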
Hello, I was able to replicate your results. One last question: when you select an agent such as dueling_ddqn, a type is defined in the model config ("DuelingNetwork"). Where is this type created? Thank you very much.
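For reference, the "type" string in the model config is typically resolved by a model factory; in rl-agents this logic lives around rl_agents/agents/common/models.py, if memory serves. A minimal sketch of the pattern, where all names and signatures are illustrative rather than the library's actual code:

import torch
import torch.nn as nn

class DuelingNetwork(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    def __init__(self, in_size: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(in_size, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.base(x)
        a = self.advantage(h)
        # Subtract the mean advantage so that V and A are identifiable
        return self.value(h) + a - a.mean(dim=-1, keepdim=True)

def model_factory(config: dict) -> nn.Module:
    # The "type" field of the JSON model config selects the network class
    if config["type"] == "DuelingNetwork":
        return DuelingNetwork(config["in"], config["out"])
    raise ValueError(f"Unknown model type: {config['type']}")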
Thank you for everything.
Hello, thank you for sharing this great work. I am trying to replicate the behaviour shown in the examples (Deep Q-Network). Have you trained with the network provided in rl-agents? I have tried it with 1000 episodes, and when I test it, the agent only moves to the right. Maybe more episodes are needed.
Thank you in advance.