eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms
MIT License

training problem #112

Closed 6Lackiu closed 2 months ago

6Lackiu commented 2 months ago

Hello! I trained the model using python experiments.py evaluate configs/IntersectionEnv/env.json configs/IntersectionEnv/agents/DQNAgent/ego_attention_2h.json --train --episodes=4000, but the training results are not as good as those presented in your paper. What could be the issue? [screenshots of training curves attached]

eleurent commented 2 months ago

Note that in my paper, results were averaged across many random seeds, which smooths the reward vs steps curve, but each individual run typically has more variance across steps (each collision causes a large drop, so returns tend to be discontinuous/noisy). Still, when aggregating over multiple episodes, we should see the same effect of the mean reward moving from ~2 to ~6, which might be the case in your run? Can you try increasing the smoothing in your tensorboard?
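As a side note, if you want to reproduce the smoothing offline rather than through the TensorBoard UI, here is a minimal sketch of the kind of exponential moving average TensorBoard's smoothing slider applies to a scalar curve (the function name and weight value are just illustrative, not part of this repo):

```python
def smooth(values, weight=0.9):
    """Exponential moving average, similar in spirit to
    TensorBoard's smoothing slider (weight in [0, 1);
    higher weight = smoother curve)."""
    smoothed = []
    last = values[0]  # seed the average with the first point
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# Example: a noisy reward series becomes a much flatter curve.
rewards = [2, 6, 1, 7, 2, 6, 5, 6]
print(smooth(rewards, weight=0.9))
```

Applying this to your per-episode returns should make the ~2 to ~6 trend easier to see despite the collision-induced drops.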

If you still think the results are worse, then there might indeed be a regression, which is always possible (I think I made some small changes to the vehicle dynamics, aimed at improving other environments). You can try syncing both repos to an older version, from e.g. Dec 2019, and running training again.

6Lackiu commented 2 months ago

Thank you for your explanation! I'll continue trying.