araffin / learning-to-drive-in-5-minutes

Implementation of reinforcement learning approach to make a car learn to drive smoothly in minutes
https://towardsdatascience.com/learning-to-drive-smoothly-in-minutes-450a7cdb35f4
MIT License

Question about frame stacking + command history in SAC hyperparams #36

eliork closed this issue 3 years ago

eliork commented 3 years ago

https://github.com/araffin/learning-to-drive-in-5-minutes/blob/ccb27e66d593d6036fc1076dcec80f74a3f5e239/hyperparams/sac.yml#L16

Hey, from what I see here, you didn't stack frames. Have you tried stacking frames? Did it show any improvement in performance (learning-wise, in timesteps and wall time)? Another question: does concatenating the command history improve learning compared to not using any command history? I am trying both methods, but I can't find any significant difference in my experiments. Thank you!

araffin commented 3 years ago

Hello,

> Have you tried stacking frames? Did it show any improvement in performance (learning-wise, in timesteps and wall time)?

Yes, I did. It only helps if you have communication delays, and if you have a continuity cost and want to damp oscillations (which I have in the newest, yet unpublished, version of that project ^^").
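A continuity cost is usually a reward penalty on how much the command changes between consecutive steps, which discourages the policy from alternating left/right steering. The thread does not give the exact formula, so the sketch below assumes a squared-difference penalty and a hypothetical weight `continuity_weight`; the penalty would be subtracted from the task reward. It only illustrates why such a term damps oscillations, not how the unpublished version implements it.

```python
def continuity_penalty(commands, continuity_weight=0.1):
    """Hypothetical continuity cost summed over a sequence of steering commands.

    Each step is penalized by the squared change from the previous command,
    so rapid back-and-forth steering costs more than holding a steady value.
    """
    return continuity_weight * sum(
        (current - previous) ** 2 for previous, current in zip(commands, commands[1:])
    )


# Two steering sequences with the same average command (0.0)
oscillating = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
smooth = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

oscillating_cost = continuity_penalty(oscillating)
smooth_cost = continuity_penalty(smooth)
```

Both sequences have the same mean steering, but only the oscillating one is penalized (five unit-sized changes), so an agent maximizing reward is pushed towards smoother control.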

I'm doing that (frame-stacking + command history at the same time) here: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/sac.yml#L314
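As a rough illustration of combining both ideas, the observation the agent sees can be built by concatenating the last `n_stack` feature vectors (for example, encoded camera frames) with the last `n_history` commands. This is a minimal sketch, not the rl-baselines3-zoo implementation: the class name `StackedObservation`, its methods, and the zero-padding of the command history at reset are all assumptions made for illustration.

```python
from collections import deque


class StackedObservation:
    """Hypothetical helper: concatenates the last n_stack observations
    with the last n_history commands into one flat observation."""

    def __init__(self, obs_size, action_size, n_stack=2, n_history=2):
        self.obs_size = obs_size
        self.action_size = action_size
        self.n_stack = n_stack
        self.n_history = n_history
        # deques with maxlen drop the oldest entry automatically
        self.frames = deque(maxlen=n_stack)
        self.commands = deque(maxlen=n_history)

    def reset(self, obs):
        assert len(obs) == self.obs_size
        self.frames.clear()
        self.commands.clear()
        # Repeat the first observation and pad commands with zeros
        # so the flat observation always has the same size
        for _ in range(self.n_stack):
            self.frames.append(list(obs))
        for _ in range(self.n_history):
            self.commands.append([0.0] * self.action_size)
        return self.get()

    def step(self, obs, action):
        assert len(obs) == self.obs_size and len(action) == self.action_size
        self.frames.append(list(obs))
        self.commands.append(list(action))
        return self.get()

    def get(self):
        # Oldest to newest: stacked observations first, then the command history
        flat = []
        for frame in self.frames:
            flat.extend(frame)
        for command in self.commands:
            flat.extend(command)
        return flat


buf = StackedObservation(obs_size=3, action_size=2, n_stack=2, n_history=2)
first = buf.reset([1.0, 2.0, 3.0])
second = buf.step([4.0, 5.0, 6.0], [0.1, 0.2])
```

The flat observation has a fixed size of `obs_size * n_stack + action_size * n_history`, which is what a standard MLP policy (e.g. for SAC) expects.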

> Another question: does concatenating the command history improve learning compared to not using any command history?

As mentioned in the blog post, this is again to avoid breaking the Markov assumption. It is important to have n_history > 1 (also when you have a continuity cost).
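A toy illustration of why the command history matters when there is a continuity cost: the reward then depends on the previous command, so two situations with identical input features can yield different rewards. If the observation does not include past commands, the agent cannot tell those situations apart. The functions and values below are hypothetical, the squared-difference cost is an assumption, and for brevity only the last command is kept (the reply above recommends n_history > 1).

```python
def continuity_cost(prev_command, command):
    # Hypothetical continuity cost: squared change of each command component
    return sum((c - p) ** 2 for c, p in zip(command, prev_command))


def make_observation(features, command_history, include_history):
    # Hypothetical observation: input features, optionally followed by past commands
    obs = list(features)
    if include_history:
        for past_command in command_history:
            obs.extend(past_command)
    return tuple(obs)


features = [0.3, -0.1]  # identical input features in both situations
command = [0.0]         # identical command chosen now

# Two situations that differ only in the previous steering command
history_a = [[0.0]]     # car was going straight
history_b = [[0.8]]     # car was turning hard

reward_a = -continuity_cost(history_a[-1], command)
reward_b = -continuity_cost(history_b[-1], command)

# Without command history the two observations are identical, yet the rewards differ,
# so the reward cannot be predicted from the observation alone
same_obs_without_history = (
    make_observation(features, history_a, False) == make_observation(features, history_b, False)
)
# With command history the observations differ, so the agent can tell the situations apart
same_obs_with_history = (
    make_observation(features, history_a, True) == make_observation(features, history_b, True)
)
```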

eliork commented 3 years ago

Thank you!