tleyden closed this issue 5 years ago
Hello,
This is a hack to make training happen from time to time. This custom SAC version only trains after each reset (i.e., at the end of an episode), so without it no training would happen until the episode ends. You can remove it or set a high "train_freq" so that it never triggers.
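To make the behavior concrete, here is a minimal sketch of that loop structure. It is not the code from custom_sac.py: ToyEnv, run, and crash_prob are hypothetical names, the gradient updates are left as a comment, and it assumes the counter compared against train_freq is the number of steps in the current episode.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the driving environment (not the real simulator)."""

    def __init__(self, crash_prob=0.01, seed=0):
        self.crash_prob = crash_prob
        self.rng = random.Random(seed)

    def reset(self):
        return 0.0  # dummy observation

    def step(self, action):
        # The episode ends on a simulated "crash".
        done = self.rng.random() < self.crash_prob
        return 0.0, 1.0, done  # observation, reward, done


def run(env, total_steps, train_freq):
    """Collect experience; a training phase happens only right before a reset."""
    log = []  # (global step, reason) for every training phase
    env.reset()
    ep_len = 0
    for step in range(1, total_steps + 1):
        _, _, done = env.step(action=0.0)
        ep_len += 1
        if done:
            reason = "episode end"
        elif ep_len >= train_freq:
            # Assumption: force an early reset so that training still happens
            # when the agent drives well and the episode would not end.
            reason = "additional training"
        else:
            continue
        log.append((step, reason))
        # Gradient updates would run here in a real implementation.
        env.reset()
        ep_len = 0
    return log
```

With an agent that never crashes (crash_prob=0.0), run(ToyEnv(crash_prob=0.0), total_steps=10, train_freq=4) triggers a forced reset at steps 4 and 8, while a train_freq larger than total_steps means no training phase is ever reached.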
Makes sense, thanks!
I noticed that in this code it resets the environment after hitting train_freq steps: https://github.com/araffin/learning-to-drive-in-5-minutes/blob/c46338cfbfd7b316b1992247c302783a8cb6d36a/algos/custom_sac.py#L122-L126
whereas in the baseline implementation, it does not:
https://github.com/hill-a/stable-baselines/blob/fddf169875154f6129071045f0a6f99614c490a5/stable_baselines/sac/sac.py#L416-L434
I was surprised to see that during training on a track it reset even though it was doing well, and it seemed to be because of this code, since I noticed the "Additional training" log output line.
I'm curious: what is the reasoning behind the env.reset() here?