araffin / learning-to-drive-in-5-minutes

Implementation of a reinforcement learning approach to make a car learn to drive smoothly in minutes
https://towardsdatascience.com/learning-to-drive-smoothly-in-minutes-450a7cdb35f4
MIT License
287 stars 87 forks

Why does SAC training stop after train_freq steps? [question] #6

Closed tleyden closed 5 years ago

tleyden commented 5 years ago

I noticed that in this code it resets the environment after hitting train_freq steps: https://github.com/araffin/learning-to-drive-in-5-minutes/blob/c46338cfbfd7b316b1992247c302783a8cb6d36a/algos/custom_sac.py#L122-L126

whereas in the baseline implementation, it does not:

https://github.com/hill-a/stable-baselines/blob/fddf169875154f6129071045f0a6f99614c490a5/stable_baselines/sac/sac.py#L416-L434

                if step % self.train_freq == 0:
                    mb_infos_vals = []
                    # Update policy, critics and target networks
                    for grad_step in range(self.gradient_steps):
                        if self.num_timesteps < self.batch_size or self.num_timesteps < self.learning_starts:
                            break
                        n_updates += 1
                        # Compute current learning_rate
                        frac = 1.0 - step / total_timesteps
                        current_lr = self.learning_rate(frac)
                        # Update policy and critics (q functions)
                        mb_infos_vals.append(self._train_step(step, writer, current_lr))
                        # Update target network
                        if (step + grad_step) % self.target_update_interval == 0:
                            # Update target network
                            self.sess.run(self.target_update_op)
                    # Log losses and entropy, useful for monitor training
                    if len(mb_infos_vals) > 0:
                        infos_values = np.mean(mb_infos_vals, axis=0)

I was surprised to see the environment reset during training on a track even though the car was doing well. It seemed to be caused by this code, since I noticed the "Additional training" log output line.

I'm curious, what is the reasoning behind the env.reset() here?

araffin commented 5 years ago

Hello,

This is a hack to keep training from time to time: since this custom SAC version only trains after each reset (i.e. at the end of an episode), it would otherwise not train at all until an episode ends. You can remove that call, or set a high `train_freq` so it does not happen.
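To make the trade-off concrete, here is a minimal sketch of the pattern being discussed: a trainer that only runs gradient updates at reset time, plus the forced reset every `train_freq` steps. The class and method names (`EpisodicTrainer`, `end_episode`, `step`) are illustrative, not the actual `custom_sac.py` API.

```python
class EpisodicTrainer:
    """Sketch: optimization happens only at episode end (reset time),
    so a long, successful episode would starve training entirely
    without a forced periodic reset."""

    def __init__(self, train_freq=3000):
        self.train_freq = train_freq
        self.steps_since_reset = 0
        self.updates = 0  # stand-in for gradient updates performed

    def end_episode(self):
        # In the custom SAC, this is where the replay-buffer
        # gradient steps actually run.
        self.updates += self.steps_since_reset
        self.steps_since_reset = 0

    def step(self):
        self.steps_since_reset += 1
        # The hack: force a reset (and thus a training phase) every
        # `train_freq` environment steps, even mid-episode.
        if self.steps_since_reset >= self.train_freq:
            self.end_episode()
```

With a large `train_freq` the forced reset effectively never triggers, which is why raising it (or removing the reset) disables the mid-episode interruption.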

tleyden commented 5 years ago

Makes sense, thanks!