I'm trying to run PPO but whenever I try to use StopTrainingOnMaxEpisodes, it gives me the assertion error:
AssertionError: `dones` variable is not defined, please check your code next to `callback.on_step()`
The relevant code is:
    def _on_step(self) -> bool:
        # Check that the `dones` local variable is defined
        assert "dones" in self.locals, "`dones` variable is not defined, please check your code next to `callback.on_step()`"
        self.n_episodes += np.sum(self.locals["dones"]).item()

        continue_training = self.n_episodes < self._total_max_episodes

        if self.verbose >= 1 and not continue_training:
            mean_episodes_per_env = self.n_episodes / self.training_env.num_envs
            mean_ep_str = (
                f"with an average of {mean_episodes_per_env:.2f} episodes per env" if self.training_env.num_envs > 1 else ""
            )

            print(
                f"Stopping training with a total of {self.num_timesteps} steps because the "
                f"{self.locals.get('tb_log_name')} model reached max_episodes={self.max_episodes}, "
                f"by playing for {self.n_episodes} episodes "
                f"{mean_ep_str}"
            )
        return continue_training
I'm not too sure what this means, or how I am supposed to define `dones`. Is it meant to be set in the `on_step()` function of `BaseCallback()`?
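To make the question concrete: judging only from the code quoted above, the callback never defines `dones` itself. It reads `self.locals["dones"]`, so presumably whatever loop calls `on_step()` is expected to have a local variable named `dones` and to pass its locals to the callback first. Below is a minimal pure-Python sketch of that pattern. The names `MaxEpisodesCallback`, `update_locals`, and `collect_rollouts` here are toy stand-ins written for illustration, not the library's actual classes or its exact control flow.

```python
class MaxEpisodesCallback:
    """Toy stand-in for a stop-after-N-episodes callback (hypothetical)."""

    def __init__(self, max_episodes: int):
        self.max_episodes = max_episodes
        self.n_episodes = 0
        self.locals = {}  # filled in by the training loop, not by the callback

    def update_locals(self, locals_: dict) -> None:
        # The training loop hands over its local variables before each step.
        self.locals.update(locals_)

    def on_step(self) -> bool:
        # Same check as in the quoted code: `dones` must come from the caller.
        assert "dones" in self.locals, "`dones` variable is not defined"
        self.n_episodes += sum(bool(d) for d in self.locals["dones"])
        # Returning False asks the loop to stop.
        return self.n_episodes < self.max_episodes


def collect_rollouts(callback, episode_length=3, n_envs=2, max_steps=100):
    """Toy loop: every env finishes an episode each `episode_length` steps."""
    steps_taken = 0
    for step in range(1, max_steps + 1):
        # One done flag per environment, defined as a *local* of this loop.
        dones = [step % episode_length == 0] * n_envs
        steps_taken = step
        callback.update_locals(locals())  # this is what exposes `dones`
        if not callback.on_step():
            break
    return steps_taken


callback = MaxEpisodesCallback(max_episodes=4)
steps_taken = collect_rollouts(callback)
print(steps_taken, callback.n_episodes)
```

If the real mechanism works like this sketch, the assertion would fire whenever `on_step()` is reached from a loop that has no local called `dones`, or before that loop has passed its locals to the callback.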
Checklist
[X] I have checked that there is no similar issue in the repo