Closed: d505 closed this issue 4 months ago.
If the episode length is constant, and n_step is greater than the episode length, then yes, the rollout will contain data from several episodes. In fact, as explained in the issue you linked, this allows for more stable updates. Why do you want to reduce n_step so that this isn't the case?
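To make this concrete, here is a small sketch of where episode boundaries fall inside one rollout. It assumes a single environment whose episodes always last exactly `episode_len` steps, and a rollout that begins at the start of an episode (later rollouts may start mid-episode if `n_steps` is not a multiple of the episode length). The helper names are hypothetical. Note that in stable-baselines3 the PPO constructor argument is spelled `n_steps`.

```python
def episode_start_flags(n_steps, episode_len):
    """Hypothetical helper: episode_starts flags for one rollout.

    Assumes one environment, fixed-length episodes, and a rollout that
    begins on the first step of an episode. A 1 marks the first step
    of an episode.
    """
    return [1 if t % episode_len == 0 else 0 for t in range(n_steps)]


def episodes_touched(n_steps, episode_len):
    """Number of (possibly partial) episodes in one rollout."""
    # Ceiling division: a trailing partial episode still counts.
    return -(-n_steps // episode_len)


print(episode_start_flags(n_steps=10, episode_len=4))  # [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
print(episodes_touched(10, 4))                         # 3
```

So with `n_steps=10` and episodes of length 4, a single rollout mixes data from three episodes; the flags are what let the advantage computation tell them apart.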
Thanks for the reply. I thought this would cause a problem when calculating the advantage. My understanding is that the advantage is similar to a cumulative reward. It seemed to me that the rewards received in another episode are unrelated to the actions taken in the current episode, and I assumed the advantage calculation should respect that as well: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/buffers.py#L425-L434
I'm not 100% sure I understand what you mean, but if the question is whether the advantage calculation takes the end of an episode into account, i.e. whether the summation stops at the end of the episode, then yes, it does. For more on the TD(lambda) estimator, you can check #375 or "Telescoping in TD(lambda)" in David Silver's Lecture 4: https://www.youtube.com/watch?v=PnHCvfgC_ZA
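Written out, the GAE recursion (the TD(lambda)-style estimator) with an episode-boundary mask looks roughly like this. The notation is an assumption chosen to mirror the linked buffer code: $d_{t+1} \in \{0, 1\}$ is 1 when step $t+1$ is the first step of a new episode (the role of `episode_starts`), $\gamma$ is the discount and $\lambda$ the GAE parameter.

$$
\delta_t = r_t + \gamma \,(1 - d_{t+1})\, V(s_{t+1}) - V(s_t)
$$

$$
\hat{A}_t = \delta_t + \gamma \lambda \,(1 - d_{t+1})\, \hat{A}_{t+1}
$$

When $d_{t+1} = 1$, both the bootstrapped value and the accumulated $\hat{A}_{t+1}$ are multiplied by zero, so $\hat{A}_t = r_t - V(s_t)$. Unrolled, for a step $t$ in an episode whose next episode starts at index $T$, this gives $\hat{A}_t = \sum_{l=0}^{T-t-1} (\gamma\lambda)^l \,\delta_{t+l}$, i.e. the sum stops at the episode boundary.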
Thank you. Maybe I didn't understand the code well enough. So every time a new episode begins, self.episode_starts[] is set to 1 and the advantage accumulation is reset.
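The reset can be checked with a simplified, single-environment sketch that mirrors the backward loop in `RolloutBuffer.compute_returns_and_advantage`. This is not the library code: `compute_gae` is a hypothetical name, it uses plain lists instead of arrays, and `last_value`/`last_done` stand in for the value and done flag of the observation after the rollout.

```python
def compute_gae(rewards, values, episode_starts, last_value, last_done,
                gamma=0.99, gae_lambda=0.95):
    """GAE over one single-env rollout (sketch, not SB3's implementation).

    episode_starts[t] == 1 means step t is the first step of a new episode.
    """
    n = len(rewards)
    advantages = [0.0] * n
    last_gae_lam = 0.0
    for t in reversed(range(n)):
        if t == n - 1:
            # Past the end of the rollout: use the next observation's value
            next_non_terminal = 1.0 - last_done
            next_value = last_value
        else:
            # 0.0 when step t+1 opens a new episode, cutting the link
            next_non_terminal = 1.0 - episode_starts[t + 1]
            next_value = values[t + 1]
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        # Running sum is multiplied by 0 at a boundary: the "reset"
        last_gae_lam = delta + gamma * gae_lambda * next_non_terminal * last_gae_lam
        advantages[t] = last_gae_lam
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# Two episodes of length 3 in one 6-step rollout
rewards = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
values = [0.0] * 6
starts = [1, 0, 0, 1, 0, 0]
adv_a, _ = compute_gae(rewards, values, starts, last_value=0.0, last_done=1.0)
# Change only the second episode's rewards
adv_b, _ = compute_gae(rewards[:3] + [100.0] * 3, values, starts,
                       last_value=0.0, last_done=1.0)
print(adv_a[:3] == adv_b[:3])  # True
```

Because the first episode's advantages do not change when the second episode's rewards do, a rollout longer than one episode does not leak rewards across the boundary; it only provides more samples per update.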
❓ Question
Hello!
I have a question about n_step and how episodes relate to the advantage in episodic tasks. My task always ends after the same number of steps, and I use PPO. If n_step is greater than the episode length, I believe the advantage computation will also take the next episode into account. Is that actually the case? If so, I would prefer to set n_step equal to the episode length so that no other episodes are included.
This is the relevant piece of code: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/buffers.py#L402
On the other hand, another issue seems to suggest that including other episodes is a good idea: https://github.com/DLR-RM/stable-baselines3/issues/560