Khrylx / PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
MIT License

Concatenation of memories with not terminated episode #5

Closed lcswillems closed 6 years ago

lcswillems commented 6 years ago

https://github.com/Khrylx/PyTorch-RL/blob/61960d516c85e912e476b41f764a8a5f8cf38cf8/core/agent.py#L47

Hi,

Thank you for your code, which is really well written! From my understanding, mask is 0 at the end of an episode and 1 otherwise. However, there will be a problem if you concatenate a memory M1 (whose last episode is not terminated) with a memory M2: after concatenation, the computation of returns will be wrong.
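To make the concern concrete, here is a minimal sketch of a mask-based discounted-return computation (the names `compute_returns`, `rewards`, `masks`, and `gamma` are illustrative, not taken from `agent.py`). If M1's last episode is truncated with mask 1, its returns bootstrap into M2's first reward, which belongs to a different episode:

```python
def compute_returns(rewards, masks, gamma):
    """Backward pass: mask == 0 cuts the bootstrap at an episode end."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * masks[i] * running
        returns[i] = running
    return returns

gamma = 0.9

# M1 ends mid-episode (last mask is 1); M2 holds a separate episode.
m1_rewards, m1_masks = [1.0, 1.0], [1, 1]
m2_rewards, m2_masks = [5.0], [0]

# Naive concatenation: M1's unfinished episode wrongly absorbs
# M2's reward through the surviving mask = 1 at the boundary.
returns = compute_returns(m1_rewards + m2_rewards, m1_masks + m2_masks, gamma)
print(returns)  # M1's entries include gamma-discounted pieces of M2's 5.0
```

Zeroing the mask at the boundary would instead yield truncated returns for M1 that ignore M2 entirely.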

To correct this, I think mask should be 0 at the end of an episode, or when num_steps = min_batch_size - 1.
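The proposed fix can be sketched as follows (again with illustrative names, assuming the same backward return computation as in the codebase): force the mask of the final collected step to 0, so a truncated episode never bootstraps into the next memory.

```python
def compute_returns(rewards, masks, gamma):
    """Backward pass: mask == 0 cuts the bootstrap at an episode end."""
    returns, running = [0.0] * len(rewards), 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * masks[i] * running
        returns[i] = running
    return returns

m1_rewards, m1_masks = [1.0, 1.0], [1, 1]  # episode truncated mid-run
m2_rewards, m2_masks = [5.0], [0]

# Proposed correction: mark the truncation point before concatenating.
m1_masks[-1] = 0

returns = compute_returns(m1_rewards + m2_rewards, m1_masks + m2_masks, 0.9)
# M1's returns no longer absorb M2's reward across the boundary.
```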

Lucas

lcswillems commented 6 years ago

There is no bug at all! Sorry for filing this issue in error...