bentrevett / pytorch-rl

Tutorials for reinforcement learning in PyTorch and Gym by implementing a few of the popular algorithms. [IN PROGRESS]
MIT License
264 stars 77 forks

3 - Advantage Actor Critic (A2C) [CartPole].ipynb - Returns do not need to be detached #3

Open nimrare opened 3 years ago

nimrare commented 3 years ago

Hi Ben

Thanks for the interesting notebooks. While studying the "3 - Advantage Actor Critic (A2C) [CartPole].ipynb" notebook, I came to the conclusion that detaching the returns in the update_policy() function is not necessary. The returns are calculated only from the rewards, which are environment outputs and therefore not part of the computational graph, so they never require gradients. Leaving out the .detach() call should therefore not affect the model. Would you agree?
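To make the claim concrete, here is a minimal sketch rather than the notebook's actual code. It assumes the returns are built from plain Python floats, as described above. `calculate_returns`, `critic`, and `critic_grad` are hypothetical names introduced only for this illustration, and the smooth L1 value loss is likewise an assumption.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def calculate_returns(rewards, discount_factor):
    # Hypothetical helper: discounted returns built only from env rewards.
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + discount_factor * running
        returns.insert(0, running)
    returns = torch.tensor(returns)
    # Normalizing uses only the returns themselves, so no graph is created.
    return (returns - returns.mean()) / returns.std()

rewards = [1.0, 1.0, 1.0, 1.0, 1.0]  # plain floats from the environment
returns = calculate_returns(rewards, 0.99)

# The returns were never connected to any model parameter.
print(returns.requires_grad)  # False
print(returns.grad_fn)        # None

# Stand-in critic (assumption) to compare gradients with and without detach.
critic = torch.nn.Linear(4, 1)
states = torch.randn(5, 4)

def critic_grad(target):
    values = critic(states).squeeze(-1)
    loss = F.smooth_l1_loss(values, target)
    return torch.autograd.grad(loss, critic.weight)[0]

grad_plain = critic_grad(returns)
grad_detached = critic_grad(returns.detach())
print(torch.allclose(grad_plain, grad_detached))
```

If the sketch matches the notebook's setup, `.detach()` on the returns is a no-op. It would only matter if the returns were computed from something that depends on model parameters, such as bootstrapped value estimates.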