Hi Ben
Thanks for the interesting notebooks. After studying the "3 - Advantage Actor Critic (A2C) [CartPole].ipynb" notebook, I came to the conclusion that detaching the returns in the update_policy() function is not necessary. The returns are computed solely from the rewards, which are environment outputs and therefore not part of the computational graph. So leaving out the .detach() call should not affect the model. Would you agree?
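A minimal sketch to check this, assuming PyTorch. `calculate_returns` here is a hypothetical re-creation of the idea, not the notebook's actual code, and a toy `torch.nn.Linear` critic stands in for the real network. It shows that a returns tensor built from plain reward floats has `requires_grad=False`, and that the critic's gradients come out identical with and without `.detach()`:

```python
import torch
import torch.nn.functional as F

# Hypothetical helper (not the notebook's code): discounted returns built
# from plain Python floats, i.e. rewards coming straight from the environment.
def calculate_returns(rewards, discount_factor, normalize=True):
    returns = []
    R = 0.0
    for r in reversed(rewards):
        R = r + R * discount_factor
        returns.insert(0, R)
    returns = torch.tensor(returns)  # new leaf tensor, requires_grad=False
    if normalize:
        # Ops on tensors that don't require grad stay outside the graph
        returns = (returns - returns.mean()) / returns.std()
    return returns

torch.manual_seed(0)

# Toy critic (assumption): 3 time steps, 4 state features as in CartPole
critic = torch.nn.Linear(4, 1)
params = list(critic.parameters())
states = torch.randn(3, 4)
values = critic(states).squeeze(-1)  # part of the graph, requires_grad=True

returns = calculate_returns([1.0, 1.0, 1.0], discount_factor=0.99)

# Value loss with and without .detach() on the returns
loss_plain = F.smooth_l1_loss(values, returns)
loss_detached = F.smooth_l1_loss(values, returns.detach())

# retain_graph=True because both losses share the graph behind `values`
grads_plain = torch.autograd.grad(loss_plain, params, retain_graph=True)
grads_detached = torch.autograd.grad(loss_detached, params)

same_grads = all(torch.allclose(a, b) for a, b in zip(grads_plain, grads_detached))

print(returns.requires_grad)  # False
print(same_grads)             # True
```

If the returns were instead derived from a network output (for example bootstrapped from the critic's own value estimates), the `.detach()` would matter; with purely reward-based returns it is a no-op.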