murtazabasu opened this issue 4 years ago
Notebooks 1-7 all use Monte Carlo methods. That is, each environment is run for a single episode, i.e. until the environment returns `done = True`, after which we calculate the returns/advantages and update the policy parameters.
There is no need to check for `done` when calculating the returns/advantages, because only the last state of the episode will have `done = True`. This is also why `R` is initialized to zero: no reward follows the terminal state.
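To illustrate the point about `done`, here is a minimal sketch in plain Python rather than the notebooks' own code. The helper names `monte_carlo_returns` and `returns_with_dones` are hypothetical. The first helper computes discounted returns backwards over one complete episode with `R` starting at zero. The second resets `R` whenever `done` is true. For a single complete episode the two give identical results. The reset only matters if a buffer concatenates several episodes.

```python
from typing import List


def monte_carlo_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted returns for ONE complete episode, computed backwards.

    R starts at zero because nothing follows the terminal state,
    so no `done` check is needed inside the loop.
    """
    returns = []
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.insert(0, R)
    return returns


def returns_with_dones(rewards: List[float], dones: List[bool],
                       gamma: float = 0.99) -> List[float]:
    """Variant that cuts the return at every terminal step.

    Only needed if the buffer mixes several episodes (a hypothetical
    setup here; the notebooks collect one episode at a time).
    """
    returns = []
    R = 0.0
    for r, done in zip(reversed(rewards), reversed(dones)):
        if done:
            R = 0.0  # do not leak reward from the next episode
        R = r + gamma * R
        returns.insert(0, R)
    return returns


if __name__ == "__main__":
    episode = [1.0, 1.0, 1.0]
    print(monte_carlo_returns(episode, gamma=0.5))                    # [1.75, 1.5, 1.0]
    print(returns_with_dones(episode, [False, False, True], gamma=0.5))  # [1.75, 1.5, 1.0]
```

With two episodes concatenated, e.g. `dones = [False, True, False, True]`, only `returns_with_dones` keeps the returns from crossing the episode boundary. `monte_carlo_returns` would add the second episode's rewards into the first.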
I'll add an explanation of GAE when I get around to adding more detail to the notebooks. For now, I'd recommend these two links:
Hello, thank you for making this repo. I think you should take `done` into consideration when calculating the returns, as:
Also, could you please briefly describe the Generalized Advantage Estimation (GAE) used when calculating the advantages?
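As a rough illustration of what GAE computes, here is a minimal sketch of the standard GAE recursion. It is not necessarily how the notebooks implement it. The sketch assumes one complete episode, as in notebooks 1-7, so the value of the state after the terminal step is taken as 0. The function name `gae_advantages` and the parameter names `gamma` and `lam` are illustrative assumptions. Each step's TD error is `delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)`. Those errors are then accumulated backwards with an extra decay factor `lam`.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """GAE for one complete episode (hypothetical helper).

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    advantages = []
    advantage = 0.0
    next_value = 0.0  # V(terminal) = 0: the episode ended with done = True
    for r, v in zip(reversed(rewards), reversed(values)):
        delta = r + gamma * next_value - v          # one-step TD error
        advantage = delta + gamma * lam * advantage  # decayed sum of TD errors
        advantages.insert(0, advantage)
        next_value = v
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5]
    # lam = 1 recovers the Monte Carlo return minus the value baseline
    print(gae_advantages(rewards, values, gamma=0.5, lam=1.0))  # [1.25, 1.0, 0.5]
    # lam = 0 reduces to the one-step TD error
    print(gae_advantages(rewards, values, gamma=0.5, lam=0.0))  # [0.75, 0.75, 0.5]
```

`lam` trades bias for variance. With `lam = 1` the advantage is the full discounted return minus `V(s_t)`, which has high variance but no bias from the critic. With `lam = 0` it is the single-step TD error, which has lower variance but relies entirely on the value estimates.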