datawhalechina / easy-rl

Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄), read online at: https://datawhalechina.github.io/easy-rl/
9.36k stars, 1.86k forks

PPO advantage calculation #114

Closed: XinXU-USTC closed this issue 1 year ago

XinXU-USTC commented 2 years ago

I think that in ppo2.py, lines 119-122, we need to insert "if dones_arr[k]: break" into the for loop, because the memory contains data from different episodes. Is that right?

johnjim0816 commented 1 year ago

Yes.
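The fix agreed on in this thread can be sketched as follows. This is a minimal, self-contained illustration, not the repository's code. It assumes the loop at ppo2.py lines 119-122 follows the common nested-loop GAE pattern over `reward_arr`, `values`, and `dones_arr`. The function names `gae_nested` and `gae_backward`, the `stop_at_done` flag, and the default `gamma`/`gae_lambda` values are hypothetical. Multiplying by `(1 - done)` only stops bootstrapping from the next state's value. Without the `break`, the inner sum keeps adding TD errors that belong to the *next* episode stored in memory.

```python
import numpy as np


def gae_nested(reward_arr, values, dones_arr, gamma=0.99, gae_lambda=0.95,
               stop_at_done=True):
    """Nested-loop GAE (hypothetical helper, assumed to mirror ppo2.py's shape).

    values[k] is the critic's estimate V(s_k); dones_arr[k] is True when
    step k ended its episode. The last step gets advantage 0 because there
    is no V(s_{k+1}) to bootstrap from.
    """
    reward_arr = np.asarray(reward_arr, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    n = len(reward_arr)
    advantage = np.zeros(n, dtype=np.float32)
    for t in range(n - 1):
        discount = 1.0
        a_t = 0.0
        for k in range(t, n - 1):
            # TD error; (1 - done) stops bootstrapping across a terminal step
            delta = (reward_arr[k]
                     + gamma * values[k + 1] * (1 - int(dones_arr[k]))
                     - values[k])
            a_t += discount * delta
            discount *= gamma * gae_lambda
            if stop_at_done and dones_arr[k]:
                # The proposed fix: do not add TD errors from the next episode
                break
        advantage[t] = a_t
    return advantage


def gae_backward(reward_arr, values, dones_arr, gamma=0.99, gae_lambda=0.95):
    """Equivalent O(n) backward recursion: A_t = delta_t + gamma*lambda*(1-d_t)*A_{t+1}."""
    reward_arr = np.asarray(reward_arr, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    n = len(reward_arr)
    advantage = np.zeros(n, dtype=np.float32)
    running = 0.0
    for t in reversed(range(n - 1)):
        not_done = 1.0 - float(dones_arr[t])
        delta = reward_arr[t] + gamma * values[t + 1] * not_done - values[t]
        # Resetting via not_done plays the same role as the break above
        running = delta + gamma * gae_lambda * not_done * running
        advantage[t] = running
    return advantage


if __name__ == "__main__":
    # Two episodes in one buffer: the first ends at index 2
    rewards = [1.0, 1.0, 1.0, 1.0, 1.0]
    vals = [0.5] * 5
    dones = [False, False, True, False, False]
    print("without break:", gae_nested(rewards, vals, dones, stop_at_done=False))
    print("with break:   ", gae_nested(rewards, vals, dones))
    print("backward:     ", gae_backward(rewards, vals, dones))
```

With the `break`, the nested loop agrees (up to floating-point rounding) with the backward recursion, which is the form many PPO implementations use because it runs in O(n) rather than O(n²). Without it, the advantages at indices 0 to 2 pick up TD errors from the second episode.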