Denys88 / rl_games

RL implementations
MIT License
941 stars, 155 forks

advantage in a2c_common #250

Closed: mxllc closed this issue 1 year ago

mxllc commented 1 year ago

https://github.com/Denys88/rl_games/blob/990b4782ad0375652af76266a12753cb11d768c6/rl_games/common/a2c_common.py#L721-L722

Why is the advantage calculated by `discount_values` at line 722? Shouldn't it be the returns that are calculated through `discount_values`?

The image and link below are from an A2C implementation I saw in another repository. I'm a bit confused by the difference. Does anyone know what's going on?

[Image: code from the A2C implementation linked below]

https://github.com/Skylark0924/Machine-Learning-is-ALL-You-Need/blob/766a50ba07c21f6e9f6c8c48a819f6e075e97b78/RL_Actor_Critic/17Actor_Critic.py#L96
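For comparison, here is a minimal sketch of the "returns first" ordering that is common in plain A2C code: compute bootstrapped discounted returns, then subtract the critic's values to get the advantage. This is a generic illustration, not a copy of the linked repository's code. It assumes a single environment, per-step NumPy arrays, and a `dones[t]` flag meaning the episode ended after step t; the function names are hypothetical.

```python
import numpy as np


def discounted_returns(rewards, dones, last_value, gamma=0.99):
    """Bootstrapped discounted returns for one rollout (illustrative helper).

    rewards[t], dones[t]: reward and episode-end flag received after step t.
    last_value: critic estimate V(s_T) for the state after the last step.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = float(last_value)
    for t in reversed(range(len(rewards))):
        # Do not bootstrap across an episode boundary.
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns


def a2c_advantages(rewards, dones, values, last_value, gamma=0.99):
    """Returns first, then advantage = return - baseline V(s_t)."""
    returns = discounted_returns(rewards, dones, last_value, gamma)
    advantages = returns - np.asarray(values, dtype=np.float64)
    return advantages, returns
```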

mxllc commented 1 year ago

Sorry, I found the answer: the `discount_values` function at the link below uses Generalized Advantage Estimation (GAE) to calculate the advantage. https://github.com/Denys88/rl_games/blob/990b4782ad0375652af76266a12753cb11d768c6/rl_games/common/a2c_common.py#L536-L537
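To illustrate why computing the advantage first is fine, here is a minimal NumPy sketch of GAE. It computes the one-step TD error delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t), accumulates A_t = delta_t + gamma * lam * (1 - done_t) * A_{t+1} backwards, and then recovers the value targets as returns = advantages + values, a common pattern in A2C/PPO code. It assumes a single environment and a `dones[t]` flag meaning the episode ended after step t. The names `gae_advantages`, `gamma`, and `lam` are illustrative and are not the rl_games signature; check the linked function for the actual arguments.

```python
import numpy as np


def gae_advantages(rewards, dones, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (illustrative sketch).

    rewards[t], dones[t]: reward and episode-end flag received after step t.
    values[t]: critic estimate V(s_t); last_value: V(s_T) after the rollout.
    """
    T = len(rewards)
    values = np.asarray(values, dtype=np.float64)
    advantages = np.zeros(T, dtype=np.float64)
    last_gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # One-step TD error, with no bootstrap past an episode end.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted (gamma * lam) sum of future TD errors.
        last_gae = delta + gamma * lam * not_done * last_gae
        advantages[t] = last_gae
    # Value targets (returns) are recovered by adding the baseline back.
    returns = advantages + values
    return advantages, returns
```

With `lam=1.0` the TD errors telescope, so `returns` equals the bootstrapped discounted return and the advantage equals that return minus V(s_t): the same result as the "returns first" ordering. Smaller `lam` trades some bias for lower variance.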