About the implementation of advantage function in PPO Agent

GFNOrg / gflownet

Generative Flow Networks

About the implementation of advantage function in PPO Agent #14

Open yaorong1996 opened 1 year ago

yaorong1996 commented 1 year ago

In PPOAgent (around line 514 of grid/toy_grid_dag.py), the advantage is computed as adv = r + vsp * (1-d) - vs. As far as I can tell, this is only the one-step delta term (the TD residual) from the original PPO paper, not the full advantage estimate, which sums discounted delta terms over the remaining steps of the trajectory.

Am I misunderstanding your code, or PPO?
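
To make the difference concrete, here is a minimal sketch (not taken from the repository) comparing the one-step delta with the truncated generalized advantage estimate from the PPO paper, A_t = delta_t + (gamma * lambda) * delta_{t+1} + ... The function names td_residuals and gae, the lam parameter, and the sample numbers are hypothetical, chosen for illustration only. I also assume gamma = 1 by default, since the quoted line has no discount factor.

```python
def td_residuals(r, vs, vsp, d, gamma=1.0):
    """One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) * (1 - d_t) - V(s_t).

    With gamma = 1 (an assumption) this matches the quoted line
    adv = r + vsp * (1-d) - vs.
    """
    return [
        r_t + gamma * vsp_t * (1.0 - d_t) - vs_t
        for r_t, vs_t, vsp_t, d_t in zip(r, vs, vsp, d)
    ]


def gae(r, vs, vsp, d, gamma=1.0, lam=0.95):
    """Generalized advantage estimate, computed backwards over one batch of steps.

    A_t = delta_t + gamma * lam * (1 - d_t) * A_{t+1}
    The (1 - d_t) factor stops the sum at episode boundaries.
    """
    deltas = td_residuals(r, vs, vsp, d, gamma)
    adv = [0.0] * len(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * (1.0 - d[t]) * running
        adv[t] = running
    return adv


# Hypothetical three-step episode: reward only at the final (terminal) step.
r = [0.0, 0.0, 1.0]
vs = [0.5, 0.6, 0.7]    # V(s_t)
vsp = [0.6, 0.7, 0.0]   # V(s_{t+1})
d = [0.0, 0.0, 1.0]     # done flags

deltas = td_residuals(r, vs, vsp, d)
adv_lam0 = gae(r, vs, vsp, d, lam=0.0)  # reduces to the one-step delta
adv_lam1 = gae(r, vs, vsp, d, lam=1.0)  # sums all remaining deltas
```

With lam = 0 the estimate is exactly the delta term, so the quoted line corresponds to that special case; with lam > 0 each advantage also includes the discounted deltas of later steps.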