Hi Max and all,
in the Chapter 10 code we can find three different A2C implementations:
1) 02_pong_a2c.py, which uses ptan.agent.PolicyAgent / ptan.experience.ExperienceSourceFirstLast;
2) 03_pong_a2c_rollouts.py, which uses ptan.agent.ActorCriticAgent / ptan.experience.ExperienceSourceRollouts;
3) 04_pong_r2.py, which uses neither a ptan agent nor any ptan.experience class.
What are the MAIN differences among these three A2C implementations? I don't think the three variants are discussed in the book, and I'd like to understand the differences better (it's hard to work them out by reading the code alone).
Should we expect all three to reach the same level of results when solving Pong?
Looking forward to seeing the new revision of the book.
Thanks in advance for the explanation.
Rgds
Dom