Chapter10 A2C differences

Hi Max and all,

In the Chapter10 code we can find 3 different A2C implementations:

1) 02_pong_a2c.py, which uses ptan.agent.PolicyAgent / ptan.experience.ExperienceSourceFirstLast

then we have

2) 03_pong_a2c_rollouts.py, which uses ptan.agent.ActorCriticAgent / ptan.experience.ExperienceSourceRollouts

and

3) 04_pong_r2.py, which does not use any ptan agent or any ptan.experience class

What are the MAIN differences among the 3 implementations of A2C? I don't think the 3 variants are covered in the book, and I'd like to better understand the differences (it is hard to work them out just by reading the code).

Should we expect all three to reach the same level of results when solving Pong?

Looking forward to seeing the new revision of the book.

Thanks in advance for the explanation.

Regards,

Dom

PacktPublishing / Deep-Reinforcement-Learning-Hands-On