The total reward of this A2C becomes very small after 2 or 3 thousand episodes

floodsung / a2c_cartpole_pytorch

Advantage actor-critic reinforcement learning for OpenAI Gym CartPole
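For readers less familiar with the method, the sketch below shows how an advantage actor-critic update typically turns one rollout into loss terms. It assumes n-step returns bootstrapped from the critic's value of the state after the rollout. It is not this repository's code. The function names `discounted_returns` and `a2c_losses` and their parameters are hypothetical, and plain Python floats stand in for tensors.

```python
from typing import List, Tuple


def discounted_returns(rewards: List[float], dones: List[bool],
                       last_value: float, gamma: float = 0.99) -> List[float]:
    """Bootstrapped n-step returns for one rollout, computed backwards in time.

    Hypothetical helper: `last_value` is assumed to be the critic's V(s_T)
    for the state that follows the last step of the rollout.
    """
    returns = [0.0] * len(rewards)
    running = last_value
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # episode ended at step t: do not bootstrap past it
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def a2c_losses(log_probs: List[float], values: List[float],
               returns: List[float]) -> Tuple[float, float, List[float]]:
    """Actor and critic losses from advantages A_t = R_t - V(s_t).

    In a real PyTorch implementation the advantage must be detached
    before it multiplies log_prob, so the actor loss does not push
    gradients into the critic.
    """
    advantages = [r - v for r, v in zip(returns, values)]
    n = len(advantages)
    # Policy gradient term: raise log-probability of actions with positive advantage.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic regression toward the bootstrapped returns (mean squared advantage).
    critic_loss = sum(a * a for a in advantages) / n
    return actor_loss, critic_loss, advantages


if __name__ == "__main__":
    rets = discounted_returns([1.0, 1.0], [False, False], last_value=10.0, gamma=0.5)
    print(rets)
    print(a2c_losses([-0.5, -1.0], [3.0, 5.0], rets))
```

Running the example prints `[4.0, 6.0]` for the returns, followed by the actor loss, critic loss, and advantages computed from them.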


Open: yanshuok opened this issue 6 years ago

yanshuok commented 6 years ago

I wrote my own A2C implementation and it has the same problem. Is this a problem with A2C itself?

MISTCARRYYOU commented 4 years ago

I ran into the same problem; the results produced by the original code are confusing. The rewards drop to around 10.0, lower than in earlier episodes.