Hello,
I'm not sure where I'm going wrong. I copy-pasted the a2c_continuous.py file as-is, and even after 3000 episodes the 10-episode average reward only climbs from about -133 to about -2. It never even crosses 0. Could you please let me know how you managed to get it to converge to +100 in the same number of episodes?
When I run it, after a while the average just keeps bouncing between -10 and -2.
I also tried a3c_continuous.py, and the same thing happens there.
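For reference, this is a minimal sketch of how I understand the "10-episode average reward" above: a trailing mean over the most recent 10 episode rewards. I'm assuming this matches what a2c_continuous.py logs; the function names are my own and hypothetical, not from the repo.

```python
from collections import deque


def moving_average_rewards(episode_rewards, window=10):
    """Trailing mean of the last `window` episode rewards, one value per episode.

    Hypothetical helper; assumes early episodes average over fewer than `window` values.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` rewards
    averages = []
    for r in episode_rewards:
        recent.append(r)
        averages.append(sum(recent) / len(recent))
    return averages


def has_crossed(averages, threshold=0.0):
    """True if the moving average ever rises above `threshold` (e.g. 0 or +100)."""
    return any(a > threshold for a in averages)
```

With this definition, a run whose average settles between -10 and -2 never satisfies `has_crossed(averages, 0.0)`, which is exactly the behaviour I'm seeing.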
Thanks