Open ShawnLue opened 7 years ago
Also, I find that the AverageQLoss of the critic network decreases monotonically (down to about 1e-2 after nearly 80 epochs), while the actor network still performs badly, which may not be reasonable. Do you have any experience debugging DDPG? Thanks.
Hi Shawn,
Sorry, I haven't tried Gym, but there are a few details you should pay attention to.
One is the scale of the action. The output of the policy network is squashed to [-1, 1], and this might not correspond to the scale of actions in the Pendulum-v0 environment.
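To illustrate the point about action scale: Pendulum-v0's action space is a Box with bounds [-2, 2], so a tanh-squashed policy output needs to be rescaled before being sent to the environment. A minimal sketch (the function name is my own, not from the repo):

```python
def rescale_action(squashed, low, high):
    """Map a tanh-squashed action in [-1, 1] to the env's [low, high] range.

    For Pendulum-v0 the action space is Box(low=-2.0, high=2.0).
    """
    return low + (squashed + 1.0) * 0.5 * (high - low)

# Example: a policy output of 0.5 maps to a torque of 1.0 in Pendulum-v0
print(rescale_action(0.5, -2.0, 2.0))  # -> 1.0
```

Without this rescaling, the agent can only ever apply half the available torque, which alone can make Pendulum-v0 look unsolvable.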
Another is the hyperparameters. When the implementation doesn't work, try using different learning rates and adjusting the other hyperparameters.
Third, initialization. I noticed that different initializations yield drastically different results in my CartPole experiments.
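For reference, the initialization scheme from the original DDPG paper (Lillicrap et al.) is a common starting point: hidden layers drawn uniformly from ±1/√fan_in, and the final layer from ±3e-3 so the initial policy and Q outputs are near zero. A sketch with NumPy (helper names are illustrative, not from this repo):

```python
import numpy as np

def fanin_init(shape, rng=np.random):
    # DDPG-paper heuristic: uniform in [-1/sqrt(fan_in), 1/sqrt(fan_in)]
    bound = 1.0 / np.sqrt(shape[0])
    return rng.uniform(-bound, bound, size=shape)

def final_layer_init(shape, rng=np.random):
    # Small uniform init so the untrained actor/critic outputs start near zero
    return rng.uniform(-3e-3, 3e-3, size=shape)

w_hidden = fanin_init((400, 300))   # 400 -> 300 hidden layer
w_out = final_layer_init((300, 1))  # final layer of the critic
```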
I can't guarantee that it will work equally well on every task, but after tuning you should get decent performance.
Hi Welly
Thanks for your reply. There seems to be a small bug in the target update, and I have made a pull request.
Also, my experiments on Gym finally work well after fixing it, tuning the network structure, and choosing appropriate hyper-parameters. Your suggestions were very helpful.
Thanks
Thank you Shawn for the pull request. I'm sorry for the bug.
Hi, I have used your code to solve another continuous control task in openai/gym, Pendulum-v0. However, the result was quite bad. I didn't use the rllab environment, just plain Gym with some modifications to your code. Could you give me some insight into why the reward was not good?