WellyZhang / mx-DDPG

MXNet Implementation of DDPG

For task Pendulum-v0. #1

Open ShawnLue opened 7 years ago

ShawnLue commented 7 years ago

Hi, I have used your code to solve another continuous control task in openai/gym, Pendulum-v0. However, the results were quite bad. I didn't use the rllab environment, just plain gym with some modifications to your code. Could you give me some insight into why the reward was so poor?

ShawnLue commented 7 years ago

Also, I find that the AverageQLoss of the critic network is monotonically decreasing (down to about 1e-2 after nearly 80 epochs), while the actor network still performs badly, which seems unreasonable. Do you have any experience debugging DDPG? Thanks.

WellyZhang commented 7 years ago

Hi Shawn,

Sorry, I haven't tried Gym, but there are some details you should pay attention to.

One is the scale of the action. The output of the policy network is squashed to [-1, 1], and this might not match the action range of the Pendulum-v0 environment.
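For reference, Pendulum-v0's action space is a Box over [-2, 2], so a tanh-squashed output needs rescaling. A minimal sketch (`rescale_action` is a hypothetical helper, not part of this repo):

```python
# Hypothetical helper: map a tanh-squashed action in [-1, 1]
# onto the environment's bounds (Pendulum-v0 uses [-2, 2]).
def rescale_action(action, low, high):
    return low + (action + 1.0) * 0.5 * (high - low)

# Usage with gym (illustrative):
# env = gym.make("Pendulum-v0")
# a_env = rescale_action(a_policy, env.action_space.low, env.action_space.high)
```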

Another is the hyperparameters. When the implementation doesn't work, try different learning rates and adjust the other hyperparameters, along the lines of the sketch below.
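As a rough starting point, the defaults from the original DDPG paper (Lillicrap et al., 2015) often work; the keys below are illustrative, not this repo's actual config names:

```python
# Illustrative DDPG hyperparameters, loosely following the paper's defaults.
hyperparams = {
    "actor_lr": 1e-4,   # policy learning rate
    "critic_lr": 1e-3,  # Q-network learning rate, often 10x the actor's
    "gamma": 0.99,      # discount factor
    "tau": 1e-3,        # soft target update rate
    "batch_size": 64,   # minibatch size drawn from the replay buffer
}
```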

Third, initialization. I did notice that different initializations yielded drastically different results in my CartPole experiments.
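In MXNet, swapping initializers is a one-line change; for example (a sketch assuming the Module API, with `mod` as a hypothetical `mx.mod.Module`):

```python
import mxnet as mx

# Two initializers that behave quite differently in practice.
xavier = mx.init.Xavier(rnd_type="uniform", factor_type="in", magnitude=1.0)
small_uniform = mx.init.Uniform(scale=3e-3)  # tiny uniform init, as the DDPG paper uses for final layers

# mod.init_params(initializer=xavier)  # mod: a hypothetical mx.mod.Module
```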

I can't guarantee it will work equally well on every task, but after some tuning you should get decent performance.

ShawnLue commented 7 years ago

Hi Welly,

Thanks for your reply. There seems to be a small bug in the target update, and I have made a pull request.
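For context, the target networks should track the learned networks through DDPG's soft update, theta_target <- tau * theta + (1 - tau) * theta_target. A minimal sketch of that rule (illustrative, not the repo's exact code):

```python
# DDPG soft target update over dicts of mx.nd.NDArray parameters
# (names are illustrative).
def soft_update(target_params, source_params, tau=1e-3):
    for name, src in source_params.items():
        tgt = target_params[name]
        tgt[:] = tau * src + (1.0 - tau) * tgt
```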

Also, my experiments on gym finally worked well after fixing it, tuning the network structure, and choosing appropriate hyper-parameters. Your suggestions were very helpful.

Thanks

WellyZhang commented 7 years ago

Thank you Shawn for the pull request. I'm sorry for the bug.