floodsung / DDPG

Reimplementation of DDPG (Continuous Control with Deep Reinforcement Learning) based on OpenAI Gym + TensorFlow
MIT License

Actions generated by Actor network increase to 1 and stay there #6

Open · opened by Amir-Ramezani 7 years ago

Amir-Ramezani commented 7 years ago

Hi,

Thanks for your code.

I tried to use it for training TORCS; however, my results are not good. Specifically, after a few steps the actions generated by the Actor network increase to 1 and stay there. For example, the top 10 actions look like this:

```
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
```

Gradients for that set:

```
[[ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]
 [ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]
 [ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]
 [ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]
 [ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]
 [ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]
 [ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]
 [ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]
 [ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]
 [ 4.80426752e-05  1.51122265e-04 -1.96302353e-05]]
```

Could you tell me what you think the problem is?
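One way to quantify this symptom is to measure how much of an action batch has collapsed to the bound. This is a hypothetical helper, not part of the repo; the function name and tolerance are my own choices:

```python
def saturation_fraction(action_batch, bound=1.0, tol=1e-3):
    """Return the fraction of action components within tol of +/-bound.

    A value near 1.0 means the actor's outputs have pinned to the
    action limits, as in the batch reported above.
    """
    total = 0
    saturated = 0
    for action in action_batch:
        for component in action:
            total += 1
            if abs(abs(component) - bound) < tol:
                saturated += 1
    return saturated / total


# The batch from the report above: every component sits at 1.
batch = [[1.0, 1.0, 1.0]] * 10
print(saturation_fraction(batch))           # 1.0 -> fully saturated

# A healthy batch would score near 0.
print(saturation_fraction([[0.2, -0.5, 0.9]]))  # 0.0
```

Logging this fraction during training makes it easy to see whether saturation sets in immediately (suggesting bad initialization) or gradually (suggesting diverging Q-values pushing the actor to an extreme).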

xuyinbo commented 6 years ago

Hello, I have encountered the same problem as you. It seems that the values fed into the activation function are too large, so the function operates in its saturation region and outputs the action 1. Have you worked this problem out? I am looking forward to your reply, thanks! @Amir-Ramezani
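The saturation described above can be demonstrated numerically: once the pre-activation input to tanh is large, the output pins at 1 and the gradient through tanh is nearly zero, so the actor can no longer move away from the bound. A minimal stdlib-only sketch (not from this repo; the layer sizes are illustrative):

```python
import math
import random

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2.
    return 1.0 - math.tanh(x) ** 2

# Moderate pre-activation: output away from the rails, useful gradient.
print(math.tanh(0.5), tanh_grad(0.5))    # ~0.462, ~0.786

# Large pre-activation: output pinned at 1, gradient ~0 (saturation).
print(math.tanh(10.0), tanh_grad(10.0))  # ~1.0, ~0.0 -> actor stuck at 1

# One remedy used in the DDPG paper: initialize the actor's final layer
# weights uniformly in [-3e-3, 3e-3] so initial pre-activations stay near
# zero and tanh starts in its linear region.
random.seed(0)
hidden = [random.gauss(0.0, 1.0) for _ in range(300)]        # hypothetical hidden layer
w_final = [random.uniform(-3e-3, 3e-3) for _ in range(300)]  # small init
pre_activation = sum(h * w for h, w in zip(hidden, w_final))
print(abs(pre_activation) < 1.0)  # True: well inside tanh's linear region
```

Beyond initialization, large pre-activations during training often trace back to a diverging critic (exploding Q-values feed large policy gradients into the actor), so it is also worth checking the critic loss and the L2 regularization settings.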