HorizonRobotics / SocialRobot


Update pick place task training example #118

Closed: Jialn closed this 4 years ago

Jialn commented 4 years ago

I found that the episode length begins to increase late in training, which means it takes longer and longer to successfully place the object at the goal.

The reason might be the one discussed in https://github.com/HorizonRobotics/SocialRobot/pull/113#discussion_r354649180

The previous reward shaping: in the first stage (gripper moving closer to the object) the max reward is 1; in the second stage (placing the object at the goal position) the max reward is 2; finally, a reward of 100 is given on success.
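For reference, a minimal sketch of what that shaping looks like (the function signature, distance normalization, and stage logic here are illustrative assumptions, not the task's actual code):

```python
def shaped_reward(dist_gripper_to_obj, dist_obj_to_goal, grasped, success,
                  max_dist=1.0):
    # Illustrative sketch only; the real shaping lives in the task code.
    if success:
        return 100.0  # terminal success bonus
    if grasped:
        # stage 2: move the object toward the goal, reward in [0, 2]
        return 2.0 * (1.0 - min(dist_obj_to_goal, max_dist) / max_dist)
    # stage 1: move the gripper toward the object, reward in [0, 1]
    return 1.0 - min(dist_gripper_to_obj, max_dist) / max_dist
```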

Since the final reward is 100 and the training example's discount factor gamma is 0.99, the agent finds that playing around near the goal gives a higher discounted return than succeeding immediately: 100 < 2 + 100*0.99 < 2 + 2*0.99 + 100*0.99^2 < ...
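This is easy to check numerically: if the agent can keep collecting the ~2 per-step shaped reward while hovering near the goal for k extra steps before finishing, the discounted return keeps growing with k (a sketch, assuming a constant per-step reward of 2):

```python
def discounted_return(step_reward, final_reward, gamma, k):
    # Loiter for k steps collecting step_reward, then take the success bonus.
    return sum(step_reward * gamma**t for t in range(k)) + final_reward * gamma**k

for k in range(4):
    print(k, round(discounted_return(2.0, 100.0, 0.99, k), 2))
# 0 100.0
# 1 101.0
# 2 101.99
# 3 102.97  -> delaying success pays off, so the agent stalls
```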

So I've changed the final reward to 200 and gamma to 0.98, so that succeeding immediately yields a higher return than playing around near the goal (200 > 2 + 200*0.98 = 198). The result is much better: the agent can successfully place the object at the goal in 40 steps on average.
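With the same helper as above, the new setting makes any delay strictly worse. In general, finishing immediately wins when step_reward < final_reward * (1 - gamma): before, 2 > 100 * 0.01; now, 2 < 200 * 0.02.

```python
for k in range(4):
    print(k, round(discounted_return(2.0, 200.0, 0.98, k), 2))
# 0 200.0
# 1 198.0
# 2 196.04
# 3 194.12  -> delaying success now strictly lowers the return
```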

(image: training result)

Other changes include using the action wrapper, which removes 2 redundant action dimensions, and increasing the learning rate from 5e-4 to 1e-3.
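For readers unfamiliar with the wrapper, the idea is to expose a smaller action space to the agent and pad the removed dimensions before forwarding the action to the env. A rough sketch (the class name, removed indices, and zero-padding are assumptions, not the repo's actual implementation):

```python
import gym
import numpy as np

class DropRedundantDims(gym.ActionWrapper):
    """Expose a reduced action space; pad removed dims with zeros."""

    REDUNDANT = (5, 6)  # hypothetical indices of the 2 redundant dimensions

    def __init__(self, env):
        super().__init__(env)
        full = env.action_space
        self._keep = [i for i in range(full.shape[0])
                      if i not in self.REDUNDANT]
        self.action_space = gym.spaces.Box(
            low=full.low[self._keep], high=full.high[self._keep],
            dtype=full.dtype)

    def action(self, act):
        # Rebuild the full-dimensional action expected by the wrapped env.
        full_act = np.zeros(self.env.action_space.shape,
                            dtype=self.env.action_space.dtype)
        full_act[self._keep] = act
        return full_act
```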