update goal task training example

HorizonRobotics / SocialRobot

Apache License 2.0


update goal task training example #78

Closed Jialn closed 4 years ago

Jialn commented 5 years ago

Main Changes:

Update the examples and speed up training
Add optional reward shaping to the goal task
Add a simple test for grocery ground with different agents and tasks
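
For readers unfamiliar with reward shaping, the idea behind the second item can be sketched as follows. This is only an illustration under stated assumptions, not the code in this PR: the function `shaped_goal_reward`, its parameters (`success_distance`, `shaping_scale`, `use_shaping`) and the choice of a distance-difference shaping term are all hypothetical.

```python
import math


def shaped_goal_reward(agent_xy, goal_xy, prev_distance,
                       success_distance=0.5, shaping_scale=1.0,
                       use_shaping=True):
    """Return (reward, done, distance) for one step of a hypothetical goal task.

    All names and default values here are assumptions for illustration only.
    """
    # Euclidean distance from the agent to the goal in the ground plane.
    distance = math.hypot(goal_xy[0] - agent_xy[0], goal_xy[1] - agent_xy[1])

    # Sparse part: the episode succeeds once the agent is close enough.
    if distance < success_distance:
        return 1.0, True, distance

    reward = 0.0
    if use_shaping:
        # Dense part: positive when the agent moved closer since the last
        # step, negative when it moved away.
        reward += shaping_scale * (prev_distance - distance)
    return reward, False, distance
```

With `use_shaping=False` the function falls back to the sparse reward, which is what makes the shaping optional in this sketch.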

Jialn commented 4 years ago

Unless training is interrupted by the "action_log_prob had NaN values" error, it takes about 15 hours to reach behavior similar to the previous AC example, which takes about 3-4 days. This was trained on a 4-core i7-6700HQ laptop CPU; the GPU was not used because of an OOM problem. SAC also takes several days and needs a very large replay buffer, so it was removed.

The curve of grocery_goaltask_img_ppo.gin @ 249322c

The curve of previous AC

Jialn commented 4 years ago

Updated PPO so that the std does not depend on the state. It has similar performance but does not show the numerical instability.
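
To make the change concrete, here is a minimal NumPy sketch of a Gaussian policy head whose std is a free parameter rather than a network output. It is not the implementation used in this PR; the class `GaussianPolicyHead`, the linear mean, and the initial values are assumptions for illustration. One plausible reason this helps is that a state-dependent std can collapse toward zero or blow up for some observations, which can make the action log-probability overflow.

```python
import numpy as np


class GaussianPolicyHead:
    """Diagonal Gaussian policy with a state-independent std (illustrative only)."""

    def __init__(self, obs_dim, action_dim, init_log_std=0.0, seed=0):
        rng = np.random.default_rng(seed)
        # The mean depends on the observation (a linear map in this sketch).
        self.w_mean = rng.normal(scale=0.1, size=(obs_dim, action_dim))
        # The log-std is one trainable value per action dimension and is
        # shared across all observations.
        self.log_std = np.full(action_dim, init_log_std, dtype=np.float64)

    def distribution(self, obs):
        mean = obs @ self.w_mean
        std = np.exp(self.log_std)  # identical for every observation
        return mean, std

    def log_prob(self, obs, action):
        mean, std = self.distribution(obs)
        z = (action - mean) / std
        # Sum of per-dimension log densities of a diagonal Gaussian.
        per_dim = -0.5 * z ** 2 - self.log_std - 0.5 * np.log(2.0 * np.pi)
        return per_dim.sum(axis=-1)
```

Because `log_std` is a plain parameter, it can be clipped or initialized directly, and its value never depends on an unusual observation.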

Jialn commented 4 years ago

Updated the LR; a slightly lower learning rate seems more stable.

Jialn commented 4 years ago

This PR is ready to be checked in now. Please approve if there are no other problems. @emailweixu