HorizonRobotics / SocialRobot

Apache License 2.0

WIP: add icub alf sac example #59

Jialn closed 5 years ago

Jialn commented 5 years ago

(Test result; gin file is @28af537.) The agent learns to stay up without falling for up to 200 steps after 800-900K env steps, which is pretty fast. But the reward is lower: it just keeps standing and does not walk forward. The performance also goes down periodically. As for speed, 4.5M env steps with Alf SAC take ~2 days, about 3x slower than tf-agent SAC.

Alf SAC, gin file @28af537

image

tf-agent SAC, the ICubWalkDefault one

image

Jialn commented 5 years ago

Performance also goes down periodically with PPO.

image

emailweixu commented 5 years ago

My training using ppo_icubwalk.gin in the alf repo is stable.

My experience so far is that state_dependent_std=True makes the training unstable. I am still investigating the reason.
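
For readers unfamiliar with the option, here is a minimal sketch of what the flag changes, assuming it controls whether the policy's action standard deviation is computed from the observation or kept as a single learned parameter. The helper names (`make_gaussian_head`, `softplus`) are hypothetical and purely illustrative; this is not code from alf or tf-agent.

```python
import math
import random


def softplus(x):
    # Maps any real number to a strictly positive value (numerically stable form).
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def make_gaussian_head(obs_dim, state_dependent_std, seed=0):
    """Toy 1-D Gaussian policy head (hypothetical; not alf or tf-agent code)."""
    rng = random.Random(seed)
    w_mean = [rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
    w_std = [rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
    std_param = 0.0  # single free parameter, used when std ignores the state

    def head(obs):
        mean = sum(w * o for w, o in zip(w_mean, obs))
        if state_dependent_std:
            # std is computed from the observation, so it changes per state.
            std = softplus(sum(w * o for w, o in zip(w_std, obs)))
        else:
            # std is the same for every observation.
            std = softplus(std_param)
        return mean, std

    return head


fixed = make_gaussian_head(3, state_dependent_std=False)
dependent = make_gaussian_head(3, state_dependent_std=True)
for obs in ([1.0, 2.0, 3.0], [-1.0, 0.5, 4.0]):
    print(fixed(obs)[1], dependent(obs)[1])
```

With the state-dependent variant the exploration noise can vary from one observation to the next, which is one plausible source of the instability described above; the sketch only illustrates the difference and does not confirm the cause.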