Jialn closed this 5 years ago
Performance also drops periodically in the case of PPO.
My training using ppo_icubwalk.gin in the alf repo is stable.
My experience so far is that state_dependent_std=True makes the training unstable. I am still investigating the reason.
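For readers unfamiliar with the flag, here is a minimal sketch of what it is generally understood to control in a Gaussian policy: whether the action standard deviation is computed from the observation or is a single learned parameter shared across all states. This is a NumPy illustration, not ALF's implementation; the linear weights `W_std`, the parameter `std_param`, and the helper `policy_std` are hypothetical names made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 2

# Hypothetical parameters of a tiny linear Gaussian policy (illustration only).
W_std = rng.normal(scale=0.1, size=(obs_dim, act_dim))  # used when std depends on state
std_param = np.zeros(act_dim)                           # used when std is state-independent


def softplus(x):
    """Numerically stable softplus, keeps the std strictly positive."""
    return np.logaddexp(0.0, x)


def policy_std(obs, state_dependent_std):
    """Return the per-action std for a batch of observations."""
    if state_dependent_std:
        # std is a function of the observation, so it varies from state to state.
        return softplus(obs @ W_std)
    # std is one learned vector, broadcast unchanged to every state in the batch.
    return np.broadcast_to(softplus(std_param), (obs.shape[0], act_dim))


obs = rng.normal(size=(3, obs_dim))
std_dep = policy_std(obs, state_dependent_std=True)
std_indep = policy_std(obs, state_dependent_std=False)
```

With the state-dependent variant, the exploration noise can change sharply between nearby states, which is one plausible (unconfirmed) source of the instability described above.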
(Test result, gin file is @28af537) The agent can stay upright for up to 200 steps after 800-900K env steps, which is pretty fast. But the reward is lower: it just keeps standing and does not walk forward. The performance also drops periodically. As for speed, Alf SAC took ~2 days for 4.5M env steps, about 3x slower than tf-agent SAC.
Alf SAC, gin file @28af537
tf-agent SAC, the ICubWalkDefault one