Closed ba277351771 closed 2 years ago
Did you run Elegantrl_helloworld for more than 1.36e+05 steps? I noticed that even the elegantrl version performs poorly before it reaches 1.5e+05 steps...
I changed the seed, and the helloworld version still gets a negative score.
I found the bug:
Solution:
class AgentSAC:
    def get_obj_critic_raw(...):
        ...
        next_a, next_log_prob = self.act_target.get_action_logprob(next_s)
        ...
should be
next_a, next_log_prob = self.act.get_action_logprob(next_s)
Why?
Because SAC has no target network for the actor.
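For anyone reading later, here is a minimal, framework-free sketch of the soft Bellman target that the critic loss regresses onto, as I understand standard SAC. It is not the ElegantRL code: `sac_critic_target`, `policy`, `q1_target`, `q2_target`, and the toy functions below are hypothetical names used only for illustration, and the twin target critics are an assumption about the setup. The point is that the next action and its log-probability come from the current actor, while only the critics have target copies.

```python
def sac_critic_target(reward, done, next_state, policy,
                      q1_target, q2_target, gamma=0.99, alpha=0.2):
    """Soft Bellman target for a single transition (hypothetical helper).

    `policy` is the CURRENT actor, not a target copy. It returns a sampled
    action and the log-probability of that action.
    """
    next_action, next_log_prob = policy(next_state)
    # Clipped double-Q: take the smaller of the two target critic estimates.
    next_q = min(q1_target(next_state, next_action),
                 q2_target(next_state, next_action))
    # The entropy bonus enters as -alpha * log_prob; done masks the bootstrap.
    return reward + gamma * (1.0 - done) * (next_q - alpha * next_log_prob)


# Toy stand-ins so the sketch runs without any deep learning library.
def toy_policy(state):
    return 0.5 * state, -1.0   # (action, log_prob)

def toy_q1(state, action):
    return state + action

def toy_q2(state, action):
    return state - action

target = sac_critic_target(1.0, 0.0, 2.0, toy_policy, toy_q1, toy_q2,
                           gamma=0.9, alpha=0.1)
print(target)  # about 1.99

terminal_target = sac_critic_target(1.0, 1.0, 2.0, toy_policy, toy_q1, toy_q2)
print(terminal_target)  # 1.0, because done masks the bootstrap term
```

In the real agent the same structure applies to batched tensors, with `self.act` in the role of `policy`.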
So I removed the soft_update call for the actor in update_net:
class AgentSAC:
    def update_net(...):
        # self.soft_update(self.act_target, self.act, self.soft_update_tau)  # SAC does not use an actor target network
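To make clear what is being dropped, here is a small sketch of a soft (Polyak) update on plain Python lists. It assumes the same argument order as the call above (target, current, tau), but the function body and the parameter lists are hypothetical stand-ins, not the ElegantRL implementation. With the fix, only the critic target is updated this way and the actor has no target copy at all.

```python
def soft_update(target_params, current_params, tau):
    """In-place Polyak averaging: target <- tau * current + (1 - tau) * target."""
    for i, (tgt, cur) in enumerate(zip(target_params, current_params)):
        target_params[i] = tau * cur + (1.0 - tau) * tgt


# Hypothetical flat parameter lists standing in for network weights.
critic = [1.0, 2.0]
critic_target = [0.0, 0.0]

# Only the critic target tracks its online network; the actor is used directly.
soft_update(critic_target, critic, tau=0.5)
print(critic_target)  # [0.5, 1.0]
```

A small tau makes the target critic move slowly, which is what stabilizes the bootstrapped Q-value target.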
Jiahao is right, I can confirm that it fixes the problem (see commit f85de42). It's not perfect, but at least the agent trains properly now...