AI4Finance-Foundation / ElegantRL

Massively Parallel Deep Reinforcement Learning. 🔥
https://ai4finance.org

Large difference in SAC convergence speed between elegantRL_helloworld and elegantRL #115

Closed: ba277351771 closed this issue 2 years ago

ba277351771 commented 2 years ago

elegantRL_helloworld: [training curve screenshot]

elegantRL: [training curve screenshot]

hmomin commented 2 years ago

Did you run Elegantrl_helloworld for more than 1.36e+05 steps? I've noticed that even the ElegantRL version performs poorly before it reaches 1.5e+05 steps...

ba277351771 commented 2 years ago

[training log screenshot]

ba277351771 commented 2 years ago

I changed the seed; the helloworld version still gets a negative score.

[training curve screenshot]

ba277351771 commented 2 years ago

[training curve screenshots]

Yonv1943 commented 2 years ago

I found this bug:

Solution:

class AgentSAC:
    def get_obj_critic_raw(...):
        ...
        next_a, next_log_prob = self.act_target.get_action_logprob(next_s)  # bug: next action sampled from the actor target network
        ...

should be

next_a, next_log_prob = self.act.get_action_logprob(next_s)

Why? Because SAC has no target network for the actor. So I also removed the soft_update of the actor in update_net:

class AgentSAC:
    def update_net(...):
        ...
        # self.soft_update(self.act_target, self.act, self.soft_update_tau)  # SAC does not use an act_target network
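
For reference, a minimal sketch of the critic-target computation this fix implies is shown below. `get_action_logprob` is taken from the snippet above; the twin-critic `get_q1_q2` interface and the convention `mask = gamma * (1 - done)` are assumptions about the surrounding ElegantRL code, not something stated in this thread.

import torch

def sac_critic_target(act, cri_target, reward, mask, next_state, alpha):
    # SAC keeps target networks only for the critics, so the next action is
    # sampled from the CURRENT actor (self.act), not from an actor target.
    # `mask` is assumed to be gamma * (1 - done) from the replay buffer.
    with torch.no_grad():
        next_a, next_log_prob = act.get_action_logprob(next_state)     # current policy
        next_q = torch.min(*cri_target.get_q1_q2(next_state, next_a))  # twin-Q minimum (assumed interface)
        q_label = reward + mask * (next_q - alpha * next_log_prob)     # entropy-regularized Bellman target
    return q_label

The critic is then regressed toward q_label, so which actor samples next_a directly affects the target values.
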

hmomin commented 2 years ago

Jiahao is right; I can confirm that the fix solves the problem (see commit f85de42). It's not perfect, but at least the agent trains properly now...

bipedal-walker-SAC: [training curve screenshot]