AI4Finance-Foundation / ElegantRL

Massively Parallel Deep Reinforcement Learning. 🔥
https://ai4finance.org

SAC alpha update problem #346

Open Shapeno opened 4 months ago

Shapeno commented 4 months ago

In obj_alpha = (self.alpha_log * (self.target_entropy - log_prob).detach()).mean(), when alpha_log = 0, alpha will stay at 1 forever. The correct form is obj_alpha = (self.alpha * (self.target_entropy - log_prob).detach()).mean().

This problem is also found in rlkit.

For the algorithm details, see the softlearning source code: https://github.com/rail-berkeley/softlearning/blob/13cf187cc93d90f7c217ea2845067491c3c65464/softlearning/algorithms/sac.py#L256
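
To compare the two forms of obj_alpha quoted above, it may help to check numerically how each one's gradient with respect to alpha_log behaves. The following is a minimal torch-free sketch, not ElegantRL's code: the helper names, the sample log_prob values, and the target_entropy value are all made up for illustration, and it assumes alpha = exp(alpha_log).

```python
import math

# Hypothetical sample data: per-action log-probabilities and a target entropy.
log_probs = [-0.5, -1.5, -2.0]
target_entropy = -1.0

# Mean of (target_entropy - log_prob); treated as a constant (the ".detach()" part).
coef = sum(target_entropy - lp for lp in log_probs) / len(log_probs)


def obj_with_alpha_log(alpha_log):
    """Objective that multiplies by alpha_log directly."""
    return alpha_log * coef


def obj_with_alpha(alpha_log):
    """Objective that multiplies by alpha, assuming alpha = exp(alpha_log)."""
    return math.exp(alpha_log) * coef


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Gradients with respect to alpha_log at alpha_log = 0 and alpha_log = 1.
g_log_0 = numeric_grad(obj_with_alpha_log, 0.0)
g_log_1 = numeric_grad(obj_with_alpha_log, 1.0)
g_exp_0 = numeric_grad(obj_with_alpha, 0.0)
g_exp_1 = numeric_grad(obj_with_alpha, 1.0)

print("alpha_log form:", g_log_0, g_log_1)
print("alpha form:    ", g_exp_0, g_exp_1)
```

In this sketch the alpha_log form has the same gradient, coef, at every alpha_log, while the alpha form scales that gradient by alpha = exp(alpha_log). Since alpha is positive, both gradients share the sign of the mean of (target_entropy - log_prob).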

Shapeno commented 4 months ago

https://github.com/AI4Finance-Foundation/ElegantRL/blob/b4b9d662b9f9cb7cc368ac2b1036b5119eb20be4/elegantrl/agents/AgentSAC.py#L48C13-L48C23