DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
9.25k stars · 1.71k forks

[bug] SAC: alpha loss uses log_ent_coef in place of ent_coef #712

Closed · akssri-sony closed this issue 2 years ago

akssri-sony commented 2 years ago

https://github.com/DLR-RM/stable-baselines3/blob/c895c1d46f5d24cc49ccb20e99089a141fe7f4c1/stable_baselines3/sac/sac.py#L215

Ref: https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py#L255

araffin commented 2 years ago

duplicate of https://github.com/DLR-RM/stable-baselines3/issues/36

akssri-sony commented 2 years ago

The issue mentioned at the end of #36 is still relevant: the code still uses exp(log_ent_coef) in the soft actor/critic losses, while the alpha loss optimizes log_ent_coef directly. The argument put forward there, that the two objectives have the same minimum, is therefore fallacious.
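
To make the distinction concrete, here is a minimal sketch (not the SB3 code itself; tensor names and the batch values are illustrative) contrasting a temperature loss built from `log_alpha` directly with one built from `alpha = exp(log_alpha)`, as in the softlearning reference. Both gradients vanish on the same stationary set, but they differ by a factor of `alpha`, so the optimization dynamics are not identical:

```python
import torch

# Illustrative stand-in for the per-sample term (log_prob + target_entropy)
# that appears in SAC's temperature update.
log_prob_plus_target = torch.tensor([0.5, -1.2, 0.3])

log_alpha = torch.tensor(0.7, requires_grad=True)

# Variant A: loss written in terms of log_alpha directly.
loss_log = -(log_alpha * log_prob_plus_target.detach()).mean()
loss_log.backward()
grad_log = log_alpha.grad.clone()

log_alpha.grad = None

# Variant B: loss written in terms of alpha = exp(log_alpha),
# matching the softlearning reference implementation.
loss_exp = -(log_alpha.exp() * log_prob_plus_target.detach()).mean()
loss_exp.backward()
grad_exp = log_alpha.grad.clone()

# The two gradients differ exactly by the factor alpha = exp(log_alpha):
# d/d(log_alpha)[alpha * c] = alpha * c, vs. d/d(log_alpha)[log_alpha * c] = c.
print(torch.allclose(grad_exp, grad_log * log_alpha.exp()))  # True
```

Both variants are zero when the mean of `(log_prob + target_entropy)` is zero, which is why the "same minimum" argument is tempting; the point of the issue is that the gradient scales, and hence the trajectories taken by the optimizer, differ.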

(The rail-berkeley implementation has since been fixed, BTW.)