akssri-sony closed this 2 years ago
The issue mentioned at the end of #36 is still relevant: the code still uses exp(log_ent_coef) in the soft actor and critic losses, but log_ent_coef directly in the entropy-coefficient loss. The argument put forward there, that the two objectives have the same minimum, is therefore fallacious.
(The rail-berkeley softlearning implementation has since been fixed, BTW.)
https://github.com/DLR-RM/stable-baselines3/blob/c895c1d46f5d24cc49ccb20e99089a141fe7f4c1/stable_baselines3/sac/sac.py#L215
Ref: https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py#L255
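To make the difference between the two loss forms concrete, here is a minimal pure-Python sketch (no PyTorch). It assumes the entropy-coefficient loss has the shape -x * mean(log_prob + target_entropy), with the bracketed term treated as a detached constant, and compares x = log_alpha against x = exp(log_alpha). All function names, the `gap` variable, and the sample numbers are hypothetical illustrations, not SB3 or softlearning code. The gradients with respect to log_alpha, estimated by central finite differences, have the same sign, but the exp form is scaled by alpha.

```python
import math


def mean_entropy_gap(log_probs: list[float], target_entropy: float) -> float:
    """Mean of (log_prob + target_entropy); treated as a detached constant."""
    return sum(lp + target_entropy for lp in log_probs) / len(log_probs)


def loss_log_form(log_alpha: float, gap: float) -> float:
    # Hypothetical loss written in terms of log_alpha directly.
    return -log_alpha * gap


def loss_exp_form(log_alpha: float, gap: float) -> float:
    # Hypothetical loss written in terms of alpha = exp(log_alpha).
    return -math.exp(log_alpha) * gap


def numeric_grad(loss, log_alpha: float, gap: float, eps: float = 1e-6) -> float:
    # Central finite-difference estimate of d(loss)/d(log_alpha).
    return (loss(log_alpha + eps, gap) - loss(log_alpha - eps, gap)) / (2 * eps)


if __name__ == "__main__":
    log_alpha = math.log(0.2)  # illustrative alpha = 0.2
    gap = mean_entropy_gap([-1.0, -2.0, -0.5], target_entropy=-1.0)
    g_log = numeric_grad(loss_log_form, log_alpha, gap)
    g_exp = numeric_grad(loss_exp_form, log_alpha, gap)
    print(f"gap={gap:.4f}  grad(log form)={g_log:.4f}  grad(exp form)={g_exp:.4f}")
    # The ratio of the two gradients equals alpha (up to rounding).
    print(f"ratio={g_exp / g_log:.4f}  alpha={math.exp(log_alpha):.4f}")
```

Both forms push log_alpha in the same direction, but the exp form's step size shrinks or grows with alpha itself, so the two losses do not produce the same update dynamics.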