SACPlayer deterministic mixup

Denys88 / rl_games

RL implementations

MIT License

800 stars 135 forks source link

SACPlayer deterministic mixup #289

Closed llinauer closed 2 weeks ago

llinauer commented 3 weeks ago

In the SACPlayer.get_action method, the action is sampled when is_deterministic is True and the mean is taken when it is False. https://github.com/Denys88/rl_games/blob/master/rl_games/algos_torch/players.py#L233C1-L233C67

Maybe I'm wrong, but I would expect the behaviour to be exactly opposite. If we are in the deterministic setting, I would expect to take the mean (which is deterministic) instead of sampling (which is non-deterministic)

Denys88 commented 3 weeks ago

Yeah! Thanks for reporting. It might explain strange reports about SAC. cc @ViktorM

llinauer commented 2 weeks ago

Fixed in https://github.com/Denys88/rl_games/pull/291