Closed llinauer closed 2 weeks ago
In the SACPlayer.get_action method, the action is sampled when is_deterministic is True and the mean is taken when it is False. https://github.com/Denys88/rl_games/blob/master/rl_games/algos_torch/players.py#L233C1-L233C67
Maybe I'm wrong, but I would expect the behaviour to be exactly opposite. If we are in the deterministic setting, I would expect to take the mean (which is deterministic) instead of sampling (which is non-deterministic)
Yeah! Thanks for reporting. It might explain strange reports about SAC. cc @ViktorM
Fixed in https://github.com/Denys88/rl_games/pull/291
In the SACPlayer.get_action method, the action is sampled when is_deterministic is True and the mean is taken when it is False. https://github.com/Denys88/rl_games/blob/master/rl_games/algos_torch/players.py#L233C1-L233C67
Maybe I'm wrong, but I would expect the behaviour to be exactly opposite. If we are in the deterministic setting, I would expect to take the mean (which is deterministic) instead of sampling (which is non-deterministic)