[Open] whynpt opened this issue 2 years ago
```python
actor_loss = -self.critic_network(inputs_norm_tensor, actions_real).mean()
actor_loss += self.args.action_l2 * (actions_real / self.env_params['action_max']).pow(2).mean()
```
I think the output of `critic_network` alone should be enough to serve as the actor loss. So is the second term a regularizer or some kind of trick? (It would be easier for me if you could reply in Chinese.)
@whynpt It's more like a regularizer; it makes sure the actions don't grow too large.
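For context, here is a minimal, self-contained sketch of how the two terms fit together in a DDPG-style actor update. The network architectures, batch size, `action_max`, and the `action_l2` coefficient below are placeholder assumptions for illustration; only the structure of the loss mirrors the snippet above.

```python
import torch
import torch.nn as nn

# Placeholder dimensions and coefficients (assumptions, not the repo's values).
obs_dim, act_dim = 10, 4
action_max, action_l2 = 1.0, 1.0

# Toy actor and critic standing in for the repository's networks.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

obs = torch.randn(32, obs_dim)           # a batch of normalized observations
actions_real = action_max * actor(obs)   # actions proposed by the current actor

# Standard DDPG actor objective: maximize Q, i.e. minimize -Q.
q_value = critic(torch.cat([obs, actions_real], dim=1))
actor_loss = -q_value.mean()

# L2 action penalty: discourages extreme (saturated) actions, keeping the
# policy's outputs small and smooth, especially early in training.
actor_loss += action_l2 * (actions_real / action_max).pow(2).mean()

actor_loss.backward()  # gradients flow back into the actor through the critic
```

Without the penalty, the tanh actor tends to pin its outputs at the action limits where Q happens to be high; the L2 term trades off a little Q-value for smaller, smoother actions.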