Open krutovsky-danya opened 1 month ago
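In SAC, pi_loss is computed from the Q-functions after they have already taken their gradient step, so the policy gradient is evaluated against updated critics rather than the ones used for the Q-loss. This could lead to wrong gradient steps.
Suggestion: calculate all losses first, then make the optimizer steps.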
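
To illustrate the ordering difference, here is a minimal sketch in plain Python with hand-written gradients. It is a toy scalar model, not the repository's code: the critic `Q(a) = w * a`, the deterministic "policy" `theta`, the function names, and the numbers are all assumptions made up for this example. It only shows that the policy gradient depends on whether the critic has already been stepped.

```python
# Toy scalar setup (all names and numbers are hypothetical):
#   Q(a)    = w * a                          critic with one parameter w
#   pi      = theta                          deterministic "policy" with one parameter
#   q_loss  = (w * a_batch - target) ** 2
#   pi_loss = -Q(theta) = -w * theta


def q_grad(w, a_batch, target):
    """Gradient of q_loss with respect to w."""
    return 2.0 * (w * a_batch - target) * a_batch


def pi_grad(w):
    """Gradient of pi_loss with respect to theta; it depends on the critic weight."""
    return -w


def update_interleaved(w, theta, a_batch, target, lr):
    """Order described in the issue: critic step first, then pi_loss on the updated critic."""
    w = w - lr * q_grad(w, a_batch, target)
    theta = theta - lr * pi_grad(w)  # uses the already-updated w
    return w, theta


def update_losses_first(w, theta, a_batch, target, lr):
    """Suggested order: evaluate both gradients on the same parameters, then step."""
    g_w = q_grad(w, a_batch, target)
    g_theta = pi_grad(w)  # uses the pre-update w
    return w - lr * g_w, theta - lr * g_theta


w0, theta0 = 1.0, 0.5
interleaved = update_interleaved(w0, theta0, a_batch=1.0, target=2.0, lr=0.1)
losses_first = update_losses_first(w0, theta0, a_batch=1.0, target=2.0, lr=0.1)
print("interleaved: ", interleaved)
print("losses first:", losses_first)
```

Both variants end with the same critic weight, but the policy parameter differs because `pi_grad` sees a different `w`. In an autograd framework the equivalent change would presumably be to run both backward passes before either optimizer step.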