Cattharine / product_owner_rl

Validate grad step order in SAC #66

Open krutovsky-danya opened 1 month ago

krutovsky-danya commented 1 month ago

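In SAC, pi_loss is calculated on the Q-functions after their gradient step has already been applied, so the policy loss is evaluated against the updated critics rather than the ones used for the Q losses. This could lead to wrong gradient steps.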

Suggestion: calculate all losses first, then make the optimizer steps.
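
To illustrate the difference between the two orderings, here is a minimal toy sketch in plain Python. It is not the repository's code: the scalar critic `w`, the scalar policy parameter `theta`, the hand-written gradients, and the function names are all hypothetical stand-ins chosen so the example runs without any framework. It only shows that the policy gradient changes depending on whether it is computed before or after the critic step.

```python
# Toy scalar stand-ins (hypothetical, not the repository's networks):
#   Q(a)    = w * a                    critic with a single weight w
#   pi      = theta                    deterministic "policy" that outputs theta
#   q_loss  = (w * a - target) ** 2
#   pi_loss = -Q(pi) = -w * theta


def q_grad(w, a, target):
    """Gradient of q_loss with respect to w."""
    return 2.0 * (w * a - target) * a


def pi_grad(w):
    """Gradient of pi_loss = -w * theta with respect to theta."""
    return -w


def interleaved_update(w, theta, a, target, lr):
    """Critic steps first; the policy gradient then sees the updated critic."""
    w_new = w - lr * q_grad(w, a, target)
    theta_new = theta - lr * pi_grad(w_new)
    return w_new, theta_new


def losses_first_update(w, theta, a, target, lr):
    """Both gradients come from the same parameter snapshot, then both step."""
    g_w = q_grad(w, a, target)
    g_theta = pi_grad(w)
    return w - lr * g_w, theta - lr * g_theta


w0, theta0 = 1.0, 0.5
interleaved = interleaved_update(w0, theta0, a=1.0, target=3.0, lr=0.1)
losses_first = losses_first_update(w0, theta0, a=1.0, target=3.0, lr=0.1)
print("interleaved: ", interleaved)
print("losses first:", losses_first)
```

In both variants the critic ends up at the same value, but the policy parameter differs, because in the interleaved version the policy gradient is taken against the already-updated critic. Whether that difference matters for training in this project would need to be checked against the actual SAC implementation.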