Why does the backup for the cost critic loss assign `data['rew']` instead of `data['cost']` to `cost`? Wouldn't this update result in a cost critic identical to the standard value critic?
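To make the concern concrete, here is a minimal sketch of the intended critic update. It assumes a SpinningUp-style one-step TD target (`signal + gamma * (1 - done) * Q_targ(s', a')`), a batch dict with the keys `rew`, `cost` and `done`, and next-state target estimates that are already computed. The helpers `td_backup`, `mse` and `critic_losses` are hypothetical names for illustration, not the repository's API.

```python
from typing import Dict, List, Tuple


def td_backup(signal: List[float], done: List[float], next_q: List[float],
              gamma: float = 0.99) -> List[float]:
    """One-step Bellman target: signal + gamma * (1 - done) * Q_targ(s', a')."""
    return [s + gamma * (1.0 - d) * q for s, d, q in zip(signal, done, next_q)]


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between critic predictions and their targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def critic_losses(data: Dict[str, List[float]],
                  q_pred: List[float], qc_pred: List[float],
                  next_q: List[float], next_qc: List[float],
                  gamma: float = 0.99) -> Tuple[float, float]:
    # Reward critic: bootstrap the reward signal with the reward target critic.
    backup = td_backup(data["rew"], data["done"], next_q, gamma)
    # Cost critic: bootstrap the *cost* signal with the cost target critic.
    # Feeding data["rew"] here would give it the same regression target as
    # the reward critic, so both would converge to the same value function.
    backup_c = td_backup(data["cost"], data["done"], next_qc, gamma)
    return mse(q_pred, backup), mse(qc_pred, backup_c)


# Toy batch (hypothetical numbers) with next-state target estimates of zero.
batch = {"rew": [1.0, 0.5], "cost": [0.0, 1.0], "done": [0.0, 1.0]}
loss_q, loss_qc = critic_losses(batch, q_pred=[1.0, 0.5], qc_pred=[0.0, 1.0],
                                next_q=[0.0, 0.0], next_qc=[0.0, 0.0])
print(loss_q, loss_qc)  # 0.0 0.0: each critic already matches its own target
```

With the reward signal substituted into the cost backup, the same cost predictions `[0.0, 1.0]` would instead be regressed toward `[1.0, 0.5]`, i.e. toward the reward critic's target.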
The initial update for the Lagrange multiplier uses `Jc = data['cost'].sum().item()`. However, the `update_lagrange_multiplier` method uses `Jc` to compute the lambda loss, and the function that computes it has the signature `def compute_lambda_loss(self, mean_ep_cost):`. Shouldn't `Jc` be defined as `Jc = data['cost'].mean().item()` if it is the `mean_ep_cost`?
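The sketch below illustrates why a batch `.sum()` does not fit a parameter named `mean_ep_cost`: the sum grows with the number of steps collected, while a per-episode average does not. It assumes the common Lagrangian form `lambda_loss = -lambda * (mean_ep_cost - cost_limit)` with lambda projected onto `lambda >= 0`; `mean_episode_cost`, `lambda_loss` and `lambda_step` are hypothetical helpers, not the repository's code. Note that a per-step `.mean()` also differs from a per-episode average by a factor of the episode length, so the sketch computes the episodic average explicitly from `done` flags.

```python
from typing import List


def mean_episode_cost(costs: List[float], dones: List[bool]) -> float:
    """Average undiscounted cost per completed episode in a batch of steps."""
    ep_costs: List[float] = []
    running = 0.0
    for c, d in zip(costs, dones):
        running += c
        if d:  # episode boundary: close out this episode's total cost
            ep_costs.append(running)
            running = 0.0
    return sum(ep_costs) / len(ep_costs) if ep_costs else 0.0


def lambda_loss(lmbda: float, mean_ep_cost: float, cost_limit: float) -> float:
    """Assumed Lagrangian form: minimizing it raises lambda when cost > limit."""
    return -lmbda * (mean_ep_cost - cost_limit)


def lambda_step(lmbda: float, mean_ep_cost: float, cost_limit: float,
                lr: float) -> float:
    """One gradient-descent step on lambda_loss, projected onto lambda >= 0."""
    grad = -(mean_ep_cost - cost_limit)  # d(lambda_loss)/d(lambda)
    return max(0.0, lmbda - lr * grad)


# Two episodes of three steps each; each episode accumulates a cost of 2.0.
costs = [0.0, 1.0, 1.0, 0.0, 2.0, 0.0]
dones = [False, False, True, False, False, True]

print(sum(costs))                               # 4.0
print(sum(costs * 2))                           # 8.0 for the same per-episode behaviour
print(mean_episode_cost(costs, dones))          # 2.0
print(mean_episode_cost(costs * 2, dones * 2))  # 2.0
print(lambda_step(0.0, mean_episode_cost(costs, dones), cost_limit=1.0, lr=0.1))  # 0.1
```

Passing the batch sum instead would make the size of the lambda update depend on how many steps happened to be collected, not on how far the policy exceeds the per-episode cost limit.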
You are very attentive; thank you very much for your suggestions. We have fixed this issue on the dev branch. If you find any new issues in subsequent use, we will fix them as soon as possible.