- When minimizing `loss` (i.e. training the main network), `EntropyBottleneck.quantiles` is unused, and therefore receives no gradients.
- Minimizing `aux_loss` trains only `EntropyBottleneck.quantiles` (since `stop_gradient=True`).

The two losses and their gradient updates are therefore independent. That is, once the gradients are computed (via `loss.backward(); aux_loss.backward()`), performing one optimizer step for minimizing `loss` and one optimizer step for minimizing `aux_loss` is equivalent to performing one optimizer step for minimizing `total_loss = loss + aux_loss`. (Assuming the learning rates are equal.)

So... yes.
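As a concrete illustration, here is a minimal sketch of the two independent updates described above. This is not the library's exact training script: `net` is assumed to be a CompressAI-style model exposing `aux_loss()`, and `criterion` and `x` are hypothetical stand-ins for the rate-distortion loss and an input batch.

```python
import torch

# Hypothetical setup: `net`, `criterion`, and `x` are assumed to exist.
# Split parameters the way CompressAI's example scripts do: the
# EntropyBottleneck quantiles vs. everything else.
main_params = [
    p for n, p in net.named_parameters() if not n.endswith(".quantiles")
]
quantile_params = [
    p for n, p in net.named_parameters() if n.endswith(".quantiles")
]

optimizer = torch.optim.Adam(main_params, lr=1e-4)
aux_optimizer = torch.optim.Adam(quantile_params, lr=1e-4)  # equal LRs

optimizer.zero_grad()
aux_optimizer.zero_grad()

out = net(x)
loss = criterion(out, x)   # rate-distortion loss; quantiles unused here
aux_loss = net.aux_loss()  # depends only on EntropyBottleneck.quantiles

# The two losses reach disjoint parameter sets, so the two
# backward/step pairs do not interact.
loss.backward()
aux_loss.backward()
optimizer.step()
aux_optimizer.step()
```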
Thank you for your immediate reply!!
Can I add `aux_loss` to `rate_distortion_loss` as a total loss and then use one optimizer to optimize the entire network?
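For reference, the single-optimizer variant being asked about could be sketched as below (same hypothetical `net`, `criterion`, and `x` as above). Per the equivalence noted earlier, this matches the two-optimizer setup only when the learning rates are equal, since it forces one LR on everything, including the quantiles.

```python
# Single optimizer over all parameters; valid because the gradients of the
# two loss terms reach disjoint parameter sets.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

optimizer.zero_grad()
out = net(x)
total_loss = criterion(out, x) + net.aux_loss()  # loss + aux_loss
total_loss.backward()
optimizer.step()
```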