Closed: libo-huang closed this issue 2 years ago
Hi @HLBayes, I'm sorry for the very late reply!
This difference between the two repositories is indeed confusing. In both repositories, the `.detach()` operation is actually not needed. As you indicate in your comment, `.detach()` stops backpropagation by returning a tensor that is cut off from the computation graph, so no gradients flow through it. However, in both repositories the `targets_norm` variable already has no gradients being tracked, because it is computed inside a `with torch.no_grad():` block (as for example here: link). Sorry for the confusion!
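
To illustrate the point, here is a minimal, self-contained sketch (not the repository's actual code; the models and variable names below are hypothetical) showing that a tensor computed inside `with torch.no_grad():` already has `requires_grad=False`, so calling `.detach()` on it changes nothing:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch: a tensor produced inside `with torch.no_grad():` is already
# excluded from the autograd graph, so a later .detach() is redundant.
previous_model = nn.Linear(10, 5)  # hypothetical "old" model providing targets
current_model = nn.Linear(10, 5)   # hypothetical model being trained

x = torch.randn(4, 10)

with torch.no_grad():
    targets = previous_model(x)    # no computation graph is built here

print(targets.requires_grad)           # False
print(targets.detach().requires_grad)  # False -> .detach() changes nothing

# Distillation-style loss: only current_model receives gradients,
# whether or not .detach() is called on `targets`.
loss = F.mse_loss(current_model(x), targets)
loss.backward()

print(current_model.weight.grad is not None)  # True
print(previous_model.weight.grad is None)     # True: no gradient reaches the old model
```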
Many thanks for your impressive project. I am a bit confused about the `.detach()` in the code below, https://github.com/GMvandeVen/continual-learning/blob/a02db26d3b10754abdc4a549bdcde6af488c94e0/utils.py#L35 which is defined in https://github.com/GMvandeVen/continual-learning/blob/a02db26d3b10754abdc4a549bdcde6af488c94e0/utils.py#L18

According to the blog post PyTorch .detach() method, `.detach()` treats `targets_norm` as a fixed target in the `KD_loss`, so backpropagation will not update the parameters along the `targets_norm` branch. However, in your other project, brain-inspired-replay, the same loss function `loss_fn_kd` is used, as shown in line 29, but without the `.detach()`. I have tested both versions and they produce the same results, yet I am still confused about how the second version works.
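
For concreteness, a hypothetical KD-style loss along the lines described above (a sketch only, not the actual `loss_fn_kd` from either repository) could look like the following; if `target_scores` was produced under `with torch.no_grad():`, both settings of `use_detach` behave identically, which would explain why the two variants give the same results:

```python
import torch
import torch.nn.functional as F

def kd_loss(scores, target_scores, T=2.0, use_detach=True):
    """Hypothetical KD-style loss (illustration only, not the actual loss_fn_kd)."""
    log_p = F.log_softmax(scores / T, dim=1)
    targets_norm = F.softmax(target_scores / T, dim=1)
    if use_detach:
        # Treat the soft targets as constants during backpropagation.
        targets_norm = targets_norm.detach()
    return F.kl_div(log_p, targets_norm, reduction='batchmean') * T ** 2

# If `target_scores` was computed under `with torch.no_grad():`, then
# use_detach=True and use_detach=False yield identical gradients, because
# no graph exists to backpropagate through on the target branch anyway.
```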