Hi,
Thank you for your contribution. The code is very useful to me, but I have a question about this part:
with torch.no_grad():
    l_pos = torch.index_select(self.memory_v1, 0, y.view(-1))
    l_pos.mul_(momentum)
    l_pos.add_(torch.mul(v1, 1 - momentum))
    l_norm = l_pos.pow(2).sum(1, keepdim=True).pow(0.5)
    updated_v1 = l_pos.div(l_norm)
    self.memory_v1.index_copy_(0, y, updated_v1)
    ab_pos = torch.index_select(self.memory_v2, 0, y.view(-1))
    ab_pos.mul_(momentum)
    ab_pos.add_(torch.mul(v2, 1 - momentum))
    ab_norm = ab_pos.pow(2).sum(1, keepdim=True).pow(0.5)
    updated_v2 = ab_pos.div(ab_norm)
    self.memory_v2.index_copy_(0, y, updated_v2)
In your implementation of the paper, you compute the loss terms first and then update the memory.
If I instead update the memory first and then compute the loss terms, what difference would it make?
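To make the question concrete, here is a minimal sketch of the two orderings on a toy memory bank. It is not the repository's code: the helper names `momentum_update` and `positive_scores`, the shapes, and the momentum value are all assumptions for illustration, and it uses a single view scored against its own memory slot rather than the cross-view setup.

```python
import torch
import torch.nn.functional as F


def momentum_update(memory, feats, idx, momentum=0.5):
    """Hypothetical helper: same momentum update as the snippet above, for one view."""
    with torch.no_grad():
        pos = torch.index_select(memory, 0, idx.view(-1))  # returns a copy of the rows
        pos.mul_(momentum)
        pos.add_(torch.mul(feats, 1 - momentum))
        norm = pos.pow(2).sum(1, keepdim=True).pow(0.5)
        memory.index_copy_(0, idx, pos.div(norm))  # write normalized rows back in place


def positive_scores(memory, feats, idx):
    """Hypothetical helper: dot product of each feature with its own memory slot."""
    return (torch.index_select(memory, 0, idx) * feats).sum(1)


torch.manual_seed(0)
memory = F.normalize(torch.randn(8, 4), dim=1)  # toy bank: 8 slots, 4 dims
v1 = F.normalize(torch.randn(3, 4), dim=1)      # toy batch of 3 features
y = torch.tensor([1, 4, 6])                     # memory indices of the batch

# Ordering A: compute the positive scores, then update the memory.
mem_a = memory.clone()
score_then_update = positive_scores(mem_a, v1, y)
momentum_update(mem_a, v1, y)

# Ordering B: update the memory, then compute the positive scores.
mem_b = memory.clone()
momentum_update(mem_b, v1, y)
update_then_score = positive_scores(mem_b, v1, y)

print(score_then_update)
print(update_then_score)
```

In this toy setup both orderings leave the memory in the same final state, but with ordering B the positive score is never lower, because the slot has already been pulled toward the current feature before it is scored. Whether that matters for training is what I would like to understand.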
Looking forward to your reply! Thank you!