what is the difference between the position of putting "with torch.no_grad()"

Hi, Thank you for your contribution,the code is very useful for me,but I want to ask you a question about this code:

 with torch.no_grad():
        l_pos = torch.index_select(self.memory_v1, 0, y.view(-1))
        l_pos.mul_(momentum)
        l_pos.add_(torch.mul(v1, 1 - momentum))
        l_norm = l_pos.pow(2).sum(1, keepdim=True).pow(0.5)
        updated_v1 = l_pos.div(l_norm)
        self.memory_v1.index_copy_(0, y, updated_v1)
        ab_pos = torch.index_select(self.memory_v2, 0, y.view(-1))
        ab_pos.mul_(momentum)
        ab_pos.add_(torch.mul(v2, 1 - momentum))
        ab_norm = ab_pos.pow(2).sum(1, keepdim=True).pow(0.5)
        updated_v2 = ab_pos.div(ab_norm)
        self.memory_v2.index_copy_(0, y, updated_v2)

In your Implemention of the paper,you calc the loss element first and then update the memory, 
if I update the memory first, and then calc the loss element, what is the difference between these two methods,

Looking forward to your reply！Thank you！

HobbitLong / RepDistiller

what is the difference between the position of putting "with torch.no_grad()" #37