I was confused by this problem once too, but it didn't take long to realize that it comes down to constants. Consider the following cases:

(1)
```python
with torch.no_grad():
    y = x
loss = (y * f(x)).sum()
loss.backward()
```
(2)
```python
with torch.no_grad():
    y = 1 * x
loss = (y * f(x)).sum()
loss.backward()
```
(3)
```python
with torch.no_grad():
    y = 2 * x
loss = (y * f(x)).sum()
loss.backward()
```
(4)
```python
loss = (x * f(x)).sum()
loss.backward()
```
The gradient `x.grad` in these cases satisfies:

- case (1) = case (4)
- case (2) * 2 = case (3)
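The reason is that, under `no_grad()`, the statement `y = x` is only a Python name binding: no operation is recorded, `y` is still the same tracked tensor, and gradients flow through it exactly as in case (4). In contrast, `y = 1 * x` and `y = 2 * x` create new tensors with `requires_grad=False`, so `y` enters the loss as a constant. Here is a minimal, self-contained sketch of the comparison (the choice of `f(x) = x ** 2` and the input values are assumptions for illustration only):

```python
import torch

def f(x):
    # a toy differentiable function; any f works, x ** 2 is an arbitrary choice
    return x ** 2

def grad_for(case):
    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    if case == 1:
        with torch.no_grad():
            y = x          # name binding only: y is the same tensor and is still tracked
    elif case == 2:
        with torch.no_grad():
            y = 1 * x      # new tensor with requires_grad=False: a constant
    elif case == 3:
        with torch.no_grad():
            y = 2 * x      # a constant with twice the value of case (2)
    else:
        y = x              # case (4): no no_grad at all
    (y * f(x)).sum().backward()
    return x.grad

g1, g2, g3, g4 = (grad_for(c) for c in (1, 2, 3, 4))
print(torch.allclose(g1, g4))      # True: case (1) == case (4)
print(torch.allclose(2 * g2, g3))  # True: case (2) * 2 == case (3)
```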
Regarding the PyTorch version: I have just tested that the gradients computed by torch 0.4.1, 1.0.1, and 1.4.0 are all the same.
You are right. This is a counterintuitive aspect of PyTorch's auto-differentiation mechanism.
From your implementation of CIoU, I think you want to make alpha gradient-free, but this cannot be achieved with `no_grad()` in PyTorch 1.4.0.
I will give you a toy example to show this problem:
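A minimal sketch of what such a toy example can look like (the exact contents of `net0`, `net1`, and `net2` below are my assumptions, not the original code: `net0` binds the factor under `no_grad()`, `net1` uses `x` directly, and `net2` detaches it):

```python
import torch

def f(x):
    # stand-in for the downstream loss term; x ** 2 is an arbitrary choice
    return x ** 2

def net0(x):
    # try to make the weighting factor constant via a no_grad name binding
    with torch.no_grad():
        alpha = x              # no op is recorded: alpha is still the tracked tensor x
    return (alpha * f(x)).sum()

def net1(x):
    # no attempt at all: alpha fully participates in the graph
    alpha = x
    return (alpha * f(x)).sum()

def net2(x):
    # what we actually want: alpha is a gradient-free constant
    alpha = x.detach()
    return (alpha * f(x)).sum()

for name, net in [("net0", net0), ("net1", net1), ("net2", net2)]:
    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    net(x).backward()
    print(name, x.grad)   # net0 and net1 print the same gradient; net2 differs
```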
From my experiment, net0 and net1 give the same result, but net2 is what we want.
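Applied to CIoU, the same idea means detaching `alpha` explicitly instead of relying on `no_grad()`. Here is a sketch assuming the usual definition `alpha = v / (1 - iou + v)` (variable names are illustrative, not the repository's actual code):

```python
import math
import torch

def ciou_alpha_v(iou, w_pred, h_pred, w_gt, h_gt):
    # v measures aspect-ratio consistency (standard CIoU term); all inputs are tensors
    v = (4 / math.pi ** 2) * (torch.atan(w_gt / h_gt) - torch.atan(w_pred / h_pred)) ** 2
    # detach alpha so it acts as a constant weight: gradients flow through v but not alpha
    alpha = (v / (1 - iou + v)).detach()
    return alpha, v
```

The CIoU penalty then uses `alpha * v`, so the box parameters receive gradients from `v` while `alpha` only scales them.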