Zzh-tju / DIoU-SSD-pytorch

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (AAAI 2020)
GNU General Public License v3.0

A wrong implementation of CIOU? #19

Closed shsjxzh closed 4 years ago

shsjxzh commented 4 years ago

From your implementation of CIoU, I think you want to make alpha gradient-free, but this cannot be achieved with no_grad() in PyTorch 1.4.0:

with torch.no_grad():
    S = 1 - iou
    alpha = v / (S + v)
cious = iou - (u + alpha * v)
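
For reference, an explicit way to make alpha a constant for autograd would be to detach it. This is only a minimal sketch, assuming iou, u and v are the tensors already computed in your CIoU code:

S = 1 - iou
alpha = (v / (S + v)).detach()  # detach() returns a tensor that autograd treats as a constant
cious = iou - (u + alpha * v)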

I will give you a toy example to show this problem:

import torch
from torch import nn
input_num=3

# net0: plain assignment, so gradients flow through both y and x
class net0(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(input_num, 1)

    def forward(self, x):
        x = self.fc(x)
        y = x
        x = y * x
        return x

# net1: assignment under no_grad() is only a rebinding, so gradients still flow through y
class net1(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(input_num, 1)

    def forward(self, x):
        x = self.fc(x)
        with torch.no_grad():
            y = x
        x = y * x
        return x

# net2: y = x.item() is a Python float, a true constant for autograd
class net2(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(input_num, 1)

    def forward(self, x):
        x = self.fc(x)
        y = x.item()
        x = y * x
        return x

# run one step with each net and compare the gradients on fc.weight
for i in range(3):
    if i == 0:
        model = net0()
    elif i == 1:
        model = net1()
    else:
        model = net2()

    for m in model.parameters():
        m.data.fill_(0.1)

    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

    data = torch.ones((1, input_num), dtype=torch.float)
    target = 0.5 * torch.ones((1, 1))

    optimizer.zero_grad()
    output = model(data)
    # print(output)
    loss = criterion(output, target)
    loss.backward()
    print("i: {}".format(i))
    print(model.fc.weight.grad)

From my experiment, net0 and net1 produce the same gradient, but net2 gives what we want.
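
For comparison, here is a hypothetical net3 (not part of my experiment above, just a sketch reusing nn and input_num from the script) that detaches y explicitly; it behaves like net2:

class net3(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(input_num, 1)

    def forward(self, x):
        x = self.fc(x)
        y = x.detach()  # detached copy: autograd treats y as a constant
        x = y * x       # gradient only flows through the second factor
        return x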

Zzh-tju commented 4 years ago

I was confused by this problem once, but it didn't take long for me to realize that it has to do with constants. Consider the following cases, where f is some differentiable function:

(1)

with torch.no_grad():
    y = x
loss = (y * f(x)).sum()
loss.backward()

(2)

with torch.no_grad():
    y = 1 * x
loss = (y * f(x)).sum()
loss.backward()

(3)

with torch.no_grad():
    y = 2 * x
loss = (y * f(x)).sum()
loss.backward()

(4)

loss = (x * f(x)).sum()
loss.backward()

The gradient x.grad in these cases satisfies: case (1) = case (4), and case (2) * 2 = case (3).
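
Here is a small sketch to check these four cases numerically, using a toy f(x) = x**2 rather than the CIoU loss:

import torch

def f(x):
    return x ** 2

def grad_case(mode):
    x = torch.tensor(3.0, requires_grad=True)
    if mode == 1:
        with torch.no_grad():
            y = x      # bare rebinding: y is still the graph tensor x
    elif mode == 2:
        with torch.no_grad():
            y = 1 * x  # operation under no_grad: y is a detached constant equal to x
    elif mode == 3:
        with torch.no_grad():
            y = 2 * x  # detached constant, twice as large
    else:
        y = x          # case (4): fully differentiable product
    loss = (y * f(x)).sum()
    loss.backward()
    return x.grad.item()

print([grad_case(m) for m in (1, 2, 3, 4)])  # 27.0, 18.0, 36.0, 27.0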

About the PyTorch version: I have just tested that the gradients calculated by torch 0.4.1, 1.0.1, and 1.4.0 are all the same.

shsjxzh commented 4 years ago

You are right. This is a subtle quirk of PyTorch's auto-differentiation mechanism.
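
A minimal check of the distinction, as a sketch: a bare assignment under no_grad() keeps the original graph tensor, while any operation executed under no_grad() produces a detached constant.

import torch

x = torch.ones(1, requires_grad=True)
with torch.no_grad():
    a = x      # plain rebinding: a is the same tensor, requires_grad stays True
    b = 1 * x  # operation executed under no_grad: b is a detached constant
print(a.requires_grad, b.requires_grad)  # True False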