Zzh-tju / DIoU-SSD-pytorch

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (AAAI 2020)
GNU General Public License v3.0

Gradient explosion in CIoU loss #4

Closed. Crazod closed this issue 4 years ago.

Crazod commented 4 years ago

Hi, Zhaohui: I tried DIoU loss in RetinaNet and it converges well. But when I try CIoU loss in RetinaNet, it does not converge stably. I use your code and have a question about equation (12) in your paper: you replaced several variables there. Does it always train stably?

Zzh-tju commented 4 years ago

Yes, training has been stable so far. In Eqn. 12 we remove w^2+h^2, and alpha has no gradient flowing backward.
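
For reference, paraphrasing the relevant part of the paper, with D := arctan(w^gt/h^gt) - arctan(w/h):

    % aspect-ratio consistency term and its Eqn. 12 gradients
    v = \frac{4}{\pi^2} D^2
    \frac{\partial v}{\partial w} = \frac{8}{\pi^2} \, D \cdot \frac{h}{w^2 + h^2}
    \frac{\partial v}{\partial h} = -\frac{8}{\pi^2} \, D \cdot \frac{w}{w^2 + h^2}
    % for w, h in [0, 1], w^2 + h^2 is small, so the 1/(w^2 + h^2) factor can
    % blow up; dropping it in the backward pass (and detaching alpha) is what
    % keeps training stable.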

Crazod commented 4 years ago

> Yes, training has been stable so far. In Eqn. 12 we remove w^2+h^2, and alpha has no gradient flowing backward.

Hi, I have a question. The paper says the denominator w^2+h^2 is usually a small value when h and w range in [0, 1]. Is the variable "w" the box width normalized by the image size, or the real width in pixels? I used the real bbox width; is that wrong?

Zzh-tju commented 4 years ago

x, y, w, h are normalized to [0, 1] so as to balance with the cls loss. Also, did you print 'w' on the terminal to check?
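
Concretely, something like this (a minimal sketch; boxes, img_w, img_h are placeholder names, not the repo's exact code):

    import torch

    def normalize_boxes(boxes, img_w, img_h):
        # boxes: (N, 4) tensor of (x1, y1, x2, y2) in pixels. Dividing by the
        # image size puts every coordinate (and hence w, h) into [0, 1].
        scale = boxes.new_tensor([img_w, img_h, img_w, img_h])
        return boxes / scale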

Crazod commented 4 years ago

Oh, thank you. I think I know why. My w is a real pixel value, e.g. w = 0-1080 (my image width), but your w is normalized in your code: https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L91. So when I implemented your code in my project, w was not in [0, 1] but in [0, max width] of the original image. Is that right? So I changed the code:

    with torch.no_grad():
        # alpha and the arctan difference are treated as constants (no gradient)
        arctan = torch.atan(w2 / h2) - torch.atan(w1 / h1)
        v = (4 / (math.pi ** 2)) * arctan ** 2
        S = 1 - iou
        alpha = v / (S + v)
        w_temp = 2 * w1
        # normalizing factor for pixel-space boxes (my addition, see above)
        distance = w1 ** 2 + h1 ** 2

    ar = (8 / (math.pi ** 2)) * arctan * (w1 - w_temp) * h1
    # u is the DIoU distance penalty d^2 / c^2 computed earlier in the function
    cious = iou - (u + alpha * ar / distance)

And now it converges on a small dataset. I need time to read the whole project, and I will test on COCO tonight. By the way, I'm working at SenseTime in Shanghai. If you would like to find an intern job in Beijing or Shanghai, contact me anytime: Wangliming@sensetime.com

Zzh-tju commented 4 years ago

Haha, thank you in advance. I still have one and a half years until I graduate.

ranjiewwen commented 4 years ago

> Oh, thank you. I think I know why. My w is a real pixel value, but your w is normalized in your code … so I changed the code (see the snippet above). And now it converges on a small dataset.

Thanks for sharing. Why use this particular distance, distance = w1 ** 2 + h1 ** 2? Is it normalizing w1 and h1?

Zzh-tju commented 4 years ago

No, we did not. On the contrary, we removed w²+h².

Crazod commented 4 years ago

In Zzh's code, the original implementation removed w^2 + h^2 because it uses normalized distances, so the range of w is (0, 1). But I just want to use the DIoU/CIoU loss in my own project, where every distance is a real pixel value, e.g. w ranges over (0, width). So I normalize w and h by a distance. I think the exact distance is not very important; it is just a real number that avoids gradient explosion.
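
Put differently, a hypothetical wrapper like the one below also works (the names are mine, not the repo's): rescale pixel-space boxes once, then call a normalized-coordinate CIoU loss unchanged.

    def ciou_loss_pixels(pred, target, img_w, img_h, ciou_loss):
        # pred/target: (N, 4) (x1, y1, x2, y2) in pixels; ciou_loss expects
        # coordinates in [0, 1]. The exact scale is not critical -- any fixed
        # real number that keeps w and h near [0, 1] avoids the exploding
        # 1/(w^2+h^2) gradient; the image diagonal would work just as well.
        scale = pred.new_tensor([img_w, img_h, img_w, img_h])
        return ciou_loss(pred / scale, target / scale)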

ranjiewwen commented 4 years ago

@Crazod @Zzh-tju Thanks, I got it. But when I use the DIoU or CIoU loss in my code, it is not better than the GIoU loss on MS COCO; maybe I am missing something. So I want to run your source code, but I get this error:

    File "tools/train.py", line 287, in <module>
        train()
    File "tools/train.py", line 188, in train
        loss.backward()
    File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
    File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Zzh-tju commented 4 years ago

Maybe an environment problem; I'm not sure.

psyanglll commented 4 years ago

@ranjiewwen I got the same result: GIoU's result is better than CIoU's. Have you tried DIoU?

psyanglll commented 4 years ago

I've tried DIoU. It works very well! The result is much better than with CIoU and GIoU!

ranjiewwen commented 4 years ago

@psyanglll I did not get a better result. Have you run the source code and met this problem: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation?

psyanglll commented 4 years ago

@ranjiewwen I just used DIoU in my own SSD and got a better result than with GIoU and CIoU. As for your problem, I guess it's the wrong version of PyTorch.

ranjiewwen commented 4 years ago

> @ranjiewwen I just used DIoU in my own SSD and got a better result than with GIoU and CIoU. As for your problem, I guess it's the wrong version of PyTorch.

Thanks, I used PyTorch 0.4.1 and the problem is solved.

XiaSunny commented 4 years ago

@psyanglll How much did the performance improve with DIoU in your own SSD? In my SSD with DIoU the performance degrades; my PyTorch is 0.3. Thank you!

Libaishun commented 4 years ago

> Oh, thank you. I think I know why. My w is a real pixel value, but your w is normalized in your code … so I changed the code (see the snippet above). And now it converges on a small dataset.

I also work with real pixel values and changed the code as you posted. I then trained on the COCO dataset and found that the CIoU loss decreases extremely slowly.

PaulZhangIsing commented 3 years ago

@Zzh-tju I have tried the DIoU/CIoU loss in Keras, with a pre-trained VGG16 and the Pascal VOC dataset. Using smooth L1 everything looks fine, but if I switch to DIoU/CIoU, the loss grows rapidly and the training soon terminates.

Here is my code:

    def ciou_loss(self, y_true, y_pred):
        """Implements the CIoU loss function.

        CIoU is an enhancement for models which use IoU in object detection.

        Args:
            y_true: true targets tensor. The coordinates of each bounding box
                are encoded as [y_min, x_min, y_max, x_max].
            y_pred: predictions tensor, with the same encoding.

        Returns:
            CIoU loss float tensor.
        """
        y_pred = tf.convert_to_tensor(y_pred)
        if not y_pred.dtype.is_floating:
            y_pred = tf.cast(y_pred, tf.float32)
        y_true = tf.cast(y_true, y_pred.dtype)
        ciou = tf.squeeze(self.calculate_ciou(y_pred, y_true))

        return 1 - ciou

    def calculate_ciou(self, b1, b2):
        """Computes CIoU for batched bounding boxes.

        Arguments:
            b1 (nD tensor): tensor of shape `(batch_size, #boxes, 4)` whose
                last dimension is unstacked below as
                `(x_min, y_min, x_max, y_max)`; treated as the ground truth
                inside this function.
            b2 (nD tensor): tensor of identical structure to `b1` containing
                the predicted bounding box coordinates.

        Takes in a list of bounding boxes but can work for a single bounding
        box too. Boundary cases such as bounding boxes of size 0 are handled.
        """
        zero = tf.convert_to_tensor(0.0, b1.dtype)
        x1g, y1g, x2g, y2g = tf.unstack(value=b1, num=4, axis=-1)
        x1, y1, x2, y2 = tf.unstack(value=b2, num=4, axis=-1)

        true_width = tf.maximum(zero, x2g - x1g)
        true_height = tf.maximum(zero, y2g - y1g)
        pred_width = tf.maximum(zero, x2 - x1)
        pred_height = tf.maximum(zero, y2 - y1)
        true_area = true_width * true_height
        pred_area = pred_width * pred_height

        ### IoU term ###
        intersect_ymin = tf.maximum(y1g, y1)
        intersect_xmin = tf.maximum(x1g, x1)
        intersect_ymax = tf.minimum(y2g, y2)
        intersect_xmax = tf.minimum(x2g, x2)
        intersect_width = tf.maximum(zero, intersect_xmax - intersect_xmin)
        intersect_height = tf.maximum(zero, intersect_ymax - intersect_ymin)
        intersect_area = intersect_width * intersect_height

        union_area = true_area + pred_area - intersect_area
        iou = tf.math.divide_no_nan(intersect_area, union_area)

        ### distance term: squared center distance over squared diagonal of the enclosing box ###
        x_center = (x2 + x1) / 2
        y_center = (y2 + y1) / 2
        x_center_g = (x1g + x2g) / 2
        y_center_g = (y1g + y2g) / 2
        xc1 = tf.minimum(x1, x1g)
        yc1 = tf.minimum(y1, y1g)
        xc2 = tf.maximum(x2, x2g)
        yc2 = tf.maximum(y2, y2g)
        c = tf.pow((xc2 - xc1), 2) + tf.pow((yc2 - yc1), 2)
        d = tf.pow((x_center - x_center_g), 2) + tf.pow((y_center - y_center_g), 2)
        u = tf.math.divide_no_nan(d, c)

        ### aspect-ratio term ###
        arctan = (tf.atan(tf.math.divide_no_nan(true_width, true_height))
                  - tf.atan(tf.math.divide_no_nan(pred_width, pred_height)))
        v = (4 / tf.pow(math.pi, 2)) * tf.pow(arctan, 2)
        s = 1 - iou
        alpha = tf.math.divide_no_nan(v, s + v)
        w_temp = 2 * pred_width
        # ar = (4 / (tf.pow(math.pi, 2))) * tf.pow(arctan, 2)
        # ar = tf.clip_by_value(alpha * v, -1.0, 1.0)

        ### calculate CIoU ###
        ciou = iou - (u + alpha * v)
        # ciou = tf.clip_by_value(ciou, -1.0, 1.0)  # If I add this line, the loss is stable, but the result is not good
        return ciou

I looked at your code; it seems you add a clamp to restrict the value of CIoU between -1 and 1. Have you encountered anything similar?
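
For reference, the clamp I mean would look something like this in PyTorch (my sketch of how I read box_utils.py, not a verified copy):

    import torch

    def ciou_to_loss(cious):
        # Restrict per-box CIoU to [-1, 1] so a single degenerate box cannot
        # blow up the batch loss, then sum (1 - CIoU) over boxes.
        cious = torch.clamp(cious, min=-1.0, max=1.0)
        return torch.sum(1.0 - cious)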

yyccR commented 2 years ago

Hi, I'm confused about the claim in your paper that you "removed w^2+h^2". Where can I find that in the code? It seems the implementation just makes alpha constant, but the v term will still be differentiated, with the w^2+h^2 denominator appearing in its gradient?
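
To make my question concrete, here is how I read the trick from the snippets above (a sketch using Crazod's variable names, not the repo's authoritative code): differentiating v directly keeps the 1/(w^2+h^2) factor in the backward pass, while the ar surrogate below reproduces the Eqn. 12 gradients with that factor dropped.

    import math
    import torch

    def ciou_penalty(w1, h1, w2, h2, iou):
        # (w1, h1): predicted box size, (w2, h2): ground-truth box size,
        # iou: precomputed IoU. Everything under no_grad() is detached.
        with torch.no_grad():
            arctan = torch.atan(w2 / h2) - torch.atan(w1 / h1)
            v = (4 / math.pi ** 2) * arctan ** 2
            alpha = v / (1 - iou + v)  # constant: no gradient through alpha
            w_temp = 2 * w1            # constant snapshot of 2 * w1
        # Analytically: d(ar)/d(w1) = (8/pi^2) * arctan * h1 and
        # d(ar)/d(h1) = -(8/pi^2) * arctan * w1, i.e. the Eqn. 12 gradients
        # of v without the 1/(w1^2 + h1^2) denominator.
        ar = (8 / math.pi ** 2) * arctan * (w1 - w_temp) * h1
        # Note the forward value of alpha * ar differs from alpha * v; only
        # the gradients are meant to match the modified Eqn. 12.
        return alpha * ar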