dengdan / seglink

An implementation of the SegLink algorithm from the paper "Detecting Oriented Text in Natural Images by Linking Segments"
GNU General Public License v3.0
495 stars, 178 forks

About bboxes_filter_overlap #21

Closed: abc8350712 closed this issue 7 years ago

abc8350712 commented 7 years ago

I read the code and something confused me. During data augmentation, the following function is called:

bboxes_filter_overlap(labels, bboxes, xs, ys, threshold, scope=None, assign_negative=False)

Can the values in the bboxes parameter be negative?

I am looking forward to your help!

dengdan commented 7 years ago

The label value of a bbox might be negative after preprocessing, meaning do-not-care. Such bboxes are ignored and do not contribute to the loss during training.
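As a rough illustration of what "ignored" means here, the sketch below averages a per-box loss over only the boxes whose label is non-negative. It is a hypothetical helper (masked_mean_loss is not a function in the repo) and assumes the per-box losses have already been computed; the repo's actual loss code may differ.

```python
def masked_mean_loss(per_box_losses, labels):
    """Average the loss over boxes whose label is non-negative.

    Hypothetical helper: boxes with a negative label are treated as
    do-not-care and contribute nothing to the result.
    """
    kept = [loss for loss, label in zip(per_box_losses, labels) if label >= 0]
    if not kept:
        # Every box is do-not-care, so there is nothing to average.
        return 0.0
    return sum(kept) / len(kept)


# The middle box has label -1, so its loss of 2.0 is skipped.
print(masked_mean_loss([0.5, 2.0, 1.5], [1, -1, 1]))  # 1.0
```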

abc8350712 commented 7 years ago

The body of bboxes_filter_overlap is:

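import tensorflow as tf

def bboxes_filter_overlap(labels, bboxes, xs, ys, threshold, scope=None, assign_negative=False):
    """Filter out bboxes whose overlap with the image box [0, 0, 1, 1] is at most threshold."""
    with tf.name_scope(scope, 'bboxes_filter', [labels, bboxes]):
        # Fraction of each bbox's area that lies inside the normalized image.
        scores = bboxes_intersection(tf.constant([0, 0, 1, 1], bboxes.dtype), bboxes)
        mask = scores > threshold
        if assign_negative:
            # Keep every box, but negate the labels of low-overlap boxes.
            labels = tf.where(mask, labels, -labels)
        else:
            # Drop low-overlap boxes together with their labels and coordinates.
            labels = tf.boolean_mask(labels, mask)
            bboxes = tf.boolean_mask(bboxes, mask)
            xs = tf.boolean_mask(xs, mask)
            ys = tf.boolean_mask(ys, mask)
        return labels, bboxes, xs, ys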
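The two branches of this function can be mimicked on plain Python lists to see what each one does. This sketch assumes the overlap scores have already been computed; filter_by_score is a hypothetical name, the box ids stand in for real coordinate tuples, and xs/ys are omitted.

```python
def filter_by_score(labels, bboxes, scores, threshold, assign_negative=False):
    """Mimic the two branches of bboxes_filter_overlap on plain lists.

    scores are assumed to be precomputed overlaps with the image box [0, 0, 1, 1].
    """
    mask = [s > threshold for s in scores]
    if assign_negative:
        # Keep all boxes; flip labels of low-overlap boxes to negative,
        # as tf.where(mask, labels, -labels) does.
        return [l if m else -l for l, m in zip(labels, mask)], list(bboxes)
    # Otherwise drop low-overlap boxes, as tf.boolean_mask does.
    kept_labels = [l for l, m in zip(labels, mask) if m]
    kept_boxes = [b for b, m in zip(bboxes, mask) if m]
    return kept_labels, kept_boxes


labels = [1, 1, 1]
bboxes = ["box0", "box1", "box2"]  # ids standing in for coordinate tuples
scores = [1.0, 0.75, 0.1]
print(filter_by_score(labels, bboxes, scores, 0.5))        # ([1, 1], ['box0', 'box1'])
print(filter_by_score(labels, bboxes, scores, 0.5, True))  # ([1, 1, -1], ['box0', 'box1', 'box2'])
```

With the default assign_negative=False, low-overlap boxes disappear; with assign_negative=True, they stay but get a negative (do-not-care) label.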

The scores are used to filter out bboxes. But here is the code of bboxes_intersection:

# Assumes the repo's tf_extended math helpers are imported as tfe_math.
import tensorflow as tf

def bboxes_intersection(bbox_ref, bboxes, name=None):
    """For each bbox (ymin, xmin, ymax, xmax), return the fraction of its area inside bbox_ref."""
    with tf.name_scope(name, 'bboxes_intersection'):
        # Should be more efficient to first transpose.
        bboxes = tf.transpose(bboxes)
        bbox_ref = tf.transpose(bbox_ref)
        # Intersection bbox and area.
        int_ymin = tf.maximum(bboxes[0], bbox_ref[0])
        int_xmin = tf.maximum(bboxes[1], bbox_ref[1])
        int_ymax = tf.minimum(bboxes[2], bbox_ref[2])
        int_xmax = tf.minimum(bboxes[3], bbox_ref[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Areas.
        inter_vol = h * w
        bboxes_vol = (bboxes[2] - bboxes[0]) * (bboxes[3] - bboxes[1])
        # safe_divide returns 0 where bboxes_vol is 0.
        scores = tfe_math.safe_divide(inter_vol, bboxes_vol, 'intersection')
        return scores
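To make the score concrete, here is a single-box, plain-Python version of the same computation, assuming boxes are (ymin, xmin, ymax, xmax) and the reference is the normalized image [0, 0, 1, 1]. intersection_score is a hypothetical name, not a repo function.

```python
def intersection_score(bbox_ref, bbox):
    """Fraction of bbox's area lying inside bbox_ref.

    Both boxes are (ymin, xmin, ymax, xmax); mirrors bboxes_intersection
    for a single box instead of a batch.
    """
    int_ymin = max(bbox[0], bbox_ref[0])
    int_xmin = max(bbox[1], bbox_ref[1])
    int_ymax = min(bbox[2], bbox_ref[2])
    int_xmax = min(bbox[3], bbox_ref[3])
    h = max(int_ymax - int_ymin, 0.0)
    w = max(int_xmax - int_xmin, 0.0)
    area = (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    # Like safe_divide: return 0 instead of dividing by a zero area.
    return h * w / area if area > 0 else 0.0


image = (0.0, 0.0, 1.0, 1.0)
print(intersection_score(image, (0.25, 0.25, 0.75, 0.75)))   # 1.0  (fully inside)
print(intersection_score(image, (-0.25, 0.25, 0.75, 0.75)))  # 0.75 (negative ymin)
print(intersection_score(image, (0.5, 0.5, 1.5, 1.0)))       # 0.5  (ymax beyond 1)
```

A box entirely inside the image scores 1.0; a box that sticks out of the image, whether through a negative coordinate or one larger than 1, scores less.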

If the values in bboxes are positive (and no larger than 1), every score should be 1. Some bboxes with negative values are filtered out, but others are kept if their score is still larger than the threshold.

But if bboxes with negative values mean do-not-care, what is the purpose of filtering some of them out?

dengdan commented 7 years ago

During preprocessing and data augmentation, the original images are randomly cropped, so the coordinates of a bounding box may become negative when the box extends beyond the crop. In that case, the score returned by bboxes_intersection may be less than 1.0.
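How cropping produces negative coordinates can be sketched as follows, assuming both the box and the crop window are (ymin, xmin, ymax, xmax) normalized to the original image, and that box coordinates are re-expressed relative to the crop window. to_crop_coords is a hypothetical helper; the repo's preprocessing may implement this step differently.

```python
def to_crop_coords(bbox, crop):
    """Express bbox in coordinates normalized to the crop window.

    Both bbox and crop are (ymin, xmin, ymax, xmax), normalized to the
    original image. Parts of bbox outside the crop map outside [0, 1].
    """
    cy0, cx0, cy1, cx1 = crop
    h, w = cy1 - cy0, cx1 - cx0
    return ((bbox[0] - cy0) / h, (bbox[1] - cx0) / w,
            (bbox[2] - cy0) / h, (bbox[3] - cx0) / w)


crop = (0.5, 0.0, 1.0, 0.5)          # bottom-left quarter of the image
box = (0.375, 0.125, 0.625, 0.375)   # straddles the crop's top edge
print(to_crop_coords(box, crop))     # (-0.25, 0.25, 0.25, 0.75)
```

Half of this box lies above the crop window, so its ymin becomes -0.25 and its overlap score with [0, 0, 1, 1] drops to 0.5.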