Visual-Behavior / detr-tensorflow

TensorFlow implementation of DETR: Object Detection with Transformers
MIT License
168 stars 53 forks

What is the "mask" from the input image? #49

Open Zhong-Zi-Zeng opened 1 year ago

Zhong-Zi-Zeng commented 1 year ago

I am very confused about the "mask" in detr.py. Could you please explain what it is? Also, if all input images are resized to the same size, then we don't need the "mask", right?

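def downsample_masks(self, masks, x):
    # Resize the boolean masks to the spatial size of the feature map x.
    masks = tf.cast(masks, tf.int32)   # cast to an integer type for the resize op
    masks = tf.expand_dims(masks, -1)  # add a trailing channel axis
    masks = tf.compat.v1.image.resize_nearest_neighbor(masks, tf.shape(x)[1:3], align_corners=False, half_pixel_centers=False)
    masks = tf.squeeze(masks, -1)      # drop the channel axis again
    masks = tf.cast(masks, tf.bool)
    return masks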

def call(self, inp, training=False, post_process=False):
    x, masks = inp
    x = self.backbone(x, training=training)
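
To make the question concrete, here is a minimal pure-Python sketch of where such a mask could come from and what downsampling it does. It is not the repository's code. The helper names `pad_with_mask` and `downsample_mask` are hypothetical, and the polarity (True marks padded pixels) is an assumption borrowed from the original DETR implementation. The index mapping imitates nearest-neighbour resizing with `align_corners=False` and `half_pixel_centers=False`, i.e. source index = floor(target index * input size / output size).

```python
def pad_with_mask(images, pad_value=0):
    """Pad 2-D images (lists of rows) to the largest height and width.

    Returns (padded_images, masks). A mask entry is True where the pixel
    is padding (assumed polarity, see the note above).
    """
    max_h = max(len(img) for img in images)
    max_w = max(len(img[0]) for img in images)
    padded, masks = [], []
    for img in images:
        h, w = len(img), len(img[0])
        padded.append([
            [img[y][x] if y < h and x < w else pad_value for x in range(max_w)]
            for y in range(max_h)
        ])
        masks.append([
            [not (y < h and x < w) for x in range(max_w)]
            for y in range(max_h)
        ])
    return padded, masks


def downsample_mask(mask, out_h, out_w):
    """Nearest-neighbour resize of one 2-D mask to (out_h, out_w)."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[min(y * in_h // out_h, in_h - 1)][min(x * in_w // out_w, in_w - 1)]
         for x in range(out_w)]
        for y in range(out_h)
    ]


# Two images of different sizes batched together: 4x4 and 2x2.
big = [[1] * 4 for _ in range(4)]
small = [[1] * 2 for _ in range(2)]
padded, masks = pad_with_mask([big, small])

# Pretend the backbone reduces the 4x4 input to a 2x2 feature map.
small_mask_2x2 = downsample_mask(masks[1], 2, 2)
big_mask_2x2 = downsample_mask(masks[0], 2, 2)
```

Under these assumptions, the image that needed no padding gets an all-False mask, which is what every image would get if the whole batch were resized to one common size without padding. Whether the mask input can then be dropped depends on how the rest of the model consumes it.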