ChenRocks / UNITER

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
https://arxiv.org/abs/1909.11740

What format is used when extracting features, RGB or BGR? #77

Closed: VisualJoyce closed this issue 3 years ago

VisualJoyce commented 3 years ago

In the feature extraction code, I notice that although Image is used to load img, it is the im read with cv2 that is used later. It's confusing that img = Image.open(im_file).convert('RGB') is called, yet there is no cv2.cvtColor(im_cv, cv2.COLOR_BGR2RGB).

    try:
        img = Image.open(im_file).convert('RGB')
    except Exception as e:
        print(e)
        print("Corrupted image failed with Image.open: %s" % (im_file_name))
        return corrupted_im_return

    try:
        im = cv2.imread(im_file)
        if im is None:
            print("Corrupted image failed with cv2: %s" % (im_file))
            return corrupted_im_return
    except Exception as e:
        print(e)
        print("Corrupted image failed with cv2: %s" % (im_file))
        return corrupted_im_return

    try:
        print("Processing image_file: %s." % (im_file_name))
        scores, boxes, attr_scores, rel_scores = im_detect(net, im)
        # Keep the original boxes
        # don't worry about the regression bbox outputs
        rois = net.blobs['rois'].data.copy()
        # unscale back to raw image space
        blobs, im_scales = _get_blobs(im, None)

        cls_boxes = rois[:, 1:5] / im_scales[0]

        cls_prob = net.blobs['cls_prob'].data
        pool5 = net.blobs['pool5_flat'].data
    except:
        print("Got exception when processing image_file: %s." % (im_file))
        return corrupted_im_return

So, is it RGB or BGR that we need to use for this step?

VisualJoyce commented 3 years ago

Closing this because I checked the im_detect API. im should be in BGR order, which is what cv2.imread already returns, so no cv2.cvtColor call is needed:

    def im_detect(net, im, boxes):
        """Detect object classes in an image given object proposals.
        Arguments:
            net (caffe.Net): Fast R-CNN network to use
            im (ndarray): color image to test (in BGR order)
            boxes (ndarray): R x 4 array of object proposals
        Returns:
            scores (ndarray): R x K array of object class scores (K includes
                background as object category 0)
            boxes (ndarray): R x (4*K) array of predicted bounding boxes
        """
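
For reference, here is a minimal sketch of the channel-order difference, assuming only NumPy. Since cv2.imread already yields BGR, the extraction script needs no conversion. A swap like this would only matter if an RGB array (for example, one built from the PIL-loaded img) were passed to im_detect instead. The helper name rgb_to_bgr is hypothetical and is not part of the extraction code.

```python
import numpy as np


def rgb_to_bgr(image):
    """Reverse the channel axis of an H x W x 3 array (RGB <-> BGR).

    Hypothetical helper for illustration only; not from the UNITER repo.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 array")
    # Reversing the last axis swaps the R and B channels; G stays in place.
    # ascontiguousarray returns a real copy instead of a negative-stride view.
    return np.ascontiguousarray(image[:, :, ::-1])


# A 1 x 2 "image" in RGB order: one pure-red pixel, one pure-blue pixel.
rgb = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
bgr = rgb_to_bgr(rgb)
```

Applying the same swap twice gives back the original array, so the helper works in either direction.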