OpenGVLab / all-seeing

[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition&Understanding and General Relation Comprehension of the Open World
https://huggingface.co/spaces/OpenGVLab/all-seeing

Issue on bounding box coordinates #10

tibetgao closed this issue 6 months ago

tibetgao commented 6 months ago

Hi there,

Looking at the annotations, I noticed that some bbox coordinates exceed the image size (usually 640*480), e.g.: '\nWhat are the two people[[200, 251, 447, 963], [529, 246, 744, 984]] doing in the image?\nAnswer the question with scene graph.'

I am wondering whether any extra operation (e.g. normalization) needs to be applied to these coordinates.

Cheers!

Weiyun1025 commented 6 months ago

Thank you for your interest in our project.

All bounding boxes are normalized to integer values within the range [0, 1000). The code shown below demonstrates this process:

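# `image` is any object exposing .height and .width (e.g. a PIL image);
# x1, y1, x2, y2 are the box corners in pixel coordinates.
height = image.height
width = image.width
bbox = [x1, y1, x2, y2]
BOX_SCALE = 999  # normalized coordinates are integers in [0, 999]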

if SQUARE_PAD:
    if height == width:
        pass
    elif height < width:
        delta = (width - height) // 2
        bbox[1] += delta
        bbox[3] += delta
    else:
        delta = (height - width) // 2
        bbox[0] += delta
        bbox[2] += delta

    bbox = [
        int(bbox[0] / max(height, width) * BOX_SCALE),
        int(bbox[1] / max(height, width) * BOX_SCALE),
        int(bbox[2] / max(height, width) * BOX_SCALE),
        int(bbox[3] / max(height, width) * BOX_SCALE),
    ]
else:
    bbox = [
        int(bbox[0] / width * BOX_SCALE),
        int(bbox[1] / height * BOX_SCALE),
        int(bbox[2] / width * BOX_SCALE),
        int(bbox[3] / height * BOX_SCALE),
    ]

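Note that when SQUARE_PAD is set to True, the image is padded to a square, centered along its shorter side. The box is therefore shifted by the padding offset first and then scaled by the longer side, rather than by width and height separately.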

You can refer to this script for more details about how to visualize these boxes.
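
To use these boxes on the original image, the normalization has to be inverted. The following is a minimal sketch of that inverse mapping, not the visualization script mentioned above. The function names `normalize_bbox` and `denormalize_bbox` and their parameters are hypothetical and chosen for illustration; `normalize_bbox` simply restates the snippet above as a function so the round trip can be checked.

```python
def _pad_offsets(width, height):
    """Return (side, dx, dy) for centered square padding of a width x height image."""
    side = max(width, height)
    dx = (side - width) // 2   # horizontal padding on the left
    dy = (side - height) // 2  # vertical padding on the top
    return side, dx, dy


def normalize_bbox(bbox, width, height, square_pad=False, box_scale=999):
    """Map a pixel box [x1, y1, x2, y2] to integers in [0, box_scale].

    Hypothetical helper that mirrors the normalization snippet in this thread.
    """
    x1, y1, x2, y2 = bbox
    if square_pad:
        side, dx, dy = _pad_offsets(width, height)
        return [
            int((x1 + dx) / side * box_scale),
            int((y1 + dy) / side * box_scale),
            int((x2 + dx) / side * box_scale),
            int((y2 + dy) / side * box_scale),
        ]
    return [
        int(x1 / width * box_scale),
        int(y1 / height * box_scale),
        int(x2 / width * box_scale),
        int(y2 / height * box_scale),
    ]


def denormalize_bbox(bbox, width, height, square_pad=False, box_scale=999):
    """Map a normalized box back to (float) pixel coordinates.

    Hypothetical helper: undoes the scaling, then removes the padding offset.
    The result is approximate because normalization truncates to integers.
    """
    x1, y1, x2, y2 = bbox
    if square_pad:
        side, dx, dy = _pad_offsets(width, height)
        return [
            x1 / box_scale * side - dx,
            y1 / box_scale * side - dy,
            x2 / box_scale * side - dx,
            y2 / box_scale * side - dy,
        ]
    return [
        x1 / box_scale * width,
        y1 / box_scale * height,
        x2 / box_scale * width,
        y2 / box_scale * height,
    ]


# Example: the first box from the question, assuming a 640x480 image.
# Whether SQUARE_PAD was used for that annotation is not stated in this thread.
pixel_box = denormalize_bbox([200, 251, 447, 963], width=640, height=480)
```

Because the forward mapping truncates with `int(...)`, the recovered coordinates can differ from the originals by a fraction of a pixel, so round them only at the point of drawing.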