Closed tibetgao closed 6 months ago
Thank you for your interest in our project.
All bounding boxes are normalized to integer values within the range [0, 1000). The code shown below demonstrates this process:
```python
height = image.height
width = image.width
bbox = [x1, y1, x2, y2]
BOX_SCALE = 999

if SQUARE_PAD:
    if height == width:
        pass
    elif height < width:
        # Padding is added above and below: shift the y coordinates.
        delta = (width - height) // 2
        bbox[1] += delta
        bbox[3] += delta
    else:
        # Padding is added left and right: shift the x coordinates.
        delta = (height - width) // 2
        bbox[0] += delta
        bbox[2] += delta
    bbox = [
        int(bbox[0] / max(height, width) * BOX_SCALE),
        int(bbox[1] / max(height, width) * BOX_SCALE),
        int(bbox[2] / max(height, width) * BOX_SCALE),
        int(bbox[3] / max(height, width) * BOX_SCALE),
    ]
else:
    bbox = [
        int(bbox[0] / width * BOX_SCALE),
        int(bbox[1] / height * BOX_SCALE),
        int(bbox[2] / width * BOX_SCALE),
        int(bbox[3] / height * BOX_SCALE),
    ]
```
Note that when `SQUARE_PAD` is set to `True`, the image is padded to a square before the coordinates are normalized.
You can refer to this script for more details about how to visualize these boxes.
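To make the snippet above easier to reuse, here is a minimal self-contained sketch that wraps the same normalization in a function and adds the inverse mapping back to pixel space, which is what you need when visualizing boxes on the original image. The helper names `normalize_bbox` and `denormalize_bbox` are my own, not part of the project's code.

```python
BOX_SCALE = 999  # coordinates are mapped into the integer range [0, 1000)

def normalize_bbox(bbox, width, height, square_pad=True):
    """Map a pixel-space [x1, y1, x2, y2] box to [0, 1000) coordinates."""
    x1, y1, x2, y2 = bbox
    if square_pad:
        side = max(width, height)
        if height < width:                # padding added above and below
            delta = (width - height) // 2
            y1, y2 = y1 + delta, y2 + delta
        elif width < height:              # padding added left and right
            delta = (height - width) // 2
            x1, x2 = x1 + delta, x2 + delta
        return [int(v / side * BOX_SCALE) for v in (x1, y1, x2, y2)]
    return [int(x1 / width * BOX_SCALE), int(y1 / height * BOX_SCALE),
            int(x2 / width * BOX_SCALE), int(y2 / height * BOX_SCALE)]

def denormalize_bbox(bbox, width, height, square_pad=True):
    """Inverse mapping: [0, 1000) coordinates back to pixel space."""
    x1, y1, x2, y2 = bbox
    if square_pad:
        side = max(width, height)
        px = [v / BOX_SCALE * side for v in (x1, y1, x2, y2)]
        if height < width:                # undo the vertical padding shift
            delta = (width - height) // 2
            px[1] -= delta
            px[3] -= delta
        elif width < height:              # undo the horizontal padding shift
            delta = (height - width) // 2
            px[0] -= delta
            px[2] -= delta
    else:
        px = [x1 / BOX_SCALE * width, y1 / BOX_SCALE * height,
              x2 / BOX_SCALE * width, y2 / BOX_SCALE * height]
    return [round(v) for v in px]

# Round trip on a 640x480 image: only a small rounding error remains.
box = normalize_bbox([100, 50, 300, 200], 640, 480)
print(box)                              # → [156, 202, 468, 437]
print(denormalize_bbox(box, 640, 480))  # close to the original pixel box
```

Note that `int()` truncates during normalization, so the round trip can be off by a pixel; this is usually negligible for visualization.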
Hi there,
Looking at the annotations, I noticed that some bbox coordinate values exceed the image size (usually 640*480), e.g.: '\nWhat are the two people[[200, 251, 447, 963], [529, 246, 744, 984]] doing in the image?\nAnswer the question with scene graph.'
I am wondering whether any extra operation needs to be applied (e.g. normalization).
Cheers!