Open lyc728 opened 7 months ago
Hi! 请参考我们的论文:https://arxiv.org/abs/2403.16999 中的3.1节
看了下文章,没有确切的回复,麻烦解答下,看坐标并不是简单的进行归一化,像是对值进行一定缩放
你好, 我们先通过一些方法得到基于原始图片像素值的bounding box。为了方便后续的训练,我们先将原始图片补全至正方形,同时将bounding box也做相同的映射。最后再将bounding box做归一化,即除以图片的边长。可以参考下面的代码
def get_bbox_str(bboxs, width, height):
if len(bboxs) > 1:
large_bbox = []
large_bbox.append(min([x[0] for x in bboxs]))
large_bbox.append(min([x[1] for x in bboxs]))
large_bbox.append(max([x[2] for x in bboxs]))
large_bbox.append(max([x[3] for x in bboxs]))
bbox = large_bbox
else:
bbox = bboxs[0]
if width > height:
bbox[1] += (width - height) // 2
bbox[3] += (width - height) // 2
bbox = [x/width for x in bbox]
else:
bbox[0] += (height - width) // 2
bbox[2] += (height - width) // 2
bbox = [x/height for x in bbox]
return '[%0.3f, %0.3f, %0.3f, %0.3f]' % (bbox[0], bbox[1], bbox[2], bbox[3])
what is the value after the image path? e.g., [198, 114, 240, 146]
Is it the bbox before the padding and normalization?
Hi! @LengSicong It's the original bbox before preprocessing.
Noted with thanks!
你好,json中的from": "gpt","value": "[0.133, 0.532, 0.187, 0.553]"这个值是怎么得到的