viscot_363k.json - Githubissues

lyc728 commented 7 months ago

你好，json中的from": "gpt","value": "[0.133, 0.532, 0.187, 0.553]"这个值是怎么得到的

deepcs233 commented 7 months ago

Hi! 请参考我们的论文：https://arxiv.org/abs/2403.16999 中的3.1节

lyc728 commented 7 months ago

看了下文章，没有确切的回复，麻烦解答下，看坐标并不是简单的进行归一化，像是对值进行一定缩放

deepcs233 commented 7 months ago

你好，我们先通过一些方法得到基于原始图片像素值的bounding box。为了方便后续的训练，我们先将原始图片补全至正方形，同时将bounding box也做相同的映射。最后再将bounding box做归一化，即除以图片的边长。可以参考下面的代码

def get_bbox_str(bboxs, width, height):
    if len(bboxs) > 1:
        large_bbox = []
        large_bbox.append(min([x[0] for x in bboxs]))
        large_bbox.append(min([x[1] for x in bboxs]))
        large_bbox.append(max([x[2] for x in bboxs]))
        large_bbox.append(max([x[3] for x in bboxs]))
        bbox = large_bbox
    else:
        bbox = bboxs[0]
    if width > height:
        bbox[1] += (width - height) // 2
        bbox[3] += (width - height) // 2
        bbox = [x/width for x in bbox]
    else:
        bbox[0] += (height - width) // 2
        bbox[2] += (height - width) // 2
        bbox = [x/height for x in bbox]
    return '[%0.3f, %0.3f, %0.3f, %0.3f]' % (bbox[0], bbox[1], bbox[2], bbox[3])

LengSicong commented 7 months ago

what is the value after the image path? e.g., [198, 114, 240, 146]

Is it the bbox before the padding and normalization?

deepcs233 commented 7 months ago

Hi! @LengSicong It's the original bbox before preprocessing.

LengSicong commented 7 months ago

Noted with thanks!

deepcs233 / Visual-CoT

viscot_363k.json #1