OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance
https://internvl.github.io/
MIT License
3.8k stars 291 forks

Question about optimal resolution selection #275

Open hgdhrt opened 2 weeks ago

hgdhrt commented 2 weeks ago

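I have a question about how the optimal resolution is chosen for an image. If an image has an aspect ratio of 2:4, then 1:2 and 3:6 both match the original aspect ratio. What was the reasoning behind the condition area > 0.5 * image_size * image_size * ratio[0] * ratio[1], and why was the coefficient 0.5 chosen?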

def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
    # Pick the candidate ratio closest to the image's aspect ratio (width / height).
    best_ratio_diff = float('inf')
    best_ratio = (1, 1)
    area = width * height  # original image area in pixels
    for ratio in target_ratios:
        target_aspect_ratio = ratio[0] / ratio[1]
        ratio_diff = abs(aspect_ratio - target_aspect_ratio)
        if ratio_diff < best_ratio_diff:
            best_ratio_diff = ratio_diff
            best_ratio = ratio
        elif ratio_diff == best_ratio_diff:
            # Tie: switch to this ratio only if the image area exceeds half of
            # image_size * image_size * ratio[0] * ratio[1].
            if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                best_ratio = ratio
    return best_ratio

czczup commented 1 week ago

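This value was set by intuition, so that an image is not enlarged too much, far beyond its true resolution. We did not compare resolution-selection strategies very carefully here, so there should still be room for improvement.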
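To make the tie-break concrete, here is a small sketch that runs the function from the question on three images with the same 1:2 aspect ratio. The candidate list `TARGET_RATIOS`, the value `IMAGE_SIZE = 448`, and the helper `pick` are assumptions chosen for illustration, not values taken from the repository. It also assumes that image_size * image_size * ratio[0] * ratio[1] is the area of the resized target, so the condition reads as "accept the larger tied ratio only if the image already covers more than half of that area".

```python
def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
    # Same logic as the function quoted in the question.
    best_ratio_diff = float('inf')
    best_ratio = (1, 1)
    area = width * height
    for ratio in target_ratios:
        target_aspect_ratio = ratio[0] / ratio[1]
        ratio_diff = abs(aspect_ratio - target_aspect_ratio)
        if ratio_diff < best_ratio_diff:
            best_ratio_diff = ratio_diff
            best_ratio = ratio
        elif ratio_diff == best_ratio_diff:
            if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                best_ratio = ratio
    return best_ratio


# Hypothetical settings for this example only.
IMAGE_SIZE = 448
TARGET_RATIOS = [(1, 1), (1, 2), (2, 4), (3, 6)]  # ordered from small to large


def pick(width, height):
    """Hypothetical helper: return the ratio chosen for a width x height image."""
    return find_closest_aspect_ratio(
        width / height, TARGET_RATIOS, width, height, IMAGE_SIZE
    )


# Thresholds for the tied candidates (0.5 * 448 * 448 * ratio[0] * ratio[1]):
#   (2, 4) -> 802816 pixels, (3, 6) -> 1806336 pixels
print(pick(448, 896))    # area 401408:  below both thresholds -> (1, 2)
print(pick(700, 1400))   # area 980000:  above the (2, 4) threshold only -> (2, 4)
print(pick(1400, 2800))  # area 3920000: above both thresholds -> (3, 6)
```

Under these assumptions, a tied larger ratio is accepted only when it would enlarge the image area by less than a factor of 2. A coefficient closer to 1 would keep images nearer their native resolution, while a smaller one would allow more upscaling.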