OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Apache License 2.0
2.39k stars, 248 forks

Question regarding the role of max_image_size in image quantization #420

Open gpantaz opened 11 months ago

gpantaz commented 11 months ago

Hello!

I would like to ask a question about the image quantization. I don't really understand why you divide the bounding box coordinates by max_image_size (= 512) instead of patch_image_size: https://github.com/OFA-Sys/OFA/blob/a36b91ce86ff105ac8d9e513aa88f42b85e33479/utils/transforms.py#L240-L243

Assuming a bounding box [x1, y1, x2, y2] in an image of width w and height h, it seems to me that each coordinate should be quantized relative to the image size, e.g. x1 / w * (num_bins - 1). For example, for a bounding box [120, 200, 150, 220] with w = 600 and h = 800, the quantized x1 would be 120 / 600 * (num_bins - 1).
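The two normalizations being compared can be sketched as follows. This is only an illustration of the arithmetic in the question, not the repository's code: the function names are hypothetical, num_bins = 1000 is an assumed value, and any image resizing that happens before quantization is ignored.

```python
def quantize_by_image_size(coord, image_extent, num_bins):
    """Proposal from the question: normalize by the image's own width or height."""
    return round(coord / image_extent * (num_bins - 1))


def quantize_by_max_image_size(coord, max_image_size, num_bins):
    """Scheme being asked about: normalize by a fixed constant instead."""
    return round(coord / max_image_size * (num_bins - 1))


num_bins = 1000        # assumed for illustration
max_image_size = 512   # value quoted in the question
x1, w = 120, 600       # example from the question

print(quantize_by_image_size(x1, w, num_bins))
print(quantize_by_max_image_size(x1, max_image_size, num_bins))
```

With a fixed divisor, the same pixel coordinate always maps to the same bin regardless of the image's size, whereas dividing by w or h ties each bin to a relative position within the image.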

Could you also explain the choice behind the value of the max_image_size?

Thanks :)

JJJYmmm commented 5 months ago

I have the same problem. : )

JJJYmmm commented 5 months ago

Maybe it's just a coordinate normalization that is applied consistently in both training and prediction. However, when bin2coord is used, it can map coordinates outside the image, because task.cfg.max_image_size >= task.cfg.patch_image_size.

def bin2coord(bins, w_resize_ratio, h_resize_ratio):
    # `task` is a global defined elsewhere; each token looks like "<bin_123>".
    bin_list = [int(token[5:-1]) for token in bins.strip().split()]
    # Undo the normalization: bin -> [0, max_image_size] -> original-image pixels.
    coord_list = []
    coord_list += [bin_list[0] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / w_resize_ratio]
    coord_list += [bin_list[1] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / h_resize_ratio]
    coord_list += [bin_list[2] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / w_resize_ratio]
    coord_list += [bin_list[3] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / h_resize_ratio]
    return coord_list
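
To make the out-of-image concern concrete, here is a self-contained sketch of bin2coord that takes num_bins and max_image_size as explicit arguments instead of reading them from task.cfg, paired with a hypothetical inverse, coord2bin. The values num_bins = 1000 and patch_image_size = 480 are assumptions chosen for illustration, and coord2bin is not taken from the repository.

```python
def coord2bin(coords, w_resize_ratio, h_resize_ratio, num_bins, max_image_size):
    """Hypothetical inverse of bin2coord: original-image pixels -> bin tokens."""
    ratios = [w_resize_ratio, h_resize_ratio, w_resize_ratio, h_resize_ratio]
    bins = [round(c * r / max_image_size * (num_bins - 1)) for c, r in zip(coords, ratios)]
    return " ".join(f"<bin_{b}>" for b in bins)


def bin2coord(bins, w_resize_ratio, h_resize_ratio, num_bins, max_image_size):
    """Same arithmetic as the snippet above, with the config passed explicitly."""
    bin_list = [int(token[5:-1]) for token in bins.strip().split()]
    ratios = [w_resize_ratio, h_resize_ratio, w_resize_ratio, h_resize_ratio]
    return [b / (num_bins - 1) * max_image_size / r for b, r in zip(bin_list, ratios)]


# Assumed setup: a 600x800 image resized to a 480x480 patch.
w, h = 600, 800
patch_image_size, max_image_size, num_bins = 480, 512, 1000
w_ratio, h_ratio = patch_image_size / w, patch_image_size / h

# A box inside the image survives the round trip up to quantization error.
tokens = coord2bin([120, 200, 150, 220], w_ratio, h_ratio, num_bins, max_image_size)
decoded = bin2coord(tokens, w_ratio, h_ratio, num_bins, max_image_size)
print(tokens, decoded)

# The largest bin decodes to a point past the image border.
largest = bin2coord("<bin_999> <bin_999> <bin_999> <bin_999>",
                    w_ratio, h_ratio, num_bins, max_image_size)
print(largest, "image size:", (w, h))
```

Under these assumed values, boxes encoded from inside the image round-trip to within about a pixel, while bin indices above roughly patch_image_size / max_image_size * (num_bins - 1) decode to points beyond the image border.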