Open gpantaz opened 11 months ago
I have the same problem. :)
Maybe it is just a coordinate-normalization convention applied consistently in both training and prediction.
However, when bin2coord is used, the coordinates can fall outside the original image (since task.cfg.max_image_size >= task.cfg.patch_image_size).
def bin2coord(bins, w_resize_ratio, h_resize_ratio):
    # Parse "<bin_123>"-style tokens into integer bin indices.
    bin_list = [int(bin[5:-1]) for bin in bins.strip().split()]
    coord_list = []
    coord_list += [bin_list[0] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / w_resize_ratio]
    coord_list += [bin_list[1] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / h_resize_ratio]
    coord_list += [bin_list[2] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / w_resize_ratio]
    coord_list += [bin_list[3] / (task.cfg.num_bins - 1) * task.cfg.max_image_size / h_resize_ratio]
    return coord_list
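To make the round-trip behavior (and the out-of-range issue) concrete, here is a self-contained sketch of the same arithmetic. NUM_BINS and MAX_IMAGE_SIZE are assumed values standing in for task.cfg.num_bins and task.cfg.max_image_size, and quantize/dequantize are hypothetical helpers mirroring bin2coord, not functions from the repo:

```python
NUM_BINS = 1000       # assumed value for task.cfg.num_bins
MAX_IMAGE_SIZE = 512  # assumed value for task.cfg.max_image_size

def quantize(coords, w_resize_ratio, h_resize_ratio):
    # Map raw [x1, y1, x2, y2] coords to bin indices: scale by the
    # resize ratio, normalize by MAX_IMAGE_SIZE, spread over the bins.
    bins = []
    for i, c in enumerate(coords):
        ratio = w_resize_ratio if i % 2 == 0 else h_resize_ratio
        bins.append(round(c * ratio / MAX_IMAGE_SIZE * (NUM_BINS - 1)))
    return bins

def dequantize(bins, w_resize_ratio, h_resize_ratio):
    # Inverse mapping; same arithmetic as bin2coord above.
    coords = []
    for i, b in enumerate(bins):
        ratio = w_resize_ratio if i % 2 == 0 else h_resize_ratio
        coords.append(b / (NUM_BINS - 1) * MAX_IMAGE_SIZE / ratio)
    return coords

# A 600x800 image resized to 480x480:
w_ratio, h_ratio = 480 / 600, 480 / 800
box = [120, 200, 150, 220]
roundtrip = dequantize(quantize(box, w_ratio, h_ratio), w_ratio, h_ratio)
# The round trip recovers the box up to quantization error, but the
# largest bin maps past the image edge: 999 / 999 * 512 / 0.8 = 640 > 600,
# which is the out-of-image behavior described above.
```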
Hello!
I would like to ask a question regarding the image quantization. I don't really understand why you divide the bounding-box coordinates by max_image_size (= 512) instead of by patch_image_size:
https://github.com/OFA-Sys/OFA/blob/a36b91ce86ff105ac8d9e513aa88f42b85e33479/utils/transforms.py#L240-L243
Assuming a bounding box [x1, y1, x2, y2] in an image of width w and height h, it seems to me that the quantization of each coordinate should be x1 / w * (num_bins - 1). For example, for a bounding box [120, 200, 150, 220] with w = 600 and h = 800, the quantized x1 would be 120 / 600 * (num_bins - 1).
Could you also explain the choice behind the value of max_image_size?
Thanks :)
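For concreteness, the per-dimension normalization proposed in the question can be sketched as follows; quantize_by_image_size is a hypothetical helper and NUM_BINS is an assumed value, not the repo's actual config:

```python
NUM_BINS = 1000  # assumed value for illustration; the actual config may differ

def quantize_by_image_size(box, w, h):
    # Normalize each coordinate by its own image dimension, so the
    # resulting bin indices always stay within [0, NUM_BINS - 1].
    x1, y1, x2, y2 = box
    return [
        round(x1 / w * (NUM_BINS - 1)),
        round(y1 / h * (NUM_BINS - 1)),
        round(x2 / w * (NUM_BINS - 1)),
        round(y2 / h * (NUM_BINS - 1)),
    ]

# The example from the question: box [120, 200, 150, 220], w = 600, h = 800.
print(quantize_by_image_size([120, 200, 150, 220], 600, 800))
# → [200, 250, 250, 275]
```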