IDEA-Research / detrex

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
https://detrex.readthedocs.io/en/latest/
Apache License 2.0
1.95k stars 204 forks source link

Questionable Width and Height Recalculation for Maskdino #247

Open FabianSchuetze opened 1 year ago

FabianSchuetze commented 1 year ago

Thanks for this wonderful project. It's a pleasure to work with it.

However, I think there might be a bug when resizing the bounding box in maskdino. The width and height is calculated twice. At first it is read from the dict "input_per_image" like so in lines 190-191:

height = input_per_image.get("height", image_size[0])  # real size
width = input_per_image.get("width", image_size[1])

and then it is recalculated again in lines 220-221:

height = new_size[0]/image_size[0]*height
width = new_size[1]/image_size[1]*width

In my case, the final width exceeds the width of the original image.

This might also be related to #242 where we saw shifted bounding boxes.

rentainhe commented 1 year ago

We will double check this bug tomorrow~ thanks a lot for you PR @FabianSchuetze

FabianSchuetze commented 1 year ago

Thanks a lot, @rentainhe , and @HaoZhang534 for your kind and quick feedback. It's really a great pleasure to work with detrex and interact with the team behind it.

rentainhe commented 1 year ago

Thanks a lot, @rentainhe , and @HaoZhang534 for your kind and quick feedback. It's really a great pleasure to work with detrex and interact with the team behind it.

Of course! We also hope to communicate more with the users!

HaoZhang534 commented 1 year ago

Hello @FabianSchuetze, we review the code and find lines 190-191 are problematic because input_per_image['height'] is not equal to image_size[0]. input_per_image['height'] is the height of the original image before augmentation and image_size[0] is the size after augmentation before the divisibility padding. Therefore, our method should require the user to include 'height' and 'width' as the original image size in the input dict. We will fix the bug soon. I think the problem in https://github.com/IDEA-Research/detrex/issues/242 is due to the lack of 'height' and 'width' leading to lines 190-191 taking image_size as the original size. Moreover, lines 220-221 cannot deleted. They are used to remove the paddings to make sure our evaluator gets the box sizes corresponding to the original image size.

FabianSchuetze commented 1 year ago

Thanks for your comments and for the work @HaoZhang534 !

I am not sure the problem is solved. The functions

height = new_size[0]/image_size[0]*height
width = new_size[1]/image_size[1]*width

still return (960, 1305) and not (960, 1280) in my case. The bounding boxes are still shifted as in #242 . When I remove these two lines, the results are OK.