SamsungLabs / iterdet

[S+SSPR2020] IterDet: Iterative Scheme for Object Detection in Crowded Environments
https://arxiv.org/abs/2005.05708
Mozilla Public License 2.0

The question about scale_factor. #24

Closed. Sunnyheye closed this issue 4 years ago.

Sunnyheye commented 4 years ago

Hello, thank you very much for your excellent work. I have a question about running your code: I don't quite understand the relationship between ori_shape, img_shape and pad_shape. For example, with ori_shape = (600, 800), img_shape = (800, 1067) and pad_shape = (800, 1088), scale_factor is array([1.33375, 1.3333334, 1.33375, 1.3333334]), which is [1067/800, 800/600, 1067/800, 800/600]. In other words, scale_factor is img_shape / ori_shape, in [w_scale, h_scale, w_scale, h_scale] order. During testing, we get the coordinates of the detection boxes based on the original image, and we need to map the detected boxes onto the history map. The history map has shape pad_shape = (800, 1088), but scale_factor relates the original shape to the resized shape, not to the padded shape. Why is scale_factor[0] used in this part of the code? https://github.com/saic-vul/iterdet/blob/master/mmdet/models/detectors/iterdet_faster_rcnn.py#L91

I am looking forward to your reply. Thank you.
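For readers following the numbers in the question, here is a minimal sketch of how the three shapes and scale_factor relate. It assumes shapes are given as (height, width) and that padding rounds each side up to a multiple of 32, as described later in the thread. The helper name resize_and_pad_shapes is hypothetical and is not part of iterdet or mmdet.

```python
import math

import numpy as np


def resize_and_pad_shapes(ori_shape, img_shape, divisor=32):
    """Return (scale_factor, pad_shape) for (height, width) shapes.

    Hypothetical helper for illustration; not the mmdet implementation.
    """
    ori_h, ori_w = ori_shape
    img_h, img_w = img_shape
    # scale_factor depends only on the original and resized shapes.
    w_scale = img_w / ori_w
    h_scale = img_h / ori_h
    scale_factor = np.array(
        [w_scale, h_scale, w_scale, h_scale], dtype=np.float32
    )
    # Padding rounds each side up to the next multiple of `divisor`.
    pad_h = math.ceil(img_h / divisor) * divisor
    pad_w = math.ceil(img_w / divisor) * divisor
    return scale_factor, (pad_h, pad_w)


scale_factor, pad_shape = resize_and_pad_shapes((600, 800), (800, 1067))
print(scale_factor)  # width scale first, then height scale
print(pad_shape)
```

With the shapes from the question, the width scale is 1067/800 and the height scale is 800/600, and the padded shape comes out as (800, 1088).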

filaPro commented 4 years ago

Hi @Sunnyheye,

I think pad_shape doesn't matter here. mmcv pads an image on its bottom and right sides. This transform keeps the coordinates of a box multiplied by scale_factor[0] in place. Feel free to correct me if I'm wrong.
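The point about bottom and right padding can be checked with a small NumPy sketch. It uses np.pad as a stand-in for the padding done by mmcv, which is an assumption made for illustration; the helper name pad_bottom_right and the box coordinates are made up.

```python
import numpy as np


def pad_bottom_right(image, pad_shape):
    """Zero-pad a 2D array on its bottom and right sides only.

    Illustrative stand-in for the padding transform; not mmcv code.
    """
    pad_h = pad_shape[0] - image.shape[0]
    pad_w = pad_shape[1] - image.shape[1]
    return np.pad(image, ((0, pad_h), (0, pad_w)), mode="constant")


# A resized "image" with one box region marked (coordinates are made up).
resized = np.zeros((800, 1067), dtype=np.uint8)
x1, y1, x2, y2 = 100, 200, 300, 400
resized[y1:y2, x1:x2] = 1

padded = pad_bottom_right(resized, (800, 1088))

# The top-left origin does not move, so the box keeps the same indices.
same_region = np.array_equal(padded[:800, :1067], resized)
box_in_place = bool(padded[y1:y2, x1:x2].all())
```

Because nothing is added above or to the left of the image, a pixel at (x, y) in the resized image is still at (x, y) in the padded one.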

Sunnyheye commented 4 years ago

I'd like to ask: the image that we send into the backbone network is the image after padding, right? Is the code for scale_factor correct? It is computed as scale_factor = np.array([w_scale, h_scale, w_scale, h_scale]), which relates the original image to the resized image. https://github.com/saic-vul/iterdet/blob/master/mmdet/datasets/pipelines/transforms.py#L137

filaPro commented 4 years ago

> I'd like to ask: the image that we send into the backbone network is the image after padding, right?

yes

> Is the code for scale_factor correct?

I think so, yes. The image is rescaled by this factor and then padded to a size divisible by 32. However, this right and/or bottom padding has no effect on the scale factor between the predicted boxes and the boxes in the history.
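As a rough illustration of this answer, the sketch below maps boxes given in original-image coordinates onto a history map of size pad_shape using only scale_factor. The function boxes_to_history, the box values and the "add one per box" rule are assumptions made for the example; this is not the code from iterdet_faster_rcnn.py.

```python
import numpy as np


def boxes_to_history(boxes, scale_factor, pad_shape):
    """Draw boxes (x1, y1, x2, y2, original coordinates) into a padded map.

    Hypothetical helper for illustration; not the iterdet implementation.
    """
    history = np.zeros(pad_shape, dtype=np.uint8)
    # Original -> resized coordinates; padding adds no further offset or scale.
    scaled = np.round(boxes * scale_factor).astype(int)
    for x1, y1, x2, y2 in scaled:
        history[y1:y2, x1:x2] += 1
    return history


# One made-up box in the 600 x 800 original image.
boxes = np.array([[150.0, 150.0, 300.0, 450.0]])
scale_factor = np.array([1067 / 800, 800 / 600, 1067 / 800, 800 / 600])
history = boxes_to_history(boxes, scale_factor, pad_shape=(800, 1088))
```

The box lands at the same indices it would have in the unpadded resized image, and the padded columns on the right stay empty, which is why pad_shape does not enter the scale computation.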

Sunnyheye commented 4 years ago

I think I get it. Thank you so much for your prompt reply! It helps me a lot! :.゚ヽ(。◕‿◕。)ノ゚.:。+゚