THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
https://arxiv.org/abs/2405.14458
GNU Affero General Public License v3.0

Issue with Image Resizing during Inference #144

Closed HassanBinHaroon closed 5 months ago

HassanBinHaroon commented 5 months ago

When I run 'yolo predict imgsz=[480,640]' for inference on an image larger than the specified dimensions, I expect the image to be resized to those dimensions before it is fed to the model for the forward pass. However, the model does not appear to run inference on the resized image. It seems that the original image has been passed to the network instead.

Could you please elaborate on this issue? Thank you!

jameslahm commented 5 months ago

Thanks for your interest! Could you please provide more details of how to reproduce this issue? For example, what's the original size of the input image?

HassanBinHaroon commented 5 months ago

To reproduce the issue:

Take a model trained with image size imgsz = [480, 640] and run inference on an image whose dimensions are [1080, 1920]. Since the model was trained on [480, 640], I expect it to run inference at [480, 640] by resizing every input image to [480, 640]. That is not what I have observed so far.

Could you please elaborate on this? Thank you!

leonnil commented 5 months ago

Hi @HassanBinHaroon,

I'm afraid I was unable to reproduce your problem. In my case, the input is resized rather than kept at its original shape during inference, but the resized shape is not exactly 480x640.

In YOLO detection, we use "letterbox" resizing to adjust the input image size to a specific resolution while preserving the aspect ratio. Since 1080x1920 and 480x640 have different aspect ratios, the input will not be exactly resized to 480x640. To achieve the desired effect, you can set args auto=False and scaleFill=True in the code below.

https://github.com/THU-MIG/yolov10/blob/060af8b8c1a49deb364c716504a119db15ac56b4/ultralytics/engine/predictor.py#L155
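To make the aspect-ratio point concrete, here is a minimal sketch of the scaling step only. It is not the repository's code; the function name `letterbox_resize_shape` is hypothetical, and it assumes shapes are given as (height, width), as with imgsz.

```python
def letterbox_resize_shape(src_hw, target_hw):
    """Return (height, width) after aspect-ratio-preserving scaling.

    Hypothetical helper for illustration; shapes are (height, width).
    """
    src_h, src_w = src_hw
    target_h, target_w = target_hw
    # Use the smaller of the two ratios so the scaled image fits
    # inside the target box without being cropped or stretched.
    ratio = min(target_h / src_h, target_w / src_w)
    return round(src_h * ratio), round(src_w * ratio)


# A 1080x1920 image scaled toward a 480x640 target.
print(letterbox_resize_shape((1080, 1920), (480, 640)))  # (360, 640)
```

The width ratio (640/1920) is the limiting one here, so the image content ends up 360x640 and the remaining height has to be filled with padding rather than image data.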

Let us know if you have any further questions.

HassanBinHaroon commented 5 months ago

@leonnil Thanks for the response.

I'll confirm shortly after trying auto=False and scaleFill=True.

Meanwhile, could you please explain what these two arguments, auto=False and scaleFill=True, do?

leonnil commented 5 months ago

auto: Ensures the image size is compatible with the network's stride requirements by adjusting the padding to be a multiple of the stride.

scaleFill: Forces the image to be resized to the exact target size without maintaining the aspect ratio.

Note: Using scaleFill can distort the objects within the image, potentially reducing the accuracy of predictions in some situations.
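As a rough illustration of how the two flags interact, the sketch below computes the shapes only, with no image data. It is based on my reading of the letterbox transform and is not the library's implementation: the function name `letterbox_output_shape`, the snake_case `scale_fill` parameter, and the default stride of 32 are assumptions, and upscaling limits are ignored.

```python
def letterbox_output_shape(src_hw, target_hw, auto=True, scale_fill=False, stride=32):
    """Return ((out_h, out_w), (content_h, content_w)) for a letterbox-style resize.

    Hypothetical sketch: shapes are (height, width); stride=32 is an assumption.
    """
    src_h, src_w = src_hw
    target_h, target_w = target_hw
    # Aspect-ratio-preserving scale: the smaller ratio keeps the image inside the box.
    ratio = min(target_h / src_h, target_w / src_w)
    new_h, new_w = round(src_h * ratio), round(src_w * ratio)
    pad_h, pad_w = target_h - new_h, target_w - new_w
    if auto:
        # Keep only the padding needed to reach a multiple of the stride.
        pad_h, pad_w = pad_h % stride, pad_w % stride
    elif scale_fill:
        # Stretch to the exact target size; no padding, aspect ratio not preserved.
        new_h, new_w = target_h, target_w
        pad_h, pad_w = 0, 0
    return (new_h + pad_h, new_w + pad_w), (new_h, new_w)


src, target = (1080, 1920), (480, 640)
print(letterbox_output_shape(src, target, auto=True))                     # ((384, 640), (360, 640))
print(letterbox_output_shape(src, target, auto=False))                    # ((480, 640), (360, 640))
print(letterbox_output_shape(src, target, auto=False, scale_fill=True))   # ((480, 640), (480, 640))
```

Under these assumptions, the first case would explain a network input that is resized but not exactly 480x640, while the last case reaches 480x640 only by stretching the image content.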

jameslahm commented 5 months ago

Please feel free to reopen this issue if you have further questions.