THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
https://arxiv.org/abs/2405.14458
GNU Affero General Public License v3.0

Problems with detecting smaller objects or objects in the distance #100

Open SkalskiP opened 5 months ago

SkalskiP commented 5 months ago

Hi 👋🏻

I noticed that YOLOv10 has trouble detecting small objects, especially compared to YOLOv8 and YOLOv9. I have built a small HF Space where you can test this. Is this a known issue? Is there anything I can do to improve this performance relative to the other models?

Here is the comparison of YOLOv8l at 640x640 and YOLOv10l at 640x640:

https://github.com/THU-MIG/yolov10/assets/26109316/94ad1c43-80dd-402e-a8cf-de51aea63560

jameslahm commented 5 months ago

Thanks for the fantastic demo and detailed evaluation! We previously posted a comment on your demo page to provide some clarification: https://huggingface.co/spaces/SkalskiP/YOLO-ARENA/discussions/1. Thanks!

pseacrest commented 5 months ago

I see the same issue with my model. Can we freely set a smaller confidence threshold for YOLOv10 to detect more small objects?

SkalskiP commented 5 months ago

Hi @jameslahm 👋🏻 If I understand your comment correctly, the differences are due to:

  • Loss of accuracy resulting from conversion to ONNX (is this expected)?
  • Different optimal confidence thresholds?

jameslahm commented 5 months ago
  • Loss of accuracy resulting from conversion to ONNX (is this expected)?

@SkalskiP Thanks. We tried to reproduce the accuracy loss in our local environment by running inference with the same ONNX conversion, following the process below. However, the result still differs from that of the demo, so we are not sure where the discrepancy lies and would like to ask for your help.

wget https://skalskip-yolo-arena.hf.space/file=/tmp/gradio/f878616a10625ce7dba02bcb34df2df279273666/image.png
yolo export model=yolov10m.pt format=onnx opset=13 simplify half=True device=0
yolo predict model=yolov10m.onnx source=vehicles.png half conf=0.4
  • Different optimal confidence thresholds?

Yes, we think so.
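
For reference, the same export-and-predict steps can be expressed with this repository's Python API. This is a sketch mirroring the CLI commands above; the keyword arguments follow the ultralytics export/predict interface:

from ultralytics import YOLOv10

# Export the PyTorch checkpoint to ONNX, mirroring the CLI flags above.
model = YOLOv10('yolov10m.pt')
model.export(format='onnx', opset=13, simplify=True, half=True, device=0)

# Reload the exported ONNX file and run the same prediction.
onnx_model = YOLOv10('yolov10m.onnx', task='detect')
onnx_model.predict(source='vehicles.png', half=True, conf=0.4)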

jameslahm commented 5 months ago

  • I see the same issue with my model. Can we freely set a smaller confidence threshold for YOLOv10 to detect more small objects?

@pseacrest Yes, we think so.
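
For anyone who wants to try this, here is a minimal sketch of lowering the confidence threshold with the repository's Python API, assuming the yolov10l.pt checkpoint is available locally (the threshold value is illustrative):

from ultralytics import YOLOv10

model = YOLOv10('yolov10l.pt')
# A lower conf keeps weaker detections, which often recovers small or
# distant objects at the cost of more false positives.
results = model.predict(source='image.png', conf=0.1)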

SkalskiP commented 5 months ago

The ONNX models that we are running were converted by our ML team. I'll try to understand how they did it and get back to you.

NickHerrig commented 5 months ago

@SkalskiP The ONNX models were converted using the instructions in the README.md.

The steps below were followed for the n/s/m/b/l/x .pt files; a quick sanity check of the exported files is sketched after the list:

  1. The model weights were downloaded (for example, yolov10n.pt).
  2. The weights were exported to ONNX via: yolo export model=yolov10n.pt format=onnx opset=13 simplify
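
A minimal sanity check of an exported file, assuming onnxruntime is installed (illustrative only, not part of the original steps):

import onnxruntime as ort

# Load the exported model and print its input/output signature; a corrupted
# or unexpected export usually shows up here as a load error or odd shapes.
session = ort.InferenceSession("yolov10n.onnx", providers=["CPUExecutionProvider"])
for tensor in session.get_inputs():
    print("input:", tensor.name, tensor.shape, tensor.type)
for tensor in session.get_outputs():
    print("output:", tensor.name, tensor.shape, tensor.type)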
jameslahm commented 5 months ago

@NickHerrig Thanks! Would you mind checking whether the following, run in your local environment, matches the results of the demo? Thank you!

  • Loss of accuracy resulting from conversion to ONNX (is this expected)?

@SkalskiP Thanks. We tried to reproduce the accuracy loss in our local environment by running inference with the same ONNX conversion, following the process below. However, the result still differs from that of the demo, so we are not sure where the discrepancy lies and would like to ask for your help.

wget https://skalskip-yolo-arena.hf.space/file=/tmp/gradio/f878616a10625ce7dba02bcb34df2df279273666/image.png
yolo export model=yolov10m.pt format=onnx opset=13 simplify half=True device=0
yolo predict model=yolov10m.onnx source=vehicles.png half conf=0.4
SkalskiP commented 5 months ago

Hi @jameslahm 👋🏻

I just updated https://huggingface.co/spaces/SkalskiP/YOLO-ARENA. We now load images with Pillow, and the results are slightly different.
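
As background, one frequent source of small result shifts when switching image loaders is channel order: OpenCV decodes to BGR while Pillow decodes to RGB. A minimal sketch of the difference (illustrative only; not necessarily the cause here):

import cv2
import numpy as np
from PIL import Image

# OpenCV returns pixels in BGR order; Pillow returns RGB. Feeding a model an
# image decoded with the other convention silently swaps red and blue.
bgr = cv2.imread("vehicles.png")
rgb = np.asarray(Image.open("vehicles.png").convert("RGB"))

# If only the channel order differs, reversing the last axis makes them equal.
print(np.array_equal(bgr[..., ::-1], rgb))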

SkalskiP commented 5 months ago

I also added per-model confidence threshold sliders.


jameslahm commented 5 months ago

@SkalskiP Thank you very much! The demo results still seem to differ from our local environment. We are investigating and will get back to you once we identify the root cause.

salwaghanim commented 5 months ago

@SkalskiP Hello, you have designed a simple and elegant interface; is it open source? By the way, I just checked your GitHub page and it's very impressive. I loved the neural-network-in-NumPy example.

jameslahm commented 5 months ago

@SkalskiP @NickHerrig We found that, with the same ONNX file, the inference results from our codebase and from Roboflow Inference are not the same. Here is a minimal example reproducing the issue.

wget https://skalskip-yolo-arena.hf.space/file=/tmp/gradio/56eee51b0a661453cbf915229dfbadc00b7a0cad/vehicles.png
pip install -q git+https://github.com/THU-MIG/yolov10.git
pip install -q inference supervision  # also imported by the snippet below
import numpy as np
import supervision as sv
from inference import get_model
from PIL import Image

def detect_and_annotate(
    input_image: Image.Image | np.ndarray,
    confidence_threshold: float,
    iou_threshold: float = 0,
):
    # Roboflow Inference path: fetch the hosted COCO model and run inference.
    model = get_model(model_id="coco/22")
    result = model.infer(
        input_image,
        confidence=confidence_threshold,
        iou_threshold=iou_threshold
    )[0]
    detections = sv.Detections.from_inference(result)

    print(detections.data['class_name'])

detect_and_annotate(Image.open('vehicles.png'), 0.4)

# This codebase's path: run the ONNX file that Roboflow Inference cached locally.
from ultralytics import YOLOv10

model = YOLOv10('/tmp/cache/coco/22/weights.onnx', task='detect')
model.predict(source=Image.open('vehicles.png'), verbose=True, conf=0.4)

The output is:

# Roboflow inference
['truck' 'car' 'car']

# This codebase
Loading /tmp/cache/coco/22/weights.onnx for ONNX Runtime inference...

0: 640x640 3 cars, 1 truck, 16.7ms
Speed: 11.3ms preprocess, 16.7ms inference, 15.7ms postprocess per image at shape (1, 3, 640, 640)

We observe that one truck and two cars are detected with Roboflow Inference, while one truck and three cars are detected in our codebase. May we ask for your help? Thanks a lot!

SkalskiP commented 5 months ago

@salwaghanim, thanks a lot! The UI is built with Gradio.

SkalskiP commented 5 months ago

@jameslahm I'll ask @NickHerrig to look into it.

NickHerrig commented 5 months ago

@SkalskiP and @jameslahm It appears that the differing prediction confidence scores result from different preprocessing (resizing) steps in inference and the yolo CLI. I was able to run a test with the yolo CLI and roboflow/inference on images already resized to 640 px, and I'm seeing the same predictions and confidence scores. A sketch contrasting the two common resizing strategies follows the image below.

Take a look at the image below, where the right side shows inference results and the left side shows yolo CLI results:

[side-by-side comparison of inference (right) and yolo CLI (left) predictions]
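
For illustration, the two resizing strategies most often seen in YOLO pipelines are a plain stretch-resize and an aspect-ratio-preserving letterbox resize. Which pipeline uses which here is an assumption, but mixing them changes what the network sees:

import cv2
import numpy as np

def stretch_resize(image: np.ndarray, size: int = 640) -> np.ndarray:
    # Distorts the aspect ratio: every input is stretched to size x size.
    return cv2.resize(image, (size, size))

def letterbox_resize(image: np.ndarray, size: int = 640) -> np.ndarray:
    # Preserves the aspect ratio: scale the longer side to `size`, then
    # center the result on a gray canvas (114 is the usual YOLO padding value).
    h, w = image.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(image, (round(w * scale), round(h * scale)))
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)
    top = (size - resized.shape[0]) // 2
    left = (size - resized.shape[1]) // 2
    canvas[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return canvas

On an image that is already 640x640 both functions are effectively a no-op, which is consistent with the two pipelines agreeing once the inputs were pre-resized.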

jameslahm commented 5 months ago

@NickHerrig Thanks a lot for your great efforts! Is the difference in preprocessing between inference and the yolo CLI expected?

jameslahm commented 5 months ago

@SkalskiP It seems that we and @NickHerrig have identified the root cause. One reason is that Roboflow Inference invokes NMS in the postprocessing of YOLOv10, which is unnecessary because YOLOv10 does not rely on NMS. In addition, the exported ONNX files may have been corrupted; replacing them with our exported ONNX models yields the same results as our local environment. We have submitted a PR https://github.com/roboflow/inference/pull/437 to fix these issues. Thank you!
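
For context, YOLOv10's one-to-one head already emits a fixed set of final detections, so postprocessing only needs a confidence filter. A minimal sketch of such NMS-free postprocessing, assuming the commonly exported output layout of (1, 300, 6) with rows of [x1, y1, x2, y2, score, class] (layouts may differ per export):

import numpy as np

def postprocess_yolov10(raw: np.ndarray, conf: float = 0.4) -> np.ndarray:
    # raw: (1, num_dets, 6) -> [x1, y1, x2, y2, score, class] per row.
    # The rows are already final predictions from the one-to-one head,
    # so we only keep those above the confidence threshold; no NMS pass.
    dets = raw[0]
    return dets[dets[:, 4] >= conf]

Running a second NMS pass over these rows can merge or drop boxes the one-to-one head intended to keep, which is consistent with the missing car observed above.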

jameslahm commented 5 months ago

@SkalskiP The PR https://github.com/roboflow/inference/pull/437 has been merged. The results of Roboflow Inference and our local environment are now the same. Would you mind updating the inference version in the requirements.txt of the HF Space? Thanks a lot!

jameslahm commented 5 months ago

@SkalskiP Friendly ping :) Thanks!

jameslahm commented 4 months ago

@SkalskiP We opened a PR at https://huggingface.co/spaces/SkalskiP/YOLO-ARENA/discussions/2 to update the inference version of the HF Space. Would you mind taking a look? Thanks a lot!