THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
https://arxiv.org/abs/2405.14458
GNU Affero General Public License v3.0

YOLOv10 is slower than YOLOv8 #277

Closed · nerbivol closed this issue 3 months ago

nerbivol commented 3 months ago

I have been running some tests comparing YOLOv10 and YOLOv8 on the same hardware and image. Surprisingly, YOLOv10 is consistently slower than YOLOv8 for inference. Below are the details of my tests:

Environment (per the logs below):

- Google Colab (Tesla T4 GPU)
- Python 3.10.12, torch 2.3.0+cu121, Ultralytics YOLOv8.1.34

Test Script:

from ultralytics import YOLOv10, YOLO

# YOLOv10
model_v10 = YOLOv10('yolov10n.pt')
res_v10 = model_v10.predict('bus.jpg')
print(res_v10)

# YOLOv8
model_v8 = YOLO('yolov8n.pt')
res_v8 = model_v8.predict('bus.jpg')
print(res_v8)

Test Results:

YOLOv10:

image 1/1 /content/bus.jpg: 640x480 4 persons, 1 bus, 11.0ms
Speed: 2.3ms preprocess, 11.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 480)

YOLOv8:

image 1/1 /content/bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 6.9ms
Speed: 2.3ms preprocess, 6.9ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 480)

Test Script:

from ultralytics import YOLOv10, YOLO

# YOLOv10
model_v10 = YOLOv10('yolov10s.pt')
res_v10 = model_v10.predict('bus.jpg')
print(res_v10)

# YOLOv8
model_v8 = YOLO('yolov8s.pt')
res_v8 = model_v8.predict('bus.jpg')
print(res_v8)

Test Results:

YOLOv10:

image 1/1 /content/bus.jpg: 640x480 4 persons, 1 bus, 13.9ms
Speed: 2.6ms preprocess, 13.9ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 480)

YOLOv8:

image 1/1 /content/bus.jpg: 640x480 4 persons, 1 bus, 12.8ms
Speed: 2.5ms preprocess, 12.8ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 480)

Issue: As seen from the results above, YOLOv10 consistently takes more time for inference than YOLOv8 on the same image and hardware. This is unexpected, since YOLOv10 is supposed to be faster end-to-end.

Could you please help investigate this issue? Let me know if you need any more details or if there are any additional tests you would like me to run.

jameslahm commented 3 months ago

Thanks for your interest! Could you please use the exported format for benchmarking? Please refer to https://github.com/THU-MIG/yolov10?tab=readme-ov-file#notes for more details.
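
For example, assuming TensorRT is available on the machine, a GPU latency benchmark on an exported engine could look like this (a sketch; format=engine and half=True are standard Ultralytics export arguments):

yolo export model=yolov10n.pt format=engine half=True
yolo predict task=detect model=yolov10n.engine source=bus.jpg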

nerbivol commented 3 months ago

Test Script for YOLOv10n

!yolo export model=jameslahm/yolov10n format=onnx opset=13 simplify
!yolo predict task=detect model=yolov10n.onnx source=bus.jpg

Output

Ultralytics YOLOv8.1.34 🚀 Python-3.10.12 torch-2.3.0+cu121 CUDA:0 (Tesla T4, 15102MiB)
Loading yolov10n.onnx for ONNX Runtime inference...
2024-06-18 11:58:10.820066057 [E:onnxruntime:Default, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

2024-06-18 11:58:10.820093622 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:870 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

image 1/1 /content/bus.jpg: 640x640 4 persons, 1 bus, 154.7ms
Speed: 38.8ms preprocess, 154.7ms inference, 58.3ms postprocess per image at shape (1, 3, 640, 640)

Test Script for YOLOv8n

!yolo export model=yolov8n.pt format=onnx opset=13 simplify
!yolo predict task=detect model=yolov8n.onnx source=bus.jpg

Output

Ultralytics YOLOv8.1.34 🚀 Python-3.10.12 torch-2.3.0+cu121 CUDA:0 (Tesla T4, 15102MiB)
Loading yolov8n.onnx for ONNX Runtime inference...
2024-06-18 11:57:36.468053855 [E:onnxruntime:Default, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

2024-06-18 11:57:36.468083950 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:870 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

image 1/1 /content/bus.jpg: 640x640 4 persons, 1 bus, 184.4ms
Speed: 36.8ms preprocess, 184.4ms inference, 595.1ms postprocess per image at shape (1, 3, 640, 640)
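
Note that in both runs above the CUDAExecutionProvider failed to load (libcublasLt.so.11 is a CUDA 11 library, while this environment ships torch built for CUDA 12), so ONNX Runtime silently fell back to CPU and these timings measure CPU inference. A quick way to check what is actually being used (get_available_providers is a standard onnxruntime call):

import onnxruntime as ort

# If 'CUDAExecutionProvider' is missing from this list,
# the .onnx timings above reflect CPU inference.
print(ort.get_available_providers())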

gurkirt commented 3 months ago

@nerbivol You need to test on multiple images in a loop; the first image for any model is a bit random because some models take extra time to allocate memory on the device, so the first inference is usually not accurately timed. If you test on 100 images, that should reflect the true time.
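
A minimal sketch of that pattern (assuming the same bus.jpg test image):

import time
from ultralytics import YOLO

model = YOLO("yolov10n.pt")

# Warm-up runs: the first few inferences include one-off costs such as
# CUDA memory allocation, so exclude them from timing.
for _ in range(10):
    model.predict("bus.jpg", verbose=False)

# Timed runs
n = 100
start = time.time()
for _ in range(n):
    model.predict("bus.jpg", verbose=False)
print(f"Average latency: {(time.time() - start) / n * 1000:.1f} ms")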

nerbivol commented 3 months ago

Of course, I understand this. Even the example above shows that postprocessing is much faster with YOLOv10.

SutirthaChakraborty commented 3 months ago

import time
from ultralytics import YOLO

# Load pretrained models
model_v10n = YOLO("/content/yolov10n.pt")
model_v8n = YOLO("yolov8n.pt")

# Define the number of repetitions
num_repetitions = 100
image_path = "/content/Image_6075.jpg"

# Measure the processing time for YOLOv10n
total_time_v10n = 0
for _ in range(num_repetitions):
    start_time = time.time()
    results_v10n = model_v10n(image_path, verbose=False)
    end_time = time.time()
    print("yolo10: ",(end_time - start_time))
    total_time_v10n += end_time - start_time

average_time_v10n = total_time_v10n / num_repetitions

# Measure the processing time for YOLOv8n
total_time_v8n = 0
for _ in range(num_repetitions):
    start_time = time.time()
    results_v8n = model_v8n(image_path, verbose=False)
    end_time = time.time()
    print("yolo8: ",(end_time - start_time))
    total_time_v8n += end_time - start_time

average_time_v8n = total_time_v8n / num_repetitions

# Print the average processing times
print(f"Average processing time for YOLOv10n: {average_time_v10n} seconds")
print(f"Average processing time for YOLOv8n: {average_time_v8n} seconds")

# Compare the average processing times
if average_time_v10n < average_time_v8n:
    print("YOLOv10n is faster.")
else:
    print("YOLOv8n is faster.")

I am running on Colab CPU. What am I doing wrong?

Output:

Average processing time for YOLOv10n: 0.2772008299827576 seconds
Average processing time for YOLOv8n: 0.2413509488105774 seconds
YOLOv8n is faster.
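
One caveat with the script above: time.time() around the call measures end-to-end latency (preprocess + inference + postprocess) with no warm-up excluded. Ultralytics also attaches per-stage timings to each result, which makes it easier to see where the two models actually differ (speed is a standard attribute on the Results object):

results = model_v10n(image_path, verbose=False)
# Per-stage timings in milliseconds for this image.
print(results[0].speed)  # {'preprocess': ..., 'inference': ..., 'postprocess': ...}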

nerbivol commented 3 months ago

With onnx models, the results are as follows on T4 GPU Colab:

model_v10n = YOLOv10("/content/yolov10n.onnx")
model_v8n = YOLO("/content/yolov8n.onnx")

Output:

Loading /content/yolov10n.onnx for ONNX Runtime inference...
Loading /content/yolov8n.onnx for ONNX Runtime inference...
Average processing time for YOLOv10n: 0.15218737125396728 seconds
Average processing time for YOLOv8n: 0.15860552310943604 seconds
YOLOv10n is faster.

With larger models, the difference is more noticeable. Here are the results with s models:

Loading /content/yolov10s.onnx for ONNX Runtime inference...
Loading /content/yolov8s.onnx for ONNX Runtime inference...
Average processing time for YOLOv10s: 0.32656208038330076 seconds
Average processing time for YOLOv8s: 0.3708038759231567 seconds
YOLOv10s is faster.

glutinouscloud commented 2 months ago

Thanks!