Questions regarding TRT inference speed compared to YOLOv8 #57

alaap001 commented 3 months ago

Hey, thanks for the amazing repo. I have been testing it against YOLOv8 in terms of inference speed, which is highlighted in the paper.

I ran a speed test on a few videos on an RTX 3070 Ti. Here's my inference script:

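from ultralytics import YOLO  # only needed when loading the YOLOv8 engine for comparison
from ultralytics import YOLOv10

# Load the TensorRT engine exported from yolov10l.pt
model = YOLOv10('yolov10l.engine', task='detect')

in_dir = "videoplayback1080.mp4"
# stream=True returns a generator, so frames are only processed as it is consumed
results = model.predict(source=in_dir, device="0", imgsz=640, half=True, iou=0.7, save=False, conf=0.3, save_txt=False,
                        stream=True, save_conf=True, save_crop=False, show_labels=True, line_width=1)
for result in results:
    pass  # the per-frame timing is printed to the log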

This is the output of the YOLOv10 engine. We are getting an average of 5.0 ms per frame:

video 1/1 (frame 1712/9184) videoplayback1080.mp4: 640x640 1 0, 10 2s, 1 5, 2 7s, 4.9ms
video 1/1 (frame 1713/9184) videoplayback1080.mp4: 640x640 1 0, 10 2s, 2 7s, 5.0ms
video 1/1 (frame 1714/9184) videoplayback1080.mp4: 640x640 1 0, 10 2s, 2 7s, 5.0ms
video 1/1 (frame 1715/9184) videoplayback1080.mp4: 640x640 1 0, 9 2s, 2 7s, 5.0ms
video 1/1 (frame 1716/9184) videoplayback1080.mp4: 640x640 3 0s, 9 2s, 2 7s, 5.0ms
video 1/1 (frame 1717/9184) videoplayback1080.mp4: 640x640 2 0s, 9 2s, 2 7s, 5.0ms
video 1/1 (frame 1718/9184) videoplayback1080.mp4: 640x640 4 0s, 8 2s, 2 7s, 5.0ms
video 1/1 (frame 1719/9184) videoplayback1080.mp4: 640x640 3 0s, 10 2s, 2 7s, 5.0ms
video 1/1 (frame 1720/9184) videoplayback1080.mp4: 640x640 3 0s, 9 2s, 2 7s, 5.0ms
video 1/1 (frame 1721/9184) videoplayback1080.mp4: 640x640 1 0, 8 2s, 2 7s, 5.0ms
video 1/1 (frame 1722/9184) videoplayback1080.mp4: 640x640 1 0, 9 2s, 2 7s, 5.0ms
video 1/1 (frame 1723/9184) videoplayback1080.mp4: 640x640 1 0, 9 2s, 2 7s, 5.0ms
video 1/1 (frame 1724/9184) videoplayback1080.mp4: 640x640 1 0, 9 2s, 2 7s, 5.0ms
video 1/1 (frame 1725/9184) videoplayback1080.mp4: 640x640 1 0, 9 2s, 2 7s, 5.0ms
video 1/1 (frame 1726/9184) videoplayback1080.mp4: 640x640 1 0, 9 2s, 2 7s, 5.0ms
video 1/1 (frame 1727/9184) videoplayback1080.mp4: 640x640 1 0, 9 2s, 3 7s, 5.0ms
video 1/1 (frame 1728/9184) videoplayback1080.mp4: 640x640 1 0, 10 2s, 2 7s, 5.0ms
video 1/1 (frame 1729/9184) videoplayback1080.mp4: 640x640 1 0, 8 2s, 2 7s, 5.0ms
video 1/1 (frame 1730/9184) videoplayback1080.mp4: 640x640 1 0, 9 2s, 2 7s, 5.2ms

And this is the output of the YOLOv8 engine for comparison, which averages 5.1 ms per frame:

video 1/1 (1712/9184) videoplayback1080.mp4: 640x640 13 cars, 1 bus, 1 truck, 5.1ms
video 1/1 (1713/9184) videoplayback1080.mp4: 640x640 14 cars, 1 bus, 1 truck, 5.1ms
video 1/1 (1714/9184) videoplayback1080.mp4: 640x640 13 cars, 1 bus, 1 truck, 5.1ms
video 1/1 (1715/9184) videoplayback1080.mp4: 640x640 12 cars, 1 bus, 1 truck, 5.1ms
video 1/1 (1716/9184) videoplayback1080.mp4: 640x640 13 cars, 1 bus, 1 truck, 5.1ms
video 1/1 (1717/9184) videoplayback1080.mp4: 640x640 13 cars, 1 bus, 1 truck, 5.1ms
video 1/1 (1718/9184) videoplayback1080.mp4: 640x640 12 cars, 1 bus, 1 truck, 5.2ms
video 1/1 (1719/9184) videoplayback1080.mp4: 640x640 12 cars, 1 bus, 1 truck, 5.1ms
video 1/1 (1720/9184) videoplayback1080.mp4: 640x640 13 cars, 1 bus, 2 trucks, 5.1ms
video 1/1 (1721/9184) videoplayback1080.mp4: 640x640 13 cars, 1 bus, 2 trucks, 5.2ms
video 1/1 (1722/9184) videoplayback1080.mp4: 640x640 14 cars, 1 bus, 2 trucks, 5.1ms
video 1/1 (1723/9184) videoplayback1080.mp4: 640x640 15 cars, 1 bus, 2 trucks, 5.1ms
video 1/1 (1724/9184) videoplayback1080.mp4: 640x640 14 cars, 1 bus, 2 trucks, 5.1ms
video 1/1 (1725/9184) videoplayback1080.mp4: 640x640 14 cars, 1 bus, 2 trucks, 5.1ms
video 1/1 (1726/9184) videoplayback1080.mp4: 640x640 15 cars, 1 bus, 2 trucks, 5.1ms
video 1/1 (1727/9184) videoplayback1080.mp4: 640x640 14 cars, 1 bus, 2 trucks, 5.1ms
video 1/1 (1728/9184) videoplayback1080.mp4: 640x640 12 cars, 1 bus, 3 trucks, 5.2ms
video 1/1 (1729/9184) videoplayback1080.mp4: 640x640 13 cars, 1 bus, 2 trucks, 5.1ms
video 1/1 (1730/9184) videoplayback1080.mp4: 640x640 12 cars, 1 bus, 1 truck, 5.1ms

From the looks of it, speed hasn't improved much: even though params and FLOPs are reduced by a lot compared to YOLOv8, the measured latency is nearly unchanged.

This is the script I used to convert to .engine. Please check whether it is correct:

from ultralytics import YOLOv10

model = YOLOv10('yolov10l.pt')

# Export the model to a TensorRT engine (FP16, 8 GB workspace)
model.export(format='engine', imgsz=640, iou=0.7, device=0, simplify=True, half=True, workspace=8)

Please let me know if this is expected or if I'm missing something here.

Also, I tried imgsz=1280; the speed difference there is 61 FPS (v8) vs 64 FPS (v10).
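To put the two comparisons on the same scale, the figures quoted above can be converted to per-frame latency and a relative latency reduction. This is only a small arithmetic sketch over the numbers in this post; the helper names `fps_to_ms` and `latency_reduction_pct` are made up for illustration.

```python
def fps_to_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def latency_reduction_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage by which the candidate's per-frame latency is lower than the baseline's."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


# imgsz=1280: 61 FPS (v8) vs 64 FPS (v10), as reported above
v8_ms_1280 = fps_to_ms(61.0)
v10_ms_1280 = fps_to_ms(64.0)
gain_1280 = latency_reduction_pct(v8_ms_1280, v10_ms_1280)

# imgsz=640: 5.1 ms (v8) vs 5.0 ms (v10), averages read from the logs above
gain_640 = latency_reduction_pct(5.1, 5.0)

print(f"1280: {v8_ms_1280:.2f} ms -> {v10_ms_1280:.2f} ms ({gain_1280:.1f}% lower latency)")
print(f" 640: 5.10 ms -> 5.00 ms ({gain_640:.1f}% lower latency)")
```

Both settings come out as a latency reduction of a few percent, which matches the impression that the engine time alone barely changes.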

Is YOLOv10 perhaps better optimized for A40/A100-class GPUs than for RTX 30- and 40-series cards?

Thanks.

alaap001 commented 3 months ago

EDIT: I made some changes and printed the pre-processing and post-processing times as well. With those included, we can clearly see a speed boost in post-processing time.

YOLOv8l: Speed: 1.2ms preprocess, 6.0ms inference, 0.6ms postprocess per image at shape (1, 3, 640, 640)
YOLOv10l: Speed: 1.2ms preprocess, 5.8ms inference, 0.3ms postprocess per image at shape (1, 3, 640, 640)
YOLOv10x: Speed: 1.3ms preprocess, 7.1ms inference, 0.3ms postprocess per image at shape (1, 3, 640, 640)
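For anyone who wants to collect the same per-stage breakdown programmatically, one possible approach is to average the `speed` dictionary that each Ultralytics `Results` object carries (keys `preprocess`, `inference`, `postprocess`, in ms). The sketch below is not the script used for the numbers above: `mean_stage_times`, `benchmark`, and the warm-up count are hypothetical, and the fake frames at the end exist only so the averaging logic can be checked without a GPU.

```python
from statistics import mean
from types import SimpleNamespace

STAGES = ("preprocess", "inference", "postprocess")


def mean_stage_times(results, warmup=0):
    """Average per-stage latency (ms) over result objects that expose a `.speed` dict.

    The first `warmup` frames are skipped, since early frames often include
    one-off initialization cost.
    """
    rows = [r.speed for i, r in enumerate(results) if i >= warmup]
    if not rows:
        raise ValueError("no frames left after warm-up")
    return {stage: mean(row[stage] for row in rows) for stage in STAGES}


def benchmark(weights, source, warmup=20):
    """Hypothetical helper: run an exported engine on a video and average its stage times."""
    from ultralytics import YOLOv10  # class provided by the THU-MIG/yolov10 fork

    model = YOLOv10(weights, task="detect")
    results = model.predict(source=source, device="0", imgsz=640, half=True,
                            conf=0.3, iou=0.7, stream=True, verbose=False)
    return mean_stage_times(results, warmup=warmup)


# Dry run with fake frames (illustrative values only, not measurements)
fake_frames = [
    SimpleNamespace(speed={"preprocess": 5.0, "inference": 50.0, "postprocess": 2.0}),
    SimpleNamespace(speed={"preprocess": 1.0, "inference": 6.0, "postprocess": 0.5}),
    SimpleNamespace(speed={"preprocess": 1.4, "inference": 5.6, "postprocess": 0.1}),
]
demo = mean_stage_times(fake_frames, warmup=1)
print(demo)
```

With a real engine this would be called as `benchmark('yolov10l.engine', 'videoplayback1080.mp4')`; skipping warm-up frames is a design choice here, not something the original measurement is known to have done.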

FPS:
YOLOv8l: 110.45 FPS at shape (1, 3, 640, 640)
YOLOv10l: 126.74 FPS at shape (1, 3, 640, 640)
YOLOv10x: 103.70 FPS at shape (1, 3, 640, 640)
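As a sanity check, the per-stage times can be summed and turned into an implied FPS, then compared with the reported FPS. The sketch below uses only the numbers from this comment; how the reported FPS was measured is not stated, so reading the gap as per-frame overhead outside the three stages (video decoding, for example) is an assumption.

```python
# Per-stage times in ms (preprocess, inference, postprocess), copied from the comment above
stage_ms = {
    "YOLOv8l": (1.2, 6.0, 0.6),
    "YOLOv10l": (1.2, 5.8, 0.3),
    "YOLOv10x": (1.3, 7.1, 0.3),
}
# Reported end-to-end FPS, copied from the comment above
reported_fps = {"YOLOv8l": 110.45, "YOLOv10l": 126.74, "YOLOv10x": 103.70}


def implied_fps(pre_ms: float, inf_ms: float, post_ms: float) -> float:
    """FPS implied if a frame cost exactly the sum of the three stages."""
    return 1000.0 / (pre_ms + inf_ms + post_ms)


summary = {}
for name, stages in stage_ms.items():
    total_ms = sum(stages)
    reported_ms = 1000.0 / reported_fps[name]
    summary[name] = {
        "total_ms": total_ms,
        "implied_fps": implied_fps(*stages),
        # Assumption: time not covered by the three stages
        "unaccounted_ms": reported_ms - total_ms,
    }
    print(f"{name}: {total_ms:.1f} ms/frame, implied {summary[name]['implied_fps']:.1f} FPS, "
          f"reported {reported_fps[name]:.2f} FPS, "
          f"unaccounted {summary[name]['unaccounted_ms']:.2f} ms")
```

In every case the reported FPS is lower than the stage sum alone would give, so the two sets of numbers are consistent only if roughly a millisecond per frame is spent outside the three timed stages.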

Please let me know if these numbers are near to what is expected. Thanks.

jameslahm commented 3 months ago

Thanks for your interest and detailed evaluation! These numbers seem to be expected.

benx13 commented 3 months ago

Can you report speeds for yolov8n/s and yolov10n/s?
