derronqi / yolov8-face

yolov8 face detection with landmark
GNU General Public License v3.0

TRTExec Results #29

Open SwEngine opened 2 months ago

SwEngine commented 2 months ago

Why is yolov8-lite-s slower than yolov8n?

yolov8-lite-t:

[06/28/2024-07:57:17] [I] Host Latency
[06/28/2024-07:57:17] [I] min: 35.335 ms (end to end 35.3481 ms)
[06/28/2024-07:57:17] [I] max: 35.6085 ms (end to end 35.6216 ms)
[06/28/2024-07:57:17] [I] mean: 35.4585 ms (end to end 35.4713 ms)
[06/28/2024-07:57:17] [I] median: 35.4502 ms (end to end 35.4629 ms)
[06/28/2024-07:57:17] [I] percentile: 35.6085 ms at 99% (end to end 35.6216 ms at 99%)
[06/28/2024-07:57:17] [I] throughput: 28.1913 qps
[06/28/2024-07:57:17] [I] walltime: 3.08606 s
[06/28/2024-07:57:17] [I] Enqueue Time
[06/28/2024-07:57:17] [I] min: 7.15283 ms
[06/28/2024-07:57:17] [I] max: 7.86804 ms
[06/28/2024-07:57:17] [I] median: 7.23523 ms
[06/28/2024-07:57:17] [I] GPU Compute
[06/28/2024-07:57:17] [I] min: 34.5927 ms
[06/28/2024-07:57:17] [I] max: 34.8673 ms
[06/28/2024-07:57:17] [I] mean: 34.7172 ms
[06/28/2024-07:57:17] [I] median: 34.708 ms
[06/28/2024-07:57:17] [I] percentile: 34.8673 ms at 99%
[06/28/2024-07:57:17] [I] total compute time: 3.02039 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --fp16 --verbose --onnx=yolov8-lite-t.onnx

yolov8-lite-s:

[06/28/2024-07:41:58] [I] Host Latency
[06/28/2024-07:41:58] [I] min: 72.23 ms (end to end 72.2429 ms)
[06/28/2024-07:41:58] [I] max: 72.8362 ms (end to end 72.8488 ms)
[06/28/2024-07:41:58] [I] mean: 72.5264 ms (end to end 72.539 ms)
[06/28/2024-07:41:58] [I] median: 72.5417 ms (end to end 72.5546 ms)
[06/28/2024-07:41:58] [I] percentile: 72.8362 ms at 99% (end to end 72.8488 ms at 99%)
[06/28/2024-07:41:58] [I] throughput: 13.7856 qps
[06/28/2024-07:41:58] [I] walltime: 3.19174 s
[06/28/2024-07:41:58] [I] Enqueue Time
[06/28/2024-07:41:58] [I] min: 8.63037 ms
[06/28/2024-07:41:58] [I] max: 8.99683 ms
[06/28/2024-07:41:58] [I] median: 8.73773 ms
[06/28/2024-07:41:58] [I] GPU Compute
[06/28/2024-07:41:58] [I] min: 71.489 ms
[06/28/2024-07:41:58] [I] max: 72.094 ms
[06/28/2024-07:41:58] [I] mean: 71.7866 ms
[06/28/2024-07:41:58] [I] median: 71.803 ms
[06/28/2024-07:41:58] [I] percentile: 72.094 ms at 99%
[06/28/2024-07:41:58] [I] total compute time: 3.15861 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --fp16 --verbose --onnx=yolov8-lite-s.onnx

yolov8n-face:

[06/28/2024-08:05:24] [I] Host Latency
[06/28/2024-08:05:24] [I] min: 42.7943 ms (end to end 42.804 ms)
[06/28/2024-08:05:24] [I] max: 43.2454 ms (end to end 43.2556 ms)
[06/28/2024-08:05:24] [I] mean: 42.9283 ms (end to end 42.9381 ms)
[06/28/2024-08:05:24] [I] median: 42.9003 ms (end to end 42.9103 ms)
[06/28/2024-08:05:24] [I] percentile: 43.2454 ms at 99% (end to end 43.2556 ms at 99%)
[06/28/2024-08:05:24] [I] throughput: 23.289 qps
[06/28/2024-08:05:24] [I] walltime: 3.09159 s
[06/28/2024-08:05:24] [I] Enqueue Time
[06/28/2024-08:05:24] [I] min: 5.10181 ms
[06/28/2024-08:05:24] [I] max: 5.65771 ms
[06/28/2024-08:05:24] [I] median: 5.24692 ms
[06/28/2024-08:05:24] [I] GPU Compute
[06/28/2024-08:05:24] [I] min: 42.0552 ms
[06/28/2024-08:05:24] [I] max: 42.5093 ms
[06/28/2024-08:05:24] [I] mean: 42.1906 ms
[06/28/2024-08:05:24] [I] median: 42.1644 ms
[06/28/2024-08:05:24] [I] percentile: 42.5093 ms at 99%
[06/28/2024-08:05:24] [I] total compute time: 3.03772 s
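
To put a number on the gap, here is a rough sketch that pulls the summary values out of a trtexec log and compares mean GPU compute time across the three runs. It assumes the log lines look exactly like the ones pasted above; `parse_trtexec_summary` and the other names are made up for this example and are not part of trtexec or this repo.

```python
import re

# Section headers that appear in the trtexec performance summary above.
SECTION_NAMES = ("Host Latency", "Enqueue Time", "GPU Compute")
# Matches the "[timestamp] [I] " prefix at the start of each log line.
PREFIX = re.compile(r"^\[[^\]]*\]\s*\[\w\]\s*")
# Matches "min: 71.489 ms" style entries (percentile lines are skipped).
METRIC = re.compile(r"(min|max|mean|median):\s*([\d.]+)\s*ms")


def parse_trtexec_summary(log_text):
    """Return {section: {metric: milliseconds}} for the summary lines."""
    sections = {}
    current = None
    for line in log_text.splitlines():
        body = PREFIX.sub("", line).strip()
        if body in SECTION_NAMES:
            current = body
            sections[current] = {}
            continue
        match = METRIC.match(body)
        if match and current is not None:
            sections[current][match.group(1)] = float(match.group(2))
    return sections


# A few lines copied from the yolov8-lite-s log above.
SAMPLE_LOG = """\
[06/28/2024-07:41:58] [I] GPU Compute
[06/28/2024-07:41:58] [I] min: 71.489 ms
[06/28/2024-07:41:58] [I] max: 72.094 ms
[06/28/2024-07:41:58] [I] mean: 71.7866 ms
[06/28/2024-07:41:58] [I] median: 71.803 ms
"""

summary = parse_trtexec_summary(SAMPLE_LOG)

# Mean GPU compute time (ms) for each model, taken from the logs above.
gpu_mean_ms = {
    "yolov8-lite-t": 34.7172,
    "yolov8-lite-s": 71.7866,
    "yolov8n-face": 42.1906,
}

# Express every model relative to yolov8n-face.
baseline = gpu_mean_ms["yolov8n-face"]
relative = {name: ms / baseline for name, ms in gpu_mean_ms.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x the yolov8n-face GPU compute time")
```

With these numbers, yolov8-lite-s takes about 1.70x the GPU compute time of yolov8n-face, while yolov8-lite-t takes about 0.82x. This only measures the gap and does not explain it; a per-layer profile (for example trtexec's `--dumpProfile` option) would be the next thing to look at.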