NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Smaller pruned yolov8s model is not faster than the original yolov8s with TensorRT on Jetson Nano #3884

Open minhhotboy9x opened 1 month ago

minhhotboy9x commented 1 month ago

Description

I have two models: yolov8s and a pruned yolov8s with a smaller size. For the second model, I pruned its channels using the structural pruning method from Torch-Pruning. After pruning with a pruning ratio of 0.2, I converted both the original and pruned models to ONNX, then converted those ONNX models to FP16 engines on a Jetson Nano using Python. When I measured the FPS, the pruned model was not faster than the original (both run at about 7.4 FPS). I also tried a pruning ratio of 0.4; the pruned model's FPS increased to 8.5, but that gain is too small for such a pruning ratio. Here are the layer profiles of the two models: yolov8s.txt yolov8s_0,2_pruning.txt
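As a rough sanity check (an illustration, not data from the issue): pruning a fraction r of channels in both the input and output of a conv layer cuts its FLOPs by roughly (1 - r)^2, so a 0.2 ratio gives at best ~1.5x compute-bound speedup. Comparing that ideal against the measured FPS shows how far the engine is from compute-bound:

```python
# Rough arithmetic: theoretical compute-bound speedup from channel
# pruning vs. the measured FPS ratios reported in the issue.

def conv_flop_scale(prune_ratio):
    """Pruning a fraction r of both input and output channels of a conv
    layer scales its FLOPs by roughly (1 - r)**2."""
    return (1.0 - prune_ratio) ** 2

def theoretical_speedup(prune_ratio):
    """Ideal speedup if the layer were purely compute-bound."""
    return 1.0 / conv_flop_scale(prune_ratio)

base_fps = 7.4  # measured FPS of the unpruned model
for r, fps in [(0.2, 7.4), (0.4, 8.5)]:
    print(f"prune {r:.0%}: ideal speedup ~{theoretical_speedup(r):.2f}x, "
          f"measured {fps / base_fps:.2f}x")
```

A large gap between the ideal and measured ratios usually points at memory-bound layers, fixed per-frame overhead, or lost layer fusions rather than raw FLOPs.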

Environment

TensorRT Version: 8.2.1.8
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version: 10.2
CUDNN Version: 8.2.1.32
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.6
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

lix19937 commented 1 month ago

You can compare the per-layer time profiles and the fusion tactics of the two engines.

minhhotboy9x commented 1 month ago

@lix19937 Can you show me how to log that info?

lix19937 commented 1 month ago

trtexec --onnx=${your_onnx} --fp16 --verbose --saveEngine=model_sim.plan \
--useCudaGraph  --dumpProfile --dumpLayerInfo --separateProfileRun | tee log

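Once you have a per-layer profile for each engine, the comparison can be scripted. A minimal sketch, assuming the profile was exported as JSON (trtexec also supports `--exportProfile=<file>`); the exact schema varies by TensorRT version, so the "name"/"averageMs" keys and the inline sample data here are assumptions, not the issue's real logs:

```python
import json

def load_profile(text):
    """Map layer name -> average time (ms) from an assumed trtexec
    --exportProfile JSON dump; skip non-layer header records."""
    entries = json.loads(text)
    return {e["name"]: e["averageMs"] for e in entries
            if isinstance(e, dict) and "name" in e and "averageMs" in e}

# Inline placeholder data standing in for the two exported profiles.
original = load_profile('[{"name": "conv1", "averageMs": 0.50},'
                        ' {"name": "conv2", "averageMs": 0.30}]')
pruned   = load_profile('[{"name": "conv1", "averageMs": 0.55},'
                        ' {"name": "conv2", "averageMs": 0.20}]')

# Report layers where the pruned engine is *slower* than the original.
for name in sorted(original.keys() & pruned.keys()):
    if pruned[name] > original[name]:
        print(f"{name}: {original[name]:.2f} ms -> {pruned[name]:.2f} ms")
```

Layer names can differ between the two engines when fusions change, so matching by name is only a first pass; the verbose build log shows which tactics were fused.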
zerollzeng commented 1 month ago

You can analyze the perf with Nsight Systems / Nsight Compute; the root cause may vary.

zerollzeng commented 1 month ago

But normally it's not a bug.

minhhotboy9x commented 1 month ago

Thank you @zerollzeng @lix19937 for your suggestions. I used trt.IProfiler to get the per-layer execution times. Once I had the results for the original and pruned models, I combined and plotted them (original yolov8s: image; pruned yolov8s: image). They look strange, since some layers in the pruned model have higher latency than the corresponding layers in the original.
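The comparison behind that plot can also be done programmatically. A minimal sketch of accumulating per-layer times the way an IProfiler `report_layer_time` callback would, then listing regressed layers; the layer names and timings below are hypothetical placeholders, not the issue's real data, and there is no TensorRT dependency here:

```python
from collections import defaultdict

class LayerTimes:
    """Accumulates per-layer times like a trt.IProfiler callback would."""
    def __init__(self):
        self.total = defaultdict(float)
        self.calls = defaultdict(int)

    def report_layer_time(self, layer_name, ms):
        self.total[layer_name] += ms
        self.calls[layer_name] += 1

    def average(self):
        return {n: self.total[n] / self.calls[n] for n in self.total}

def regressions(orig_avg, pruned_avg, tol_ms=0.01):
    """Layers where the pruned model is slower by more than tol_ms."""
    return sorted(n for n in orig_avg
                  if n in pruned_avg and pruned_avg[n] - orig_avg[n] > tol_ms)

orig, pruned = LayerTimes(), LayerTimes()
for _ in range(3):  # pretend we profiled three inference runs
    orig.report_layer_time("model.0.conv", 0.40)
    pruned.report_layer_time("model.0.conv", 0.52)  # slower after pruning
    orig.report_layer_time("model.1.conv", 0.60)
    pruned.report_layer_time("model.1.conv", 0.45)

print(regressions(orig.average(), pruned.average()))  # -> ['model.0.conv']
```

On a pruned model, individual layers running slower is not unusual: odd channel counts (e.g. 51 instead of 64) can fall off TensorRT's fast kernel paths and break fusions, which the verbose build log would confirm.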

lix19937 commented 1 week ago

I think you can use APEX/ASP (Automatic SParsity) to prune your model.