NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Apache License 2.0

Smaller pruned YOLOv8s model is not faster than the original YOLOv8s with TensorRT on Jetson Nano #3884

Open · minhhotboy9x opened this issue 1 month ago

minhhotboy9x commented 1 month ago


I have two models: the original YOLOv8s and a smaller, pruned YOLOv8s. For the second model, I pruned channels using the structural pruning method of Torch-Pruning. After pruning with a pruning rate of 0.2, I exported both the original and the pruned models to ONNX. I then built FP16 engines from these ONNX models on the Jetson Nano using Python. When I measured FPS, the pruned model was not faster than the original; both run at about 7.4 FPS. I also tried a pruning rate of 0.4. The pruned model's FPS rose to 8.5, but that gain seems too small for such a high pruning rate. Here are the layer profiles of the two models: yolov8s.txt and yolov8s_0,2_pruning.txt
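The following is a rough estimate of what pruning rates of 0.2 and 0.4 could buy. It is a back-of-the-envelope sketch, not a latency model.

It assumes structural pruning removes the same fraction r of both the input and the output channels of a dense convolution. Under that assumption, the layer keeps about (1 − r)² of its multiply-accumulates (MACs).

Three things here are hypothetical and added only for illustration:
- uniform pruning across all layers,
- the helper names,
- the 128-channel, 40×40 example layer.

```python
def conv_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates of a dense (groups=1) 2-D convolution."""
    return c_in * c_out * k * k * h_out * w_out


def pruned_channels(channels: int, rate: float) -> int:
    """Channels kept after removing a fraction `rate` of them (rounded, at least 1)."""
    return max(1, round(channels * (1.0 - rate)))


def compute_ratio(c_in: int, c_out: int, k: int, h_out: int, w_out: int, rate: float) -> float:
    """Fraction of the original MACs left when both c_in and c_out are pruned by `rate`."""
    before = conv_macs(c_in, c_out, k, h_out, w_out)
    after = conv_macs(pruned_channels(c_in, rate), pruned_channels(c_out, rate), k, h_out, w_out)
    return after / before


if __name__ == "__main__":
    # Hypothetical 3x3 conv, 128 -> 128 channels, on a 40x40 output feature map.
    for rate in (0.2, 0.4):
        print(f"rate={rate}: {compute_ratio(128, 128, 3, 40, 40, rate):.3f} of original MACs")
```

Under that assumption, r = 0.2 leaves roughly 64% of a layer's MACs and r = 0.4 leaves roughly 36%. If compute were the only bottleneck, 7.4 FPS would become something like 11 to 12 FPS at r = 0.2. A much smaller measured gain suggests that something else dominates, such as:
- memory traffic,
- kernel (tactic) selection,
- layers that were not pruned.

The per-layer profiles can help show which one.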


Environment

TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version: 10.2
CUDNN Version:
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

lix19937 commented 1 month ago

You can compare the per-layer timing profiles of the two engines, along with the layer fusions and tactics TensorRT chose for each.
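Below is one way to do that comparison. It assumes each engine's per-layer timings have already been collected into a plain {layer_name: milliseconds} dict, for example from trtexec's profile output or from a trt.IProfiler callback.

Layer names in two different engines only line up where TensorRT made the same fusion decisions. For that reason, layers present in only one engine are reported separately.

The function name and the sample numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def compare_layer_times(
    original: Dict[str, float], pruned: Dict[str, float]
) -> Tuple[List[Tuple[str, float, float, float]], List[str], List[str]]:
    """Compare per-layer latencies (ms) of two engines.

    Returns (shared, only_original, only_pruned), where `shared` holds
    (layer, original_ms, pruned_ms, delta_ms) sorted with the largest slowdown first.
    """
    shared = [
        (name, original[name], pruned[name], pruned[name] - original[name])
        for name in original.keys() & pruned.keys()
    ]
    shared.sort(key=lambda row: row[3], reverse=True)
    # Layers that exist in only one engine (e.g. because fusion differed).
    only_original = sorted(original.keys() - pruned.keys())
    only_pruned = sorted(pruned.keys() - original.keys())
    return shared, only_original, only_pruned


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    orig = {"conv1": 1.20, "conv2 + relu": 2.50, "concat": 0.30}
    prun = {"conv1": 1.35, "conv2 + relu": 1.60, "concat_fused": 0.10}
    shared, only_o, only_p = compare_layer_times(orig, prun)
    for name, o, p, d in shared:
        print(f"{name:20s} {o:6.2f} -> {p:6.2f} ms ({d:+.2f})")
    print("only in original:", only_o)
    print("only in pruned:", only_p)
    print(f"total: {sum(orig.values()):.2f} -> {sum(prun.values()):.2f} ms")
```

Sorting by the largest slowdown puts the surprising layers, those that got slower after pruning, at the top of the list.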

minhhotboy9x commented 1 month ago

@lix19937 Can you show me how to log that info?

lix19937 commented 1 month ago

trtexec --onnx=${your_onnx} --fp16 --verbose --saveEngine=model_sim.plan \
--useCudaGraph --dumpProfile --dumpLayerInfo --separateProfileRun | tee log

zerollzeng commented 1 month ago

You can analyze the performance with Nsight Systems / Nsight Compute. The root cause may vary.

zerollzeng commented 1 month ago

But normally it's not a bug.

minhhotboy9x commented 1 month ago

Thank you @zerollzeng @lix19937 for your suggestions. I used trt.IProfiler to get the execution time of each layer. After collecting the results for the original and pruned models, I combined and plotted them:

Original yolov8s: [image: per-layer execution time plot]

Pruned yolov8s: [image: per-layer execution time plot]

The results look strange: some layers in the pruned model have higher latency than the corresponding layers in the original.
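Single-run timings are noisy, so it can help to average the per-layer times from trt.IProfiler over many runs before plotting. The sketch below does that with the usual pattern:
- subclass trt.IProfiler,
- call its `__init__` explicitly,
- override `report_layer_time(layer_name, ms)`.

Two parts are assumptions added so the sketch runs anywhere: the class name `AveragingProfiler`, and the fallback to a plain base class when TensorRT is not installed.

Per-layer profiling adds its own overhead, which is why trtexec offers `--separateProfileRun`. As a result, the per-layer sum can exceed the end-to-end latency.

```python
from collections import defaultdict
from typing import Dict

try:
    import tensorrt as trt  # present on the Jetson via JetPack
    _ProfilerBase = trt.IProfiler
except ImportError:  # assumption: lets the sketch run and be tested without TensorRT
    _ProfilerBase = object


class AveragingProfiler(_ProfilerBase):
    """Accumulates per-layer times reported by TensorRT over many inferences."""

    def __init__(self) -> None:
        _ProfilerBase.__init__(self)  # trt.IProfiler needs its __init__ called explicitly
        self._total_ms: Dict[str, float] = defaultdict(float)
        self._calls: Dict[str, int] = defaultdict(int)

    def report_layer_time(self, layer_name: str, ms: float) -> None:
        # Called once per layer for every profiled inference.
        self._total_ms[layer_name] += ms
        self._calls[layer_name] += 1

    def reset(self) -> None:
        """Discard everything recorded so far, e.g. after warm-up runs."""
        self._total_ms.clear()
        self._calls.clear()

    def averages(self) -> Dict[str, float]:
        """Mean latency per layer in milliseconds."""
        return {name: self._total_ms[name] / self._calls[name] for name in self._total_ms}
```

Usage, assuming an existing execution context named `context` and a `bindings` list (hypothetical names):
1. Create `profiler = AveragingProfiler()` and set `context.profiler = profiler`.
2. Call `context.execute_v2(bindings)` a few times, then `profiler.reset()` to drop the warm-up runs.
3. Run more inferences and read `profiler.averages()` for each model.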

lix19937 commented 1 week ago

I think you can use apex/asp to prune your model.
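For context, ASP (Automatic SParsity, in NVIDIA Apex under apex.contrib.sparsity) targets a 2:4 fine-grained structured pattern. In every group of four consecutive weights, two are zeroed and the two with the largest magnitude are kept.

The pure-Python sketch below only illustrates that pattern on a flat list of weights. It is not Apex's implementation; Apex also manages the masks during fine-tuning.

Unlike channel pruning, this keeps tensor shapes unchanged. TensorRT's speedup from 2:4 sparsity also relies on Sparse Tensor Cores, which were introduced with the Ampere architecture (the Jetson Nano's Maxwell-based GPU predates them). It is therefore worth checking the target GPU first.

```python
from typing import List


def prune_2_of_4(weights: List[float]) -> List[float]:
    """Zero the two smallest-magnitude values in every group of four weights."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4 for 2:4 sparsity")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


if __name__ == "__main__":
    print(prune_2_of_4([0.1, -0.9, 0.4, 0.05, 0.3, 0.2, -0.7, 0.6]))
```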