Open minhhotboy9x opened 1 month ago
You can compare the per-layer timing profiles and the fusion tactics of the two engines.
@lix19937 Can you show me how to log that info?
trtexec --onnx=${your_onnx} --fp16 --verbose --saveEngine=model_sim.plan \
--useCudaGraph --dumpProfile --dumpLayerInfo --separateProfileRun | tee log
You can analyze the performance with Nsight Systems or Nsight Compute. The root cause may vary.
But normally it's not a bug.
Thank you @zerollzeng @lix19937 for your suggestions. I used trt.IProfiler to get the execution time of each layer. After collecting the results for the original and pruned models, I combined and plotted them:
Original yolov8s:
Pruned yolov8s:
The results look strange: some layers in the pruned model have higher latency than the corresponding layers in the original.
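For anyone who wants to repeat this comparison, here is a minimal sketch of how per-layer times could be collected with trt.IProfiler and then compared between two engines. The class name `LayerTimeProfiler` and the helper `slower_layers` are hypothetical names chosen for illustration, not part of TensorRT. The fallback to `object` when TensorRT is not installed is an assumption made only so the bookkeeping can be tried on a machine without TensorRT.

```python
from collections import defaultdict

try:
    import tensorrt as trt
    _ProfilerBase = trt.IProfiler
except ImportError:
    # Assumption: fall back to a plain base class so the accumulation
    # logic can be exercised without TensorRT installed.
    _ProfilerBase = object


class LayerTimeProfiler(_ProfilerBase):
    """Accumulates the time reported for each layer (hypothetical helper)."""

    def __init__(self):
        _ProfilerBase.__init__(self)
        self.total_ms = defaultdict(float)
        self.calls = defaultdict(int)

    def report_layer_time(self, layer_name, ms):
        # TensorRT invokes this callback once per layer for each profiled inference.
        self.total_ms[layer_name] += ms
        self.calls[layer_name] += 1

    def mean_ms(self):
        # Average time per layer over all profiled inferences.
        return {name: self.total_ms[name] / self.calls[name] for name in self.total_ms}


def slower_layers(original, pruned):
    """Return (name, original_ms, pruned_ms) for layers found in both profiles
    where the pruned engine is slower, largest regression first."""
    common = original.keys() & pruned.keys()
    rows = [(n, original[n], pruned[n]) for n in common if pruned[n] > original[n]]
    return sorted(rows, key=lambda r: r[2] - r[1], reverse=True)
```

To use it, assign an instance to the execution context before running inference (`context.profiler = LayerTimeProfiler()`), run a number of synchronous inferences, then pass the two `mean_ms()` dictionaries to `slower_layers`. Layer names can differ between the two engines because fusion may group layers differently, so only names present in both profiles are compared.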
I think you can use apex/asp to prune your model.
Description
I have two models: the original yolov8s and a smaller, pruned yolov8s. I pruned the second model's channels using the structural pruning method of Torch-Pruning. After pruning with a pruning rate of 0.2, I exported both the original and pruned models to ONNX and then built FP16 engines from the ONNX models on a Jetson Nano using Python. When I measure FPS, the pruned model is no faster than the original (both run at about 7.4 FPS). I also tried a pruning rate of 0.4, which raised the pruned model's FPS to 8.5, but that gain is small for such a high pruning rate. Here are the layer profiles of the two models: yolov8s.txt yolov8s_0,2_pruning.txt
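To put the FPS figures above in relative terms, here is a small sketch that computes the speedup over the original model. The function name `relative_speedup` is hypothetical, and the inputs are the approximate FPS values quoted in this description.

```python
def relative_speedup(baseline_fps, new_fps):
    """Fractional throughput gain of new_fps over baseline_fps."""
    return new_fps / baseline_fps - 1.0


# FPS values reported above: about 7.4 for the original model,
# about 7.4 at pruning rate 0.2 and 8.5 at pruning rate 0.4.
for rate, fps in [(0.2, 7.4), (0.4, 8.5)]:
    print(f"pruning rate {rate}: {relative_speedup(7.4, fps):.1%} faster")
```

With these numbers, a pruning rate of 0.4 gives a gain of roughly 15% in FPS, and a rate of 0.2 gives no measurable gain.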
Environment
TensorRT Version: 8.2.1.8
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version: 10.2
CUDNN Version: 8.2.1.32
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.6
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):