@nvpohanh Do you know the reason? I'm curious about it too.
@rakandhiya Could you please share the ONNX model here? I can take a closer look.
I've updated my first post, thank you for the response!
In TRT 8.2, we have some special conv kernels that require pre-computed masks for better speed at inference. Unfortunately, that caused the engine size to increase.
In later TRT versions, you can disable those conv tactics by disabling the kEDGE_MASK_CONVOLUTIONS
tactic source: https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html#tensorrt.TacticSource
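In TRT 8.5+, tactic sources are exposed on the builder config as a bitmask (one bit per `tensorrt.TacticSource` member), and you clear a bit to disable that source. The sketch below illustrates that bitmask manipulation with a stand-in enum so it runs without TensorRT installed; the enum values are assumptions for illustration, and the real calls are `config.get_tactic_sources()` / `config.set_tactic_sources()` on an `IBuilderConfig`:

```python
from enum import IntEnum

# Stand-in for tensorrt.TacticSource (TRT >= 8.5). The member values here
# are illustrative; in real code, use trt.TacticSource directly.
class TacticSource(IntEnum):
    CUBLAS = 0
    CUBLAS_LT = 1
    CUDNN = 2
    EDGE_MASK_CONVOLUTIONS = 3
    JIT_CONVOLUTIONS = 4

def disable_source(sources: int, src: TacticSource) -> int:
    """Clear one tactic-source bit. With TensorRT this mirrors:
    config.set_tactic_sources(config.get_tactic_sources() & ~(1 << int(src)))
    """
    return sources & ~(1 << int(src))

# Default is "all available tactics": every bit set.
all_sources = sum(1 << int(s) for s in TacticSource)
trimmed = disable_source(all_sources, TacticSource.EDGE_MASK_CONVOLUTIONS)
print(bin(all_sources), bin(trimmed))  # 0b11111 0b10111
```

With the real library, the same pattern applies after `builder.create_builder_config()`, before `builder.build_serialized_network(...)`.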
I tried both of these to double-check whether the option exists:
polygraphy convert MobileNetV3Small.onnx --int8 -o sample.engine --tactic-sources jit_convolutions
polygraphy convert MobileNetV3Small.onnx --int8 -o sample.engine --tactic-sources edge_mask_convolutions
but I got the error:
AttributeError: type object 'tensorrt.tensorrt.TacticSource' has no attribute 'EDGE_MASK_CONVOLUTIONS'
After checking the available options with trtexec -h, I got this:
--tacticSources=tactics     Specify the tactics to be used by adding (+) or removing (-) tactics from the default
                            tactic sources (default = all available tactics).
                            Note: Currently only cuDNN, cuBLAS and cuBLAS-LT are listed as optional tactics.
                            Tactic Sources: tactics ::= [","tactic]
                                            tactic  ::= (+|-)lib
                                            lib     ::= "CUBLAS"|"CUBLAS_LT"|"CUDNN"
                            For example, to disable cudnn and enable cublas: --tacticSources=-CUDNN,+CUBLAS
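For reference, the (+|-)lib grammar shown in the help text can be sketched as a tiny parser. This is purely illustrative (the function name parse_tactic_sources is my own, not part of trtexec or polygraphy); it just makes the flag's semantics concrete:

```python
def parse_tactic_sources(spec: str) -> tuple[set[str], set[str]]:
    """Parse a trtexec-style tactic list, e.g. "-CUDNN,+CUBLAS".
    Returns (enabled, disabled) library names; each token must
    start with '+' (add to defaults) or '-' (remove from defaults)."""
    enabled: set[str] = set()
    disabled: set[str] = set()
    for tok in spec.split(","):
        sign, lib = tok[0], tok[1:]
        if sign == "+":
            enabled.add(lib)
        elif sign == "-":
            disabled.add(lib)
        else:
            raise ValueError(f"tactic must start with + or -: {tok!r}")
    return enabled, disabled

# The help text's own example: disable cuDNN, enable cuBLAS.
print(parse_tactic_sources("-CUDNN,+CUBLAS"))  # ({'CUBLAS'}, {'CUDNN'})
```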
I take it that, with my hardware/versions, disabling it is not possible?
Yes, that flag is only supported in later TRT versions (8.5 or 8.6, I think).
Got it, thank you for the help!
Closed, feel free to reopen it if you have any further questions.
Description
I managed to convert my ONNX model to a TensorRT engine with polygraphy and trtexec, in both FP16 and INT8. My model is MobileNetV3Small from TensorFlow; its size is about 11 MB. After converting to ONNX, it is 6.3 MB. The FP16 engine's size is 4.0 MB, but the INT8 engine's size is 7.1 MB. I tested INT8 conversion with and without a calibration cache, and it's still around 7 MB.
Environment
TensorRT Version: 8.2
NVIDIA GPU: Maxwell, Jetson Nano
NVIDIA Driver Version:
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.6.9
Tensorflow Version (if applicable): 2.7
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
ONNX file: https://drive.google.com/file/d/1duQF49gjMg5bNuhx40H88FA8d061EvHP/view?usp=share_link
Calibration file: https://drive.google.com/file/d/163SVs6ip5IHAe5kwAD1PE_nYN9HOYVwe/view?usp=share_link
Steps To Reproduce
Run the command:
polygraphy convert MobileNetV3Small.onnx --int8 --calibration-cache oxford_flowers102_calib.cache -o sample.engine