NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Problem with INT8 model size #2898

Closed ghost closed 1 year ago

ghost commented 1 year ago

Description

I managed to convert my ONNX model to TensorRT engines with polygraphy and trtexec, in both FP16 and INT8. My model is MobileNetV3Small from TensorFlow; its size is about 11 MB. After converting to ONNX it comes to 6.3 MB. The FP16 engine's size is 4.0 MB, but the INT8 engine's size is 7.1 MB. I tested INT8 conversion both with and without a calibration cache, and it is still around 7 MB.

Environment

TensorRT Version: 8.2
NVIDIA GPU: Maxwell, Jetson Nano
NVIDIA Driver Version:
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 2.7
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

ONNX file: https://drive.google.com/file/d/1duQF49gjMg5bNuhx40H88FA8d061EvHP/view?usp=share_link

Calibration file: https://drive.google.com/file/d/163SVs6ip5IHAe5kwAD1PE_nYN9HOYVwe/view?usp=share_link

Steps To Reproduce

Run the command:

    polygraphy convert MobileNetV3Small.onnx --int8 --calibration-cache oxford_flowers102_calib.cache -o sample.engine

zerollzeng commented 1 year ago

@nvpohanh Do you know the reason? I'm curious about it too.

@rakandhiya Could you please share the onnx here? I can take a further check.

ghost commented 1 year ago

I've updated my first post, thank you for the response!

nvpohanh commented 1 year ago

In TRT 8.2, we have some special convolution kernels that require pre-computed masks for better speed at inference. Unfortunately, that increases the engine size.

In later TRT versions, you can disable those convolution tactics by removing the kEDGE_MASK_CONVOLUTIONS tactic source: https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html#tensorrt.TacticSource
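For reference, tactic sources are passed to the builder config as a bitmask, where bit i enables the TacticSource enum member whose integer value is i. The sketch below shows that encoding with a small plain-Python helper; the `import tensorrt` usage in the comments is hypothetical and assumes a TRT version (8.5+) whose enum includes EDGE_MASK_CONVOLUTIONS.

```python
def tactic_mask(enabled_sources):
    """Build a TensorRT-style tactic-source bitmask from integer enum values.

    Bit i of the mask enables the tactic source with integer value i,
    mirroring how IBuilderConfig.set_tactic_sources interprets its argument.
    """
    mask = 0
    for src in enabled_sources:
        mask |= 1 << int(src)
    return mask


# Hypothetical usage with TensorRT 8.5+ installed (enum member names
# taken from the TacticSource docs linked above):
#
#   import tensorrt as trt
#   allowed = [s for s in trt.TacticSource.__members__.values()
#              if s != trt.TacticSource.EDGE_MASK_CONVOLUTIONS]
#   config.set_tactic_sources(tactic_mask(allowed))
#
# i.e. keep every tactic source except EDGE_MASK_CONVOLUTIONS.
```

The key point is that you disable a source by leaving its bit out of the mask, not by passing a negative flag.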

ghost commented 1 year ago

I tried both of these to double-check whether the option exists:

    polygraphy convert MobileNetV3Small.onnx --int8 -o sample.engine --tactic-sources jit_convolutions
    polygraphy convert MobileNetV3Small.onnx --int8 -o sample.engine --tactic-sources edge_mask_convolutions

but I got this error:

    AttributeError: type object 'tensorrt.tensorrt.TacticSource' has no attribute 'EDGE_MASK_CONVOLUTIONS'
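That AttributeError is what you get when the installed TensorRT build predates a given tactic source. One way to avoid it is to list what the installed enum actually exposes before relying on a member. A minimal sketch; the helper works on any enum-like class, and the `trt.TacticSource` usage in the comments is the assumed application:

```python
def available_members(enum_cls):
    """Return the uppercase member names of an enum-like class.

    TensorRT enums expose their members as uppercase class attributes,
    so this filters out dunder and helper attributes.
    """
    return sorted(name for name in dir(enum_cls) if name.isupper())


# Hypothetical usage with the tensorrt package installed:
#
#   import tensorrt as trt
#   print(available_members(trt.TacticSource))
#
# Per the trtexec help output below, a TRT 8.2 install would only list
# the cuBLAS/cuBLAS-LT/cuDNN sources, not EDGE_MASK_CONVOLUTIONS.
```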

After checking the available options with trtexec -h, I got this:

--tacticSources=tactics     Specify the tactics to be used by adding (+) or removing (-) tactics from the default 
                              tactic sources (default = all available tactics).
                              Note: Currently only cuDNN, cuBLAS and cuBLAS-LT are listed as optional tactics.
                              Tactic Sources: tactics ::= [","tactic]
                                              tactic  ::= (+|-)lib
                                              lib     ::= "CUBLAS"|"CUBLAS_LT"|"CUDNN"
                              For example, to disable cudnn and enable cublas: --tacticSources=-CUDNN,+CUBLAS

I take it that with my hardware/versions, disabling it is not possible?

nvpohanh commented 1 year ago

Yes, that flag is only supported in later TRT versions (8.5 or 8.6, I think).

ghost commented 1 year ago

Got it, thank you for the help!

zerollzeng commented 1 year ago

Closing; feel free to reopen if you have any further questions.