zhanghuqiang closed this issue 1 year ago.
Looks like something is wrong with your ONNX. @ttyio, any suggestions here?
@zhanghuqiang, we have Constant(INT8) + DQ support since TRT 8.5. For 8.4, please disable constant folding during the ONNX export, thanks!
torch.onnx.export(..., do_constant_folding=False, ...)
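For reference, a minimal export sketch along these lines; `model`, the input shape, and the file name are placeholders, and the `use_fb_fake_quant` flag follows the pytorch-quantization user guide linked in the issue description:

```python
import torch
from pytorch_quantization import nn as quant_nn

# Export Q/DQ ONNX nodes instead of the TensorQuantizer custom op
# (as described in the pytorch-quantization user guide).
quant_nn.TensorQuantizer.use_fb_fake_quant = True

model.eval()  # `model` is the calibrated/QAT model (placeholder)
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")  # placeholder shape

torch.onnx.export(
    model,
    dummy_input,
    "quant_model.onnx",
    opset_version=13,
    do_constant_folding=False,  # TRT 8.4 workaround suggested above; not needed on TRT 8.5+
)
```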
I also have the same question. Is there any difference in loading a TRT serialized engine for FP32, FP16, and INT8?
I took the following approach:
When I load (deserialize) the FP16 TRT engine and run inference on ARM64, I get the same inference time as with the INT8 TRT engine.
Our ONNX model is a detection model (EfficientDet).
Could you please help me understand the process of deserializing the INT8 model so that we can get the expected inference-time benefit?
@anilknayak sorry for the late response. Deserialization is the same for engine files of any precision; the kernels are already selected when the engine file is saved. Make sure the engine is built and run on the same device. You can also enable the verbose log level during the engine build to get more details.
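For completeness, a minimal deserialization sketch; it is identical whether the engine was built in FP32, FP16, or INT8, and the engine file name below is a placeholder:

```python
import tensorrt as trt

# Verbose logging also helps at load time, not just during the engine build.
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

# Deserialization does not depend on precision: the kernels were already
# selected when the engine was serialized on the same device.
with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
# From here, allocate input/output buffers and call context.execute_v2(...) as usual.
```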
Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!
Description
I export using this code, which is copied from https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/userguide.html#export-to-onnx. Here's the code:
and I get some warnings,
but the ONNX model seems to be exported successfully. However, when I try to load it with TensorRT,
the error message is:
Is there something wrong on my side, or is this just a bug?
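For context, loading a Q/DQ ONNX model in TensorRT 8.4 usually goes through the ONNX parser; the sketch below is an assumption about that flow (file names are placeholders), not the exact code that produced the error:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)  # verbose log shows which layer/parser step fails

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, TRT_LOGGER)

# "quant_model.onnx" is a placeholder for the exported QAT model.
with open("quant_model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX file")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # Q/DQ networks need INT8 enabled

serialized_engine = builder.build_serialized_network(network, config)
with open("quant_model.engine", "wb") as f:
    f.write(serialized_engine)
```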
Environment
TensorRT Version: 8.4.1.5
NVIDIA GPU: 2060
NVIDIA Driver Version:
CUDA Version: 11.6
CUDNN Version: 8.4.1.50
Operating System: Windows
Python Version (if applicable): 3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.12
Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce