NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Error when loading an int8 cache using trtexec #1703

Closed XXXVincent closed 2 years ago

XXXVincent commented 2 years ago

trtexec works fine without specifying an int8 cache file, but throws an error when loading one.

/usr/src/tensorrt/bin/trtexec --onnx=erfnet.onnx --int8 --saveEngine=erf_int8.engine --calib=erfnet_quantized_int8.cache --verbose

----- Parsing of ONNX model erfnet.onnx is Done ----
[01/04/2022-16:10:16] [V] [TRT] Original: 83 layers
[01/04/2022-16:10:16] [V] [TRT] After dead-layer removal: 83 layers
[01/04/2022-16:10:16] [V] [TRT] After scale fusion: 83 layers
[01/04/2022-16:10:16] [V] [TRT] After vertical fusions: 83 layers
[01/04/2022-16:10:16] [V] [TRT] After final dead-layer removal: 83 layers
[01/04/2022-16:10:16] [V] [TRT] After concat removal: 83 layers
[01/04/2022-16:10:16] [V] [TRT] After tensor merging: 83 layers
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

ttyio commented 2 years ago

@XXXVincent how did you generate erfnet_quantized_int8.cache? thanks

XXXVincent commented 2 years ago

> @XXXVincent how did you generate erfnet_quantized_int8.cache? thanks

Using code from this repo: https://github.com/Wulingtian/yolov5_tensorrt_int8_tools. It's a TensorRT quantization tool written in Python, and the cache file looks pretty normal (screenshot of the cache contents attached).
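For reference, the snippet below is a minimal sketch, not the linked tool's actual code, of a TensorRT IInt8EntropyCalibrator2 in Python that reads and writes a calibration cache; the data loader, shapes, and file names are illustrative assumptions. A cache written this way is a small text file whose first line records the TensorRT version and calibration algorithm, so a cache produced with a different TRT version or in a hand-rolled format may fail to load in trtexec.

```python
# Minimal sketch (assumption: not the linked tool's actual code) of a
# TensorRT entropy calibrator that reads/writes a calibration cache.
import numpy as np
import pycuda.autoinit  # noqa: F401  -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt


class ErfnetCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="erfnet_quantized_int8.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_file = cache_file
        self.batches = iter(batches)                 # list of NCHW float32 arrays (assumed)
        self.device_input = cuda.mem_alloc(batches[0].nbytes)
        self.batch_size = batches[0].shape[0]

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches))
        except StopIteration:
            return None                              # no more calibration data
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

A cache produced by running a build with this calibrator should then be reusable via trtexec's --calib flag as in the command above.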

XXXVincent commented 2 years ago

How is it possible that I specified a non-existent calib file and still got a decent result, while not specifying a calib file at all makes the output of the exported int8 model totally wrong?

&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=lannet_20220308.onnx --calib=a_not_exists_file.notexist --int8 --explicitBatch --saveEngine=test_not_exists.engine

I also found that the int8 model's inference results on Linux (RTX 2060) and on the QNX platform are quite different. So weird.

The command I used for exporting the int8 model is: trtexec --onnx=lannet_20220308.onnx --calib=calibint8.bin --int8 --explicitBatch --saveEngine=int8.engine

And the command I used for int8 model inference is:

trtexec --loadEngine=int8.engine --exportOutput=result.json --duration=0.005 --iterations=1 --avgRuns=1 --loadInputs='input.1':img.bin --int8

@ttyio
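As a side note on the Linux/QNX difference: --loadInputs reads the file as a raw, headerless binary dump that must match the input tensor's shape and dtype, so cross-platform comparisons are only meaningful if both sides consume the same bytes. Below is a minimal sketch of producing such a dump with NumPy; the image size, normalization, and file names are assumptions, not the model's actual preprocessing.

```python
# Sketch (shape/normalization are assumptions) of dumping a preprocessed
# input tensor as the raw binary blob that --loadInputs='input.1':img.bin reads.
import numpy as np
import cv2

img = cv2.imread("frame.png").astype(np.float32) / 255.0  # hypothetical source image
img = cv2.resize(img, (976, 208))                          # (width, height) assumed for the model
img = img.transpose(2, 0, 1)[np.newaxis]                   # HWC -> NCHW, add batch dim
np.ascontiguousarray(img, dtype=np.float32).tofile("img.bin")  # raw float32 bytes, no header
```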

nvpohanh commented 2 years ago

@XXXVincent When no calib file is provided, trtexec simply uses random dynamic ranges for all tensors. That's why you got wrong outputs.
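For completeness: the usual alternatives to a calibration cache are running real calibration or setting per-tensor dynamic ranges explicitly before building. The sketch below only shows the structure of the TensorRT 8.x Python API for the latter; the range values are placeholders and real ranges must come from calibration or QAT, otherwise accuracy will be wrong, just as described above for a missing calib file.

```python
# Structural sketch of building an INT8 engine with explicit dynamic ranges
# (TensorRT 8.x Python API). Range values below are placeholders only.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("erfnet.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)

# Placeholder ranges: replace with amax values from calibration or QAT.
for i in range(network.num_inputs):
    network.get_input(i).set_dynamic_range(-1.0, 1.0)
for i in range(network.num_layers):
    layer = network.get_layer(i)
    for j in range(layer.num_outputs):
        layer.get_output(j).set_dynamic_range(-4.0, 4.0)

plan = builder.build_serialized_network(network, config)
with open("erf_int8.engine", "wb") as f:
    f.write(bytearray(plan))
```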

nvpohanh commented 2 years ago

Could you share the ONNX file, which TRT version, which OS, and which GPU(s) you used? Thanks

nvpohanh commented 2 years ago

Closing due to >14 days without activity. Please feel free to reopen if the issue still exists. Thanks