Closed: joihn closed this issue 2 years ago
@joihn Thank you for reporting this issue!
As the log indicates, TensorRT currently requires quantization parameters when working with the INT8 data type. These parameters can be provided either through Q/DQ nodes or through PTQ (post-training quantization). If you expect these INT8 values to be a quantized representation of float data, refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#working-with-int8 to see how to set that up.
Or, if you are sure these INT8 values can be converted to float and back losslessly, i.e. static_cast<float>(static_cast<int8_t>(fp32_val)) == fp32_val, you can try setting the dynamic range of the input via the nvinfer1::ITensor::setDynamicRange() API with min == -127, max == 127 to see if that resolves your issue.
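For reference, here is a minimal sketch of what that could look like using the TensorRT Python API (the Python equivalent of the C++ call above). The file name `model.onnx`, the variable names, and the overall build flow are illustrative assumptions, not taken from this issue:

```python
import tensorrt as trt

# Illustrative build flow; assumes a TensorRT version that still supports
# per-tensor dynamic ranges (e.g. TRT 8.x, as shipped with Triton 2.21).
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # INT8 tensors need quantization scales

# min == -127, max == 127 gives scale == 1.0, so each INT8 input value maps
# one-to-one to the same float value after dequantization.
for i in range(network.num_inputs):
    network.get_input(i).dynamic_range = (-127.0, 127.0)

engine_bytes = builder.build_serialized_network(network, config)
```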
Closing due to no response for >14 days. Please feel free to reopen if the issue still exists. Thanks!
Description
I have an FP32 model. To optimise the transfer speed of my images to the GPU, I want to transfer them as INT8 instead of FP32, so I modified my ONNX model to receive images in INT8 and cast them to FP32 on the GPU for further processing. Roughly, the modification looks like the sketch below.
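(Sketch of the kind of graph surgery I did; names like `model_fp32.onnx` and `input_fp32` are placeholders, not from my actual model.)

```python
import onnx
from onnx import TensorProto, helper

model = onnx.load("model_fp32.onnx")
graph = model.graph
input_name = graph.input[0].name

# Retype the model input from FLOAT to INT8.
graph.input[0].type.tensor_type.elem_type = TensorProto.INT8

# Rewire every consumer of the original input to read from the cast output.
for node in graph.node:
    node.input[:] = ["input_fp32" if name == input_name else name
                     for name in node.input]

# Insert a Cast node (INT8 -> FLOAT) right after the input, so the rest of
# the graph still computes in FP32 on the GPU.
cast = helper.make_node("Cast", inputs=[input_name],
                        outputs=["input_fp32"], to=TensorProto.FLOAT)
graph.node.insert(0, cast)

onnx.checker.check_model(model)
onnx.save(model, "model_int8_input.onnx")
```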
This works very well with ONNX Runtime. When I try to convert the model to TensorRT (since I'm using Triton Inference Server, they call it "adding the TensorRT optimisation to your model"), I get the following error:
Environment
Triton Inference Server 2.21.0