Closed: joihn closed this issue 2 years ago
@joihn Thank you for reporting this issue!
As the log indicates, TensorRT currently requires quantization parameters when working with the INT8 data type. These parameters can be provided either through Q/DQ nodes or through PTQ (post-training quantization). If you expect these INT8 values to be a quantized representation of float data, refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#working-with-int8 to see how to set that up.
Or, if you are sure these INT8 values can be converted to float and back losslessly, i.e. static_cast<float>(static_cast<int8_t>(fp32_val)) == fp32_val, you can try setting the dynamic range of the input via the nvinfer1::ITensor::setDynamicRange() API with min == -127, max == 127 to see if that resolves your issue.
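For reference, here is a minimal sketch of what that could look like using the TensorRT Python API (the Python equivalent of the C++ call above). The file name `model.onnx`, the variable names, and the overall build flow are illustrative assumptions, not taken from this issue:

```python
import tensorrt as trt

# Illustrative build flow; assumes a TensorRT version that still supports
# per-tensor dynamic ranges (e.g. TRT 8.x, as shipped with Triton 2.21).
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # INT8 tensors need quantization scales

# min == -127, max == 127 gives scale == 1.0, so each INT8 input value maps
# one-to-one to the same float value after dequantization.
for i in range(network.num_inputs):
    network.get_input(i).dynamic_range = (-127.0, 127.0)

engine_bytes = builder.build_serialized_network(network, config)
```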
Closing due to no response for >14 days. Please feel free to reopen if the issue still exists. Thanks!
Description
I have an FP32 model. To optimise the transfer speed of my images to the GPU, I want to transfer them as INT8 instead of FP32, so I modified my ONNX model to receive images in INT8 and cast them to FP32 on the GPU for further processing. Roughly, the modification looks like the sketch below.
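(Sketch of the kind of graph surgery I did; names like `model_fp32.onnx` and `input_fp32` are placeholders, not from my actual model.)

```python
import onnx
from onnx import TensorProto, helper

model = onnx.load("model_fp32.onnx")
graph = model.graph
input_name = graph.input[0].name

# Retype the model input from FLOAT to INT8.
graph.input[0].type.tensor_type.elem_type = TensorProto.INT8

# Rewire every consumer of the original input to read from the cast output.
for node in graph.node:
    node.input[:] = ["input_fp32" if name == input_name else name
                     for name in node.input]

# Insert a Cast node (INT8 -> FLOAT) right after the input, so the rest of
# the graph still computes in FP32 on the GPU.
cast = helper.make_node("Cast", inputs=[input_name],
                        outputs=["input_fp32"], to=TensorProto.FLOAT)
graph.node.insert(0, cast)

onnx.checker.check_model(model)
onnx.save(model, "model_int8_input.onnx")
```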
This works very well with ONNX Runtime. When I try to convert the model to TensorRT (since I'm using Triton Inference Server, they call it "adding the TensorRT optimisation to your model"), I get the following error:
Environment
Triton Inference Server 2.21.0