Closed: QZ-cmd closed this issue 3 years ago.
@QZ-cmd, FYI, we have a white paper on INT8 quantization: http://arxiv.org/abs/2004.09602
Could you try different calibration methods and calibration datasets, and maybe some tricks like replacing SiLU/LeakyReLU with ReLU?
Do you know any other calibration methods? I tried calibration sets of 1,000 and 4,000 images; the resulting INT8 model has mAP = 0, while the same model converted to FP16 tests normally. I also tried the COCO dataset, but accuracy is still lost at INT8, so I suspect the problem is with the model, not the data. If you need the ONNX model, please tell me. Looking forward to your reply.
Hello @QZ-cmd, sorry, we do not have the bandwidth to debug INT8 accuracy issues unless it is a TRT bug.
TRT supports entropy, min-max, and percentile-max calibration. And if you use NVIDIA's pytorch-quantization toolkit (https://github.com/NVIDIA/TensorRT/tree/release/7.2/tools/pytorch-quantization), there are more calibration methods, including QAT. The white paper http://arxiv.org/abs/2004.09602 has some sample networks and INT8 recipes.
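For intuition, the simplest of these calibrators just derive a per-tensor INT8 scale from the observed activation magnitudes. Below is a minimal pure-Python sketch of max-abs and percentile calibration; the function names are mine and this is an illustration of the idea, not TRT's actual implementation:

```python
def maxabs_scale(values, qmax=127):
    """Max calibration: map the largest absolute activation to the int8 limit."""
    amax = max(abs(v) for v in values)
    return amax / qmax

def percentile_scale(values, pct=99.9, qmax=127):
    """Percentile calibration: use the pct-th percentile of |x| as the
    dynamic range instead of the absolute max, clipping rare outliers."""
    mags = sorted(abs(v) for v in values)
    idx = min(len(mags) - 1, int(round(pct / 100.0 * (len(mags) - 1))))
    return mags[idx] / qmax

def quantize(v, scale, qmax=127):
    """Symmetric int8 quantization: round to the nearest step, then clip."""
    q = round(v / scale)
    return max(-qmax, min(qmax, q))
```

The sketch shows why the choice of calibrator matters: a single outlier activation inflates the max-calibration scale, so most of the int8 range is wasted and small values collapse toward zero, while percentile calibration clips the outlier and keeps resolution for the bulk of the distribution.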
It is strange that FP16-TRT is normal and the INT8-TRT conversion reported no errors, yet the accuracy disappeared entirely. Where might the problem be?
This is the INT8 calibration log:
[TensorRT] VERBOSE: Fastest Tactic: 256 Time: 0.050816
[TensorRT] VERBOSE: --------------- Timing Runner: LeakyRelu_64 (PointWiseV2)
[TensorRT] VERBOSE: Tactic: 10 time 0.015008
[TensorRT] VERBOSE: Tactic: 11 time 0.014848
[TensorRT] VERBOSE: Tactic: 12 time 0.01648
[TensorRT] VERBOSE: Tactic: 13 time 0.015008
[TensorRT] VERBOSE: Tactic: 14 time 0.017344
[TensorRT] VERBOSE: Tactic: 15 time 0.020224
[TensorRT] VERBOSE: Tactic: 16 time 0.020096
[TensorRT] VERBOSE: Tactic: 17 time 0.020928
[TensorRT] VERBOSE: Tactic: 18 time 0.026208
[TensorRT] VERBOSE: Tactic: 19 time 0.027744
[TensorRT] VERBOSE: Tactic: 20 time 0.015968
[TensorRT] VERBOSE: Tactic: 21 time 0.0136
[TensorRT] VERBOSE: Tactic: 22 time 0.014816
[TensorRT] VERBOSE: Tactic: 23 time 0.014336
[TensorRT] VERBOSE: Tactic: 24 time 0.019104
[TensorRT] VERBOSE: Tactic: 25 time 0.016832
[TensorRT] VERBOSE: Tactic: 26 time 0.01456
[TensorRT] VERBOSE: Tactic: 27 time 0.01376
[TensorRT] VERBOSE: Fastest Tactic: 21 Time: 0.0136
[TensorRT] VERBOSE: >>>>>>>>>>>>>>> Chose Runner Type: PointWiseV2 Tactic: 21
[TensorRT] VERBOSE:
[TensorRT] VERBOSE: --------------- Timing Runner:
Hello @QZ-cmd, TRT only provides the calibration algorithms that generate the INT8 scales, and the inference machinery to run the network at the requested precision. So for most INT8 accuracy issues, the TRT functionality itself is OK; the problem usually lies in the calibration setup or in layers that are sensitive to quantization. That is why the INT8 build reports no failure even though accuracy drops.
Besides exploring different calibration methods, another approach is to enable mixed precision and mark some layers to run at higher precision (half, float) while the rest of the network runs in INT8. Here is some code you could follow: https://github.com/NVIDIA/TensorRT/blob/release/7.1/demo/BERT/builder.py#L590 https://github.com/NVIDIA/TensorRT/blob/release/7.1/demo/BERT/builder.py#L245
If you do not know which layers are sensitive to accuracy, you could even run half of the network in FP32 and then use divide-and-conquer to narrow down the issue.
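The divide-and-conquer search can be sketched generically: force a candidate range of layers to FP32, rebuild, measure accuracy, and bisect toward the smallest range that must stay in high precision. In this hypothetical sketch, `accuracy_ok` stands in for "build an engine with these layer indices forced to FP32 and check mAP"; it is an assumed callback, not a TRT API:

```python
def find_sensitive_range(num_layers, accuracy_ok):
    """Bisect for the smallest contiguous layer range that must stay in FP32.

    accuracy_ok(fp32_layers) -> True if the engine built with that set of
    layer indices forced to FP32 reaches acceptable accuracy (e.g. mAP).
    Assumes a single contiguous quantization-sensitive region.
    """
    lo, hi = 0, num_layers                      # candidate range [lo, hi)
    if not accuracy_ok(set(range(lo, hi))):
        return None                             # even full FP32 fails: not a precision issue
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if accuracy_ok(set(range(lo, mid))):
            hi = mid                            # first half suffices; shrink from the right
        elif accuracy_ok(set(range(mid, hi))):
            lo = mid                            # second half suffices; shrink from the left
        else:
            break                               # sensitivity spans both halves; stop here
    return set(range(lo, hi))
```

Each `accuracy_ok` probe costs a full engine build plus an evaluation run, but the bisection needs only O(log N) probes instead of testing each of the N layers one by one.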
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!
@QZ-cmd, have you solved it?
I was able to convert the FP16 trt_model and inference was normal, and I converted the INT8 trt_model with 1,000 pictures, but the test mAP dropped to 0. I would like to ask where the problem may be and how to debug it. In addition, I compared the outputs of FP16 and INT8 on the same picture: their score information was almost identical, but their location information was much worse. Also, would a single-channel gray image be affected by INT8?