ModelTC / MQBench

Model Quantization Benchmark
Apache License 2.0

The precision is still fp32 after quantization #207

Closed audreyeternal closed 1 year ago

audreyeternal commented 2 years ago

Hi, a quick question. I implemented the naive PTQ algorithm using MQBench and exported the ONNX model. The backend is TensorRT. But I am confused because the exported clip_ranges.json file is empty:

{
    "tensorrt": {
        "blob_range": {}
    }
}
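For comparison, a healthy export should map tensor names to [min, max] clip pairs inside blob_range. A minimal sketch of what a populated file looks like, plus a quick check for the empty case (the tensor names here are invented for illustration, not taken from the actual model):

```python
import json

# Hypothetical example of a *populated* clip_ranges.json;
# the tensor names are made up for illustration.
populated = {
    "tensorrt": {
        "blob_range": {
            "input.1": [-2.64, 2.64],
            "conv1_out": [-5.12, 5.12],
        }
    }
}

def has_clip_ranges(doc: dict) -> bool:
    """Return True if the TensorRT blob_range section is non-empty."""
    return bool(doc.get("tensorrt", {}).get("blob_range"))

# The file from this issue parses fine but carries no ranges.
empty = json.loads('{"tensorrt": {"blob_range": {}}}')
print(has_clip_ranges(empty))      # False
print(has_clip_ranges(populated))  # True
```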

I also tried to build the TensorRT engine with the trtexec tool:

trtexec --onnx=mqbench_qmodel_deploy_model.onnx --saveEngine=fnet.trt

but the converted TRT engine is still in fp32 precision:

[10/18/2022-16:39:10] [I] === Model Options ===
[10/18/2022-16:39:10] [I] Format: ONNX
[10/18/2022-16:39:10] [I] Model: mqbench_qmodel_deploy_model.onnx
[10/18/2022-16:39:10] [I] Output:
[10/18/2022-16:39:10] [I] === Build Options ===
[10/18/2022-16:39:10] [I] Max batch: explicit batch
[10/18/2022-16:39:10] [I] Workspace: 16 MiB
[10/18/2022-16:39:10] [I] minTiming: 1
[10/18/2022-16:39:10] [I] avgTiming: 8
[10/18/2022-16:39:10] [I] Precision: FP32

I am not sure which step is wrong. Could you do me a favor? Thank you!
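One thing worth noting (an observation about trtexec, not something stated in this thread): trtexec builds fp32 kernels by default, so int8 has to be requested explicitly when building the engine, e.g.:

```shell
# Sketch: request int8 kernels when building the engine.
trtexec --onnx=mqbench_qmodel_deploy_model.onnx --int8 --saveEngine=fnet.trt
```

Even then, with the JSON-based deploy path the clip ranges still have to be handed to TensorRT through its API (per-tensor dynamic ranges), which trtexec alone does not read from MQBench's JSON file.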

Tracin commented 2 years ago

There are two ways to build a TRT engine now.

  1. Backend_Type='Tensorrt': the problem here is the empty clip_ranges.json, which means no activation fake-quantize node was inserted. What does your model look like?
  2. Backend_Type='Tensorrt' with https://github.com/ModelTC/MQBench/blob/main/mqbench/convert_deploy.py#L179 set to True.

The difference between the two is whether the activation ranges are delivered via the JSON file or embedded in the ONNX model.
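For context on what those activation ranges do downstream: TensorRT's int8 mode is symmetric, so each [min, max] pair effectively reduces to a single scale, roughly max(|min|, |max|) / 127. A minimal self-contained sketch of that reduction (my own illustration, not MQBench or TensorRT internals):

```python
def int8_scale(clip_min: float, clip_max: float) -> float:
    """Symmetric int8 scale derived from a clip range,
    roughly how TensorRT turns a dynamic range into a scale."""
    return max(abs(clip_min), abs(clip_max)) / 127.0

def fake_quant(x: float, scale: float) -> float:
    """Quantize-dequantize: round onto the int8 grid, clamp, rescale."""
    q = max(-128, min(127, round(x / scale)))
    return q * scale

scale = int8_scale(-127.0, 127.0)  # 1.0
print(fake_quant(25.3, scale))     # 25.0 (snapped to the grid)
print(fake_quant(200.0, scale))    # 127.0 (clamped to int8 max)
```

Without these ranges (the empty blob_range above, or missing Q/DQ nodes in the ONNX), TensorRT has no scales to work with and simply keeps everything in fp32.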

audreyeternal commented 2 years ago

@Tracin, thank you very much for your reply!

BTW, I tried the 2D version fnet_2d and everything works fine, so I think the glitch may have something to do with the dimensionality.

github-actions[bot] commented 1 year ago

This issue has not received any updates in 120 days. Please reply to this issue if it is still unresolved!