ModelTC / MQBench

Model Quantization Benchmark
Apache License 2.0

The precision is still fp32 after quantization #207

Closed audreyeternal closed 1 year ago

audreyeternal commented 2 years ago

Hi, a quick question. I implemented the naive PTQ algorithm using MQBench and exported the ONNX model. The backend is TensorRT. But I am confused because the exported clip_ranges.json file is empty:

{
    "tensorrt": {
        "blob_range": {}
    }
}
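For comparison, a healthy export should map tensor names to [min, max] clip pairs inside blob_range. A minimal sketch of what a populated file looks like, plus a quick check for the empty case (the tensor names here are invented for illustration, not taken from the actual model):

```python
import json

# Hypothetical example of a *populated* clip_ranges.json;
# the tensor names are made up for illustration.
populated = {
    "tensorrt": {
        "blob_range": {
            "input.1": [-2.64, 2.64],
            "conv1_out": [-5.12, 5.12],
        }
    }
}

def has_clip_ranges(doc: dict) -> bool:
    """Return True if the TensorRT blob_range section is non-empty."""
    return bool(doc.get("tensorrt", {}).get("blob_range"))

# The file from this issue parses fine but carries no ranges.
empty = json.loads('{"tensorrt": {"blob_range": {}}}')
print(has_clip_ranges(empty))      # False
print(has_clip_ranges(populated))  # True
```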

I also tried to build the TensorRT engine with the trtexec tool:

trtexec --onnx=mqbench_qmodel_deploy_model.onnx --saveEngine=fnet.trt

but the converted TRT engine is still in fp32 precision:

[10/18/2022-16:39:10] [I] === Model Options ===
[10/18/2022-16:39:10] [I] Format: ONNX
[10/18/2022-16:39:10] [I] Model: mqbench_qmodel_deploy_model.onnx
[10/18/2022-16:39:10] [I] Output:
[10/18/2022-16:39:10] [I] === Build Options ===
[10/18/2022-16:39:10] [I] Max batch: explicit batch
[10/18/2022-16:39:10] [I] Workspace: 16 MiB
[10/18/2022-16:39:10] [I] minTiming: 1
[10/18/2022-16:39:10] [I] avgTiming: 8
[10/18/2022-16:39:10] [I] Precision: FP32

I am not sure which step is wrong. Could you do me a favor? Thank you!
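One thing worth noting (an observation about trtexec, not something stated in this thread): trtexec builds fp32 kernels by default, so int8 has to be requested explicitly when building the engine, e.g.:

```shell
# Sketch: request int8 kernels when building the engine.
trtexec --onnx=mqbench_qmodel_deploy_model.onnx --int8 --saveEngine=fnet.trt
```

Even then, with the JSON-based deploy path the clip ranges still have to be handed to TensorRT through its API (per-tensor dynamic ranges), which trtexec alone does not read from MQBench's JSON file.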

Tracin commented 2 years ago

There are two ways to build a TRT engine now.

  1. Backend_Type='Tensorrt': the problem here is the empty clip_ranges.json, which means no activation fake-quantize node was inserted. What does your model look like?
  2. Backend_Type='Tensorrt' with https://github.com/ModelTC/MQBench/blob/main/mqbench/convert_deploy.py#L179 set to True.

The difference between the two is whether the activation ranges are delivered via the JSON file or embedded in the ONNX model.
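For context on what those activation ranges do downstream: TensorRT's int8 mode is symmetric, so each [min, max] pair effectively reduces to a single scale, roughly max(|min|, |max|) / 127. A minimal self-contained sketch of that reduction (my own illustration, not MQBench or TensorRT internals):

```python
def int8_scale(clip_min: float, clip_max: float) -> float:
    """Symmetric int8 scale derived from a clip range,
    roughly how TensorRT turns a dynamic range into a scale."""
    return max(abs(clip_min), abs(clip_max)) / 127.0

def fake_quant(x: float, scale: float) -> float:
    """Quantize-dequantize: round onto the int8 grid, clamp, rescale."""
    q = max(-128, min(127, round(x / scale)))
    return q * scale

scale = int8_scale(-127.0, 127.0)  # 1.0
print(fake_quant(25.3, scale))     # 25.0 (snapped to the grid)
print(fake_quant(200.0, scale))    # 127.0 (clamped to int8 max)
```

Without these ranges (the empty blob_range above, or missing Q/DQ nodes in the ONNX), TensorRT has no scales to work with and simply keeps everything in fp32.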

audreyeternal commented 2 years ago

@Tracin, thank you very much for your reply!

BTW, I tried the 2D version fnet_2d and everything works fine, so I think the glitch may have something to do with the dimensionality.

github-actions[bot] commented 1 year ago

This issue has not received any updates in 120 days. Please reply to this issue if it is still unresolved!