Closed: audreyeternal closed this issue 1 year ago
There are two ways to build a TRT engine now. The difference between them is whether the activation range is supplied via a JSON file or embedded in the ONNX graph itself.
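For the JSON route, the ranges from `clip_ranges.json` have to be attached to the network tensors before building. Below is a minimal sketch using the TensorRT Python API, assuming the JSON simply maps tensor names to calibrated absolute-max values (the actual MQBench schema and file names may differ):

```python
import json
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("quantized_model.onnx", "rb") as f:
    assert parser.parse(f.read()), "ONNX parse failed"

with open("clip_ranges.json") as f:
    ranges = json.load(f)  # assumed layout: {"tensor_name": amax, ...}

# Give TensorRT a symmetric dynamic range for every tensor we have a
# value for, so it can run INT8 without a calibrator.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    for j in range(layer.num_outputs):
        t = layer.get_output(j)
        if t.name in ranges:
            t.set_dynamic_range(-ranges[t.name], ranges[t.name])

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
engine = builder.build_serialized_network(network, config)
```

Note that if `clip_ranges.json` is empty, the loop above sets nothing and the builder silently falls back to FP32 kernels.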
@Tracin, thank you very much for your reply! The `.onnx` file is attached: fnet_3d.zip. Apart from the empty `clip_ranges.json` file, it seems that no fuse operation is executed either: `conv`, `bn`, and `relu` remain separate operators. For the second way, I tried setting `deploy_to_qlinear` to true, exporting the ONNX file, and using `onnx2trt.py` to build the engine (I removed the calibration part because I think calibration has already been done before the `convert_deploy` function; I am not sure if that is right):
python onnx2trt.py --onnx-path "onnx_quantized_model.onnx" --data-path "" --trt-path "quantized_model.trt"
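If the export really did produce a QLinear/QDQ graph, the scales travel inside the ONNX itself, so the build step needs no calibrator at all. Roughly, it reduces to the following sketch (my own simplification, not the actual `onnx2trt.py`):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("onnx_quantized_model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
# No calibrator is set: the QuantizeLinear/DequantizeLinear nodes carry
# the scales. The INT8 flag only permits the builder to pick INT8 kernels.
config.set_flag(trt.BuilderFlag.INT8)
engine = builder.build_serialized_network(network, config)
with open("quantized_model.trt", "wb") as f:
    f.write(engine)
```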
The engine can be built, but whether I set the precision argument `--mode` to `int8` or `fp32`, I get the same result at inference time. This implies that the dynamic range is not properly delivered to the engine.
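One way to confirm that suspicion is to check whether the exported graph contains any Q/DQ nodes at all; this is my own quick diagnostic, not part of MQBench:

```python
import onnx

m = onnx.load("onnx_quantized_model.onnx")
ops = [n.op_type for n in m.graph.node]
# A properly exported QDQ model should report nonzero counts here.
print("QuantizeLinear:", ops.count("QuantizeLinear"))
print("DequantizeLinear:", ops.count("DequantizeLinear"))
```

If both counts are zero, the engine has no quantization information to work with, which would explain the identical int8/fp32 results.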
BTW, I tried the 2D version, fnet_2d, and everything goes well, so I think the glitch may have something to do with the dimensionality.
This issue has not received any updates in 120 days. Please reply to this issue if it is still unresolved!
Hi, a quick question. I implemented the naive PTQ algorithm using MQBench and exported the ONNX model. The backend is TensorRT, but I am confused that the `clip_ranges.json` file is empty. I also tried to build the TensorRT engine with the `trtexec` tool, but the converted TRT model is still in FP32 precision. I am not sure which step went wrong. Could you do me a favor? Thank you!
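For what it's worth, `trtexec` builds FP32 engines by default: unless the ONNX already contains Q/DQ nodes, you also need to pass `--int8` together with a calibration source, for example something like the following (paths are placeholders):

trtexec --onnx=quantized_model.onnx --int8 --calib=calib.cache --saveEngine=quantized_model.trt

So with an empty `clip_ranges.json` and no calibration cache, an engine that stays in FP32 is the expected outcome rather than a `trtexec` bug.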