Hello @ShiinaMitsuki, thanks for reporting.
Full support for importing an ONNX model exported from the pytorch-quantization tool through the ONNX-TRT parser will be available in the next major release. Before that, we have to use setDynamicRange to import an ONNX INT8 network. The demoBERT sample uses this method:
see load_onnx_weights_and_quant in https://github.com/NVIDIA/TensorRT/blob/release/7.2/demo/BERT/builder.py#L478
see set_dynamic_range in https://github.com/NVIDIA/TensorRT/blob/release/7.2/demo/BERT/builder.py#L113
Thanks for the reply.
How should the ONNX be exported from PyTorch after fake quantization with the pytorch-quantization package?
I followed the guidance in:
https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/userguide.html#export-to-onnx
pasted the code snippet into torch/onnx/symbolic_opset10.py, and then exported my model using:
```python
import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# Use the fake-quant ONNX export path and swap in quantized modules.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
quant_modules.initialize()

from test.models.rfdn import RFDN_ASX4_nf64m2

model = RFDN_ASX4_nf64m2()
calibrated_model = 'checkpoints/rfdn_asx4_nf64nm2inc3_calibrated.pt'
onnx_save_path = calibrated_model.replace('.pt', '_op10.onnx')

# Load the calibrated weights and switch to eval mode on GPU.
state_dict = torch.load(calibrated_model, map_location='cpu')
model.load_state_dict(state_dict)
model.cuda()
model.eval()

dummy_input = torch.zeros(1, 3, 2160, 3840, requires_grad=False).cuda()
torch.set_grad_enabled(False)

# enable_onnx_checker needs to be disabled.
torch.onnx.export(model,
                  dummy_input,
                  onnx_save_path,
                  verbose=True,
                  input_names=['input'],
                  output_names=['output'],
                  opset_version=10,
                  enable_onnx_checker=False)
```
I compared my exported ONNX model with the BERT model (bert_large_v1_1_fake_quant.onnx) by printing the weight names:
```python
import onnx

# path points to the .onnx file to inspect
model = onnx.load(path)
# print(onnx.helper.printable_graph(model.graph))
weights = model.graph.initializer
for w in weights:
    print(w.name)
```
The results are quite different.
Below is the output from my model:
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
B1.c1_d.bias
B1.c1_d.weight
B1.c1_r.bias
B1.c1_r.weight
B1.c2_d.bias
B1.c2_d.weight
B1.c2_r.bias
B1.c2_r.weight
B1.c3_d.bias
B1.c3_d.weight
B1.c3_r.bias
B1.c3_r.weight
B1.c4.bias
B1.c4.weight
B1.c5.bias
B1.c5.weight
B1.esa.conv1.bias
B1.esa.conv1.weight
B1.esa.conv2.bias
B1.esa.conv2.weight
B1.esa.conv3.bias
B1.esa.conv3.weight
B1.esa.conv3_.bias
B1.esa.conv3_.weight
B1.esa.conv4.bias
B1.esa.conv4.weight
B1.esa.conv_f.bias
B1.esa.conv_f.weight
B1.esa.conv_max.bias
B1.esa.conv_max.weight
B2.c1_d.bias
B2.c1_d.weight
B2.c1_r.bias
B2.c1_r.weight
B2.c2_d.bias
B2.c2_d.weight
B2.c2_r.bias
B2.c2_r.weight
B2.c3_d.bias
B2.c3_d.weight
B2.c3_r.bias
B2.c3_r.weight
B2.c4.bias
B2.c4.weight
B2.c5.bias
B2.c5.weight
B2.esa.conv1.bias
B2.esa.conv1.weight
B2.esa.conv2.bias
B2.esa.conv2.weight
B2.esa.conv3.bias
B2.esa.conv3.weight
B2.esa.conv3_.bias
B2.esa.conv3_.weight
B2.esa.conv4.bias
B2.esa.conv4.weight
B2.esa.conv_f.bias
B2.esa.conv_f.weight
B2.esa.conv_max.bias
B2.esa.conv_max.weight
LR_conv3.bias
LR_conv3.weight
c3.0.bias
c3.0.weight
fea_conv.0.bias
fea_conv.0.weight
fea_conv.1.bias
fea_conv.1.weight
upsamplerx4.0.bias
upsamplerx4.0.weight
and here is part of the BERT model:
bert.embeddings.LayerNorm.bias
bert.embeddings.LayerNorm.weight
bert.embeddings.position_embeddings._weight_quantizer._amax
bert.embeddings.position_embeddings.weight
bert.embeddings.token_type_embeddings._weight_quantizer._amax
bert.embeddings.token_type_embeddings.weight
bert.embeddings.word_embeddings._weight_quantizer._amax
bert.embeddings.word_embeddings.weight
bert.encoder.final_input_quantizer._amax
bert.encoder.layer.0.attention.output.LayerNorm.bias
bert.encoder.layer.0.attention.output.LayerNorm.weight
bert.encoder.layer.0.attention.output.add_local_input_quantizer._amax
bert.encoder.layer.0.attention.output.add_residual_input_quantizer._amax
...
There are no _quantizer._amax style names in my model's output; I don't know why.
Hello @ShiinaMitsuki,
amax is only computed after calibration/QAT, and fake quantization is needed for that; the model has to be trained/calibrated so that amax gets computed.
fb_fake_quant is only needed when exporting to ONNX, so call quant_nn.TensorQuantizer.use_fb_fake_quant = True right before the export to ONNX instead of at the beginning.
Please follow this sample for more details: https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/tutorials/quant_resnet50.html.
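To make the ordering concrete, here is a minimal sketch of that flow (calibrate first, enable fb_fake_quant only for export), following the pattern in the quant_resnet50 tutorial. The build_model constructor and calib_loader are hypothetical placeholders, not part of the toolbox:
```python
import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

quant_modules.initialize()            # monkey-patch nn layers with quantized versions
model = build_model().cuda().eval()   # hypothetical model constructor

# 1. Collect calibration statistics with quantization disabled.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()

with torch.no_grad():
    for images in calib_loader:       # hypothetical calibration data loader
        model(images.cuda())

# 2. Compute amax from the collected statistics (default max calibrator) and re-enable quantization.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.load_calib_amax()
        module.enable_quant()
        module.disable_calib()

# 3. Only now switch to the fake-quant ONNX export path and export.
# Older torch versions may also need enable_onnx_checker=False here.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.zeros(1, 3, 224, 224).cuda()
torch.onnx.export(model, dummy, 'model_qat.onnx', opset_version=13)
```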
Hi @ttyio, I also meet the same issue, could you help me? Here is the graph and the log (attached as images).
Hello @JosephChenHub, currently the open-sourced pytorch-quantization can generate ONNX, but the importer from ONNX with Q/DQ nodes to TRT is not ready in 7.x. Full support will be available in the next major release; before that we have to use setDynamicRange to import an ONNX INT8 network.
The demoBERT sample uses this method: see load_onnx_weights_and_quant in https://github.com/NVIDIA/TensorRT/blob/release/7.2/demo/BERT/builder.py#L478 and set_dynamic_range in https://github.com/NVIDIA/TensorRT/blob/release/7.2/demo/BERT/builder.py#L113
Do you mean that we can manually set the dynamic range of each tensor by reading the scale after QAT?
Yes, we can load the amax values from the ONNX model and set the per-tensor activation scale using setDynamicRange; the per-channel scale for weights is set automatically by TensorRT.
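For reference, a rough sketch of that approach (not the demoBERT code itself): collect per-tensor amax values from the QuantizeLinear scales in the ONNX file, then apply them to the matching network tensors through the TensorRT Python API. The name-matching scheme is an assumption and depends on how your graph names its tensors:
```python
import onnx
import numpy as np
from onnx import numpy_helper

def collect_amax(onnx_path):
    """Map the input tensor of each QuantizeLinear node to amax = y_scale * 127."""
    model = onnx.load(onnx_path)
    inits = {i.name: numpy_helper.to_array(i) for i in model.graph.initializer}
    amax = {}
    for node in model.graph.node:
        if node.op_type == 'QuantizeLinear' and node.input[1] in inits:
            scale = float(np.max(inits[node.input[1]]))
            amax[node.input[0]] = scale * 127.0
    return amax

def apply_dynamic_ranges(network, amax):
    """Set per-tensor activation ranges on an already-parsed INetworkDefinition."""
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_outputs):
            tensor = layer.get_output(j)
            if tensor.name in amax:   # assumes ONNX and TRT tensor names line up
                r = amax[tensor.name]
                tensor.set_dynamic_range(-r, r)
```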
Hello @ShiinaMitsuki, did you manage to produce the '_quantizer._amax' entries successfully? I am still failing.
Hi @ttyio, can TensorRT 8.0 import a pytorch-quantization ONNX model? I mean, can I parse the ONNX model and build a working TRT engine?
@Ricardosuzaku yes.
Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!
Hi @ttyio, I upgraded TensorRT to v8.0.1.6 (GA), but the error at the top of this topic still appears. I am not sure whether the runtime is actually using v8 yet or something else is wrong. Can you give me some ideas? Thank you.
Hi @k9ele7en, how did you generate the ONNX? Are you using trtexec or another tool to run it? Thanks.
Thanks for your response. I wrote explicit code to convert the ONNX into a TRT engine, and I already use set_flag to enable INT8 for the quantized model...
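For context, a build path along those lines would look roughly like the following; this is only an assumed sketch of standard TensorRT Python API usage, not @k9ele7en's actual code, and the file name is a placeholder:
```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_int8_engine(onnx_path):
    builder = trt.Builder(TRT_LOGGER)
    # Q/DQ networks must be parsed as explicit-batch networks.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)   # enable INT8 for the quantized model
    return builder.build_engine(network, config)

engine = build_int8_engine('model_qat.onnx')  # placeholder file name
```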
Hello @k9ele7en, was your ONNX generated using the pytorch-quantization toolbox?
I follow the ONNX export as in the example (https://github.com/NVIDIA/TensorRT/blob/master/tools/pytorch-quantization/examples/torchvision/classification_flow.py): I add quant_nn.TensorQuantizer.use_fb_fake_quant = True and quant_modules.initialize() before initializing the model. I also use opset 13 for the ONNX export. Do you mean I need to add the part shown in the image into torch/onnx/symbolic_opset10.py?
@k9ele7en, there is no need to change symbolic_opset10.py. Have you tried an NGC container, e.g. nvcr.io/nvidian/pytorch:21.07? Thanks.
I ran locally in a Conda environment with Torch 1.9 and TensorRT 8 installed. Do you think the problem comes from torch.onnx and that using an NGC container may solve it? I'd like some reasoning before trying it, you know...
@k9ele7en, it seems some weight counts do not match in the model. Could you provide a simple repro for debugging? Thanks.
Thanks @ttyio for the direction, but this is an internal project and I cannot share the exact code. Overall, I use monkey-patching to substitute QuantConv2d layers into the original network, do PTQ, then convert to ONNX, but I get an error in the TRT conversion step... Does TRT 8.0 fully support INT8 quantized models?
@k9ele7en, does the automatically quantized (QAT) model pass through TRT? If so, could you compare the differences between the automatic one and the monkey-patched one? TRT cannot run with arbitrary quantization settings; for example, the per-channel scale can only be applied to the output feature channel of convolution weights.
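As an illustration of that restriction, the pytorch-quantization toolbox's default convolution weight descriptor quantizes per-channel on axis 0 (the output feature channels); a monkey-patched QuantConv2d that uses a different axis may not be importable by TRT. The layer parameters below are arbitrary example values:
```python
from pytorch_quantization import nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor

# Per-tensor scale for activations, per-channel scale on axis 0 (output channels) for weights.
quant_desc_input = QuantDescriptor(num_bits=8, calib_method='max')
quant_desc_weight = QuantDescriptor(num_bits=8, axis=(0,))

conv = quant_nn.QuantConv2d(64, 64, kernel_size=3, padding=1,
                            quant_desc_input=quant_desc_input,
                            quant_desc_weight=quant_desc_weight)
```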
@ttyio
"There are no _quantizer._amax style names in my model's output; I don't know why."
I hit this problem too. Both the PTQ and QAT .pth checkpoints contain _amax entries, but they are missing from the .onnx model.
@ShiinaMitsuki @k9ele7en @maoxiaoming86 did you guys get past this issue? I would love to know. Thank you.
@thanhnt-2658 Did you figure this out?
Sorry for the delayed response.
It seems we no longer have _amax in the exported ONNX, possibly because these _amax tensors are unused weights and get dropped during export. Could you use the _amax from the checkpoint instead? Or use y_scale * 127 from the QuantizeLinear node as the _amax?
Thanks
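A quick way to recover the values from the checkpoint, as suggested, is to scan the state dict for keys ending in _amax. The checkpoint path below is a placeholder:
```python
import torch

state_dict = torch.load('model_calibrated.pt', map_location='cpu')  # placeholder path

# Per-channel weight amax tensors are collapsed to a single max here, for display only.
amax = {k: v.float().abs().max().item()
        for k, v in state_dict.items() if k.endswith('_amax')}

for name, value in sorted(amax.items()):
    print(f'{name}: {value:.6f}')
```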
@ttyio do I still need to do this if I use TensorRT 8 or does it work automatically?
@Scass0807, for TRT 8 and later you can directly import the ONNX with Q/DQ nodes into TRT, without manually calling setDynamicRange.
Does TRT 8 support ONNX with Q/DQ from pytorch-quantization, from native PyTorch quantization, or both?
@maoxiaoming86 , from pytorch-quantization, thanks!
@ttyio Hello, can you explain how to get the value of amax? How is amax calculated? Is there a corresponding formula for computing it after QAT?
Description
An error occurred while parsing a fake-quantized ONNX model with TensorRT 7.2.1.6, following the guidance of the pytorch-quantization toolbox provided in the TensorRT 7.2 release.
Error Message:
Environment
TensorRT Version: 7.2.1.6
GPU Type: NVIDIA RTX 2070
Nvidia Driver Version: 440.33.01
CUDA Version: CUDA 10.2
CUDNN Version: CUDNN 8.0
Operating System + Version: Ubuntu 1604
Python Version (if applicable): 3.6.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.6.0
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
onnx model
code
Steps To Reproduce
Please include: the onnx model and code linked above (saved to disk).
ERROR: