Closed junstar92 closed 2 months ago
How was this file (simple_conv_fp8.onnx) generated?
@junstar92 You might have to add the --stronglyTyped flag as well.
cc: @nvpohanh
Sorry, this is a bug in TRT 10.2. Please enable --stronglyTyped for now. We will try to fix this issue in TRT 10.3.
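For reference, the trtexec flag corresponds to the strongly typed network-creation flag in the TensorRT Python API. A minimal sketch building the model from this thread (error handling omitted; the file name is taken from the question above):

import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
# STRONGLY_TYPED makes TensorRT honor the types in the ONNX graph
# (including FP8) instead of choosing precisions during autotuning.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED)
)
parser = trt.OnnxParser(network, logger)
with open("simple_conv_fp8.onnx", "rb") as f:
    assert parser.parse(f.read())
config = builder.create_builder_config()
engine_bytes = builder.build_serialized_network(network, config)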
@nvpohanh Thank you for checking this issue.
With the --stronglyTyped flag, the FP8 tactics are enabled.
However, I have another question about FP8 convolution. I tried to build ResNet18 and ResNet50, but TensorRT cannot find any implementation for the first conv operation of ResNet. It seems there are no convolution tactics for a 3-channel input. Does TensorRT 10.2 support the whole of ResNet18 or ResNet50?
Here is the error log.
[07/12/2024-05:24:18] [V] [TRT] =============== Computing costs for {ForeignNode[/fake_quantizer_7c435f9c02917de57484db91f86bbbaf/QuantizeLinear.../fake_quantizer_5d22154b57438817000ba0a1ea6159ca/DequantizeLinear]}
[07/12/2024-05:24:18] [V] [TRT] *************** Autotuning format combination: Float(150528,50176,224,1) -> Float(802816,12544,112,1) ***************
[07/12/2024-05:24:18] [V] [TRT] --------------- Timing Runner: {ForeignNode[/fake_quantizer_7c435f9c02917de57484db91f86bbbaf/QuantizeLinear.../fake_quantizer_5d22154b57438817000ba0a1ea6159ca/DequantizeLinear]} (Myelin[0x80000023])
[07/12/2024-05:24:18] [V] [TRT] [MemUsageChange] Subgraph create: CPU +0, GPU +0, now: CPU 2390, GPU 844 (MiB)
[07/12/2024-05:24:18] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [autotuner.cpp:get_best_tactics:2061] Autotuner: no tactics to implement operation:
131: corrltn: /L__self___conv1/Conv_output_0'_before_bias.1-(f32[16,64,112,112][]so[], mem_prop=0) | /fake_quantizer_7c435f9c02917de57484db91f86bbbaf/QuantizeLinear_output_0'.1-(fp8[16,3,224,224][]so[], mem_prop=0), __mye150_dconst-{-2.75, -1.625, -0.46875, 20, 15, 4.5, -3.25, 3, ...}(fp8[64,3,7,7][147,49,7,1]so[3,2,1,0], mem_prop=0)<entry>, __mye126-1.10786e-05F:(f32[][]so[], mem_prop=0)<entry>, __mye91/L__self___conv1/Conv_beta-0F:(f32[][]so[], mem_prop=0)<entry>, stream = 0 // __mye130_conv
| n_groups: 1 lpad: {3, 3} rpad: {3, 3} pad_mode: 0 strides: {2, 2} dilations: {1, 1}
[07/12/2024-05:24:18] [V] [TRT] {ForeignNode[/fake_quantizer_7c435f9c02917de57484db91f86bbbaf/QuantizeLinear.../fake_quantizer_5d22154b57438817000ba0a1ea6159ca/DequantizeLinear]} (Myelin[0x80000023]) profiling completed in 0.118439 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[07/12/2024-05:24:18] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/fake_quantizer_7c435f9c02917de57484db91f86bbbaf/QuantizeLinear.../fake_quantizer_5d22154b57438817000ba0a1ea6159ca/DequantizeLinear]}.)
[07/12/2024-05:24:18] [E] Engine could not be created from network
[07/12/2024-05:24:18] [E] Building engine failed
[07/12/2024-05:24:18] [E] Failed to create engine from model or file.
[07/12/2024-05:24:18] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100200] # trtexec --onnx=ResNet18_batch16_fp8.onnx --fp16 --fp8 --stronglyTyped --verbose --profilingVerbosity=detailed
Did you insert the Q/DQ ops by using the TensorRT Model Optimizer toolkit? https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/onnx_ptq
It should have avoided the Q/DQ ops before Convs whose C and K are not multiples of 16.
But thanks for pointing this out. I will add this limitation to our release notes.
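A hypothetical helper illustrating that limitation: scan the original (pre-quantization) ONNX model and report Conv nodes whose channel counts are not multiples of 16, i.e. the convs that should stay unquantized. The file name and the assumption that weights are plain initializers are mine:

import onnx

model = onnx.load("resnet18.onnx")
weights = {init.name: init for init in model.graph.initializer}
for node in model.graph.node:
    if node.op_type != "Conv":
        continue
    w = weights.get(node.input[1])
    if w is None:
        continue  # weight not a plain initializer; skipped in this sketch
    k, c = w.dims[0], w.dims[1]  # OIHW layout: K out-channels, C in-channels
    if k % 16 or c % 16:
        print(f"{node.name}: K={k}, C={c} -> leave this Conv in high precision")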
@nvpohanh Thanks for the quick answer.
I inserted the Q/DQ ops by using modelopt, but the same error occurs with native Q/DQ ops as well.
[attachment: error log when using modelopt]
[attachment: error log with native Q/DQ ops]
Filed an internal tracker: id 4744383
We will debug this and find out how this is different from our FP8 ResNet50 testing in our CI/CD.
Here is my quantization and ONNX export code.
import torch
import torchvision
import modelopt.torch.quantization as mtq

# FP8 (E4M3: num_bits=(4, 3)) per-tensor config with max calibration.
FP8_DEFAULT_CFG = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": (4, 3), "axis": None},
        "*input_quantizer": {"num_bits": (4, 3), "axis": None},
        "*output_quantizer": {"enable": False},
        "*block_sparse_moe.gate*": {"enable": False},  # Skip the MOE router
        "default": {"num_bits": (4, 3), "axis": None},
    },
    "algorithm": "max",
}

model = torchvision.models.resnet18(pretrained=True).cuda()

# Calibration loop: feed a few random batches so the "max" calibrator
# can collect amax values for every quantizer.
def calib_loop():
    for _ in range(10):
        model(torch.randn(16, 3, 224, 224, device='cuda'))

mtq.quantize(model, FP8_DEFAULT_CFG, forward_loop=calib_loop)

torch.onnx.export(
    model,
    torch.randn(16, 3, 224, 224, device='cuda'),
    'resnet18_fp8.onnx',
    input_names=['input'],
    output_names=['output'],
)
@junstar92 Oh I see, you are using modelopt.torch.quantization while I was referring to modelopt.onnx.quantization. Could you first export the original model to ONNX and then use modelopt.onnx.quantization to add the Q/DQ nodes?
I will check internally about the differences between modelopt.torch.quantization and modelopt.onnx.quantization.
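A minimal sketch of that suggested flow. The quantize entry point and its quantize_mode / calibration_data / output_path argument names are assumptions based on ModelOpt's ONNX PTQ documentation, not confirmed by this thread:

import numpy as np
import torch
import torchvision
from modelopt.onnx.quantization import quantize

# Export the FP32 model first, then let ModelOpt insert the FP8 Q/DQ nodes,
# which skips convs whose channel counts are unsupported.
model = torchvision.models.resnet18(pretrained=True).eval()
torch.onnx.export(
    model,
    torch.randn(16, 3, 224, 224),
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
)

# Random calibration data shaped like the exported input (replace with
# real samples in practice).
calib = np.random.randn(16, 3, 224, 224).astype(np.float32)
quantize(
    onnx_path="resnet18.onnx",
    quantize_mode="fp8",
    calibration_data=calib,
    output_path="resnet18_fp8_qdq.onnx",
)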
@nvpohanh Okay, here is the ONNX model quantized by using modelopt.onnx.quantization; its first convolution is not quantized.
Building this ONNX model succeeded, which is consistent with the first conv op having no FP8 implementation.
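For anyone staying on the modelopt.torch.quantization path, the same result can presumably be reproduced by disabling the stem conv's quantizers by name pattern, mirroring the MOE-gate exclusion in the config above. The "conv1.*" pattern (torchvision's ResNet stem conv) and its wildcard matching semantics are assumptions:

# Sketch: variant of FP8_DEFAULT_CFG that keeps the 3-channel first conv
# in high precision, mirroring what modelopt.onnx.quantization did.
FP8_SKIP_STEM_CONV_CFG = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": (4, 3), "axis": None},
        "*input_quantizer": {"num_bits": (4, 3), "axis": None},
        "*output_quantizer": {"enable": False},
        "conv1.*": {"enable": False},  # assumed pattern for the stem conv's quantizers
        "default": {"num_bits": (4, 3), "axis": None},
    },
    "algorithm": "max",
}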
@nvpohanh My question has been resolved, so I am closing this issue. I appreciate your support.
@junstar92 Could you please share the versions of ModelOpt and PyTorch?
I'm trying FP8 ResNet too. Thank you very much!
@lishicheng1996 PyTorch: 2.2, ModelOpt: 0.13
We will fix this in the TRT 10.4 release.
Hello,
I am using TensorRT 10.2 and noticed that normal FP8 convolution has been updated. However, when I try to build a simple QDQ + Conv model in ONNX, the FP8 convolution is not selected; timing of the FP8 tactics is not even performed.
Here is the model I used. It was quantized with TensorRT-Model-Optimizer, and I used an H100 device.
trtexec command: