NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

TRT compilation failure of TensorRT 8.6 when running quantized resnet18 on GPU A4000 #4008

Open YixuanSeanZhou opened 1 month ago

YixuanSeanZhou commented 1 month ago

Description

I was trying to use the TensorRT Model Optimizer (modelopt) library to quantize a ResNet-18 from PyTorch. The code to reproduce is:

import os
from copy import deepcopy

import torch
from torch import nn
from torchvision import models
from tqdm import tqdm

import modelopt.torch.quantization as mtq

# Define the model and replace the classifier head for 10 classes
resnet18 = models.resnet18(pretrained=True)
resnet18.fc = nn.Linear(resnet18.fc.in_features, 10)

resnet18.to('cpu')

# Calibration loop; `testloader` is an existing DataLoader over the calibration set
def forward_loop(model):
    for images, labels in tqdm(testloader):
        model(images)

config = deepcopy(mtq.INT8_SMOOTHQUANT_CFG)

resnet18 = mtq.quantize(resnet18, config, forward_loop=forward_loop)

# `quantized_dir` is the output directory for the exported model
torch.onnx.export(resnet18, torch.randn(1, 3, 32, 32), os.path.join(quantized_dir, "saved_model.onnx"), verbose=True, input_names=["input"], output_names=["output"])

I then ran Polygraphy constant folding:

python ~/trt_model_opt/bin/polygraphy surgeon sanitize --fold-constants saved_model.onnx  -o saved_model.onnx

Then, when I compile it with TRT, I get the following (a rough sketch of the build step follows the log):

[07/12/2024-00:08:34] [TRT] [V] --------------- Timing Runner: /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1 (Scale[0x80000007])
[07/12/2024-00:08:34] [TRT] [V] Skipping tactic 0x0000000000000000 due to exception Assertion numScales == mGlobRefs.scale.count() failed. 
[07/12/2024-00:08:34] [TRT] [V] /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1 (Scale[0x80000007]) profiling completed in 0.011867 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[07/12/2024-00:08:34] [TRT] [V] *************** Autotuning format combination: Int8(128,64:32,8,1) -> Float(4096,64,8,1) ***************
[07/12/2024-00:08:34] [TRT] [V] --------------- Timing Runner: /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1 (Scale[0x80000007])
[07/12/2024-00:08:34] [TRT] [V] Skipping tactic 0x0000000000000000 due to exception Assertion numScales == mGlobRefs.scale.count() failed. 
[07/12/2024-00:08:34] [TRT] [V] /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1 (Scale[0x80000007]) profiling completed in 0.0113962 seconds. Fastest Tactic: 0xd15ea5edd15ea5ed Time: inf
[07/12/2024-00:08:34] [TRT] [E] 10: Could not find any implementation for node /maxpool/input_quantizer/DequantizeLinear_%10_cpy_clone_1.
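
For context, the build step is roughly the following (a minimal sketch using the TensorRT 8.6 Python API; the exact builder flags in my run may differ):

# Minimal sketch of the engine build that produces the log above.
# Assumes TensorRT 8.6 Python bindings; flags may differ from my actual run.
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("saved_model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)  # explicit-quantization (Q/DQ) network

engine_bytes = builder.build_serialized_network(network, config)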

Could you please take a look?

Environment

TensorRT Version: 8.6

Relevant Files

Model link: https://file.io/1IiicEUK92IM

Steps To Reproduce

Commands or scripts: see the quantization script and Polygraphy command above.

Have you tried the latest release?: Not yet; I haven't tried TRT 10.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): yes (see the ONNX Runtime sanity check sketched below).
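
A minimal check that the exported ONNX model runs under ONNX Runtime (a sketch; assumes onnxruntime and numpy are installed, and input/output names match the export above):

import numpy as np
import onnxruntime as ort

# Run the exported model on a random input to confirm it executes outside TRT.
session = ort.InferenceSession("saved_model.onnx", providers=["CPUExecutionProvider"])
dummy = np.random.randn(1, 3, 32, 32).astype(np.float32)
outputs = session.run(["output"], {"input": dummy})
print(outputs[0].shape)  # expected: (1, 10)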

lix19937 commented 1 month ago

It seems there is an error: exception Assertion numScales == mGlobRefs.scale.count() failed. Can you try the latest version of TRT?

YixuanSeanZhou commented 1 month ago

Got it, I will give TRT 10 a try.