Quantized Inference failure of TensorRT 8.6 when running SDXL-turbo model on GPU A10

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

https://developer.nvidia.com/tensorrt

Apache License 2.0


Quantized Inference failure of TensorRT 8.6 when running SDXL-turbo model on GPU A10 #3710

Open ApolloRay opened 8 months ago

ApolloRay commented 8 months ago

Description

Environment

TensorRT Version: 8.6

NVIDIA GPU: A10

NVIDIA Driver Version: 525.147.05

CUDA Version: 12.0

CUDNN Version: 8.9

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link: dreamshaper (turbo version)

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

ApolloRay commented 8 months ago

python demo_txt2img_xl.py "a photo of an astronaut riding a horse on mars" --version xl-turbo --onnx-dir ./dreamshaper_model/dreamshaper_onnx/ --engine-dir engine-sdxl-turbo --height 1024 --width 1024 --int8

Description: running this command on an A10 (23 GB) fails with an out-of-memory (OOM) error.

ApolloRay commented 8 months ago

If I use width = 512 and height = 512, it runs. But the INT8 UNet takes ~300 ms per inference while the FP16 UNet takes ~250 ms, so INT8 is slower than FP16.
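To put the two UNet timings above in relative terms, here is a quick calculation. The function name is hypothetical, and the inputs are just the approximate figures reported in this comment, not new measurements.

```python
def relative_slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """Return how much slower the candidate is than the baseline, as a fraction."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (candidate_ms - baseline_ms) / baseline_ms


# Approximate timings reported in this thread at 512x512.
fp16_unet_ms = 250.0
int8_unet_ms = 300.0

slowdown = relative_slowdown(fp16_unet_ms, int8_unet_ms)
print(f"INT8 is {slowdown:.0%} slower than FP16")  # prints "INT8 is 20% slower than FP16"
```

A positive result means the quantized engine is slower than the FP16 one, which is the opposite of what INT8 quantization is normally expected to deliver.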

zerollzeng commented 8 months ago

@rajeevsrao @azhurkevich Is it expected? Thanks!

ApolloRay commented 8 months ago

This looks like the same problem as #3724.

ApolloRay commented 8 months ago

> @rajeevsrao @azhurkevich Is it expected? Thanks!

Can anyone help?

azhurkevich commented 7 months ago

@ApolloRay maybe you can follow this blog post

ApolloRay commented 7 months ago

> @ApolloRay maybe you can follow this blog post

I can't find any info about the utils module that this import refers to: from utils import load_calib_prompts
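Until the original utils module is located, a stand-in for the missing helper could look like the sketch below. It assumes that load_calib_prompts simply reads one calibration prompt per line from a text file and groups the prompts into batches. That behaviour, the signature, the default file name calib_prompts.txt, and the batch_prompts helper are all assumptions for illustration and are not confirmed anywhere in this thread.

```python
from pathlib import Path


def batch_prompts(prompts, batch_size):
    """Split a list of prompts into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]


def load_calib_prompts(batch_size, calib_data_path="calib_prompts.txt"):
    """Hypothetical stand-in: read one prompt per line, skip blank lines, batch them.

    The signature and file format are assumptions, not the original helper.
    """
    lines = Path(calib_data_path).read_text(encoding="utf-8").splitlines()
    prompts = [line.strip() for line in lines if line.strip()]
    return batch_prompts(prompts, batch_size)
```

With a file containing three prompts and batch_size=2, this returns two batches: the first with two prompts and the second with one. If the real helper behaves differently (for example, by shuffling or truncating prompts), calibration results would differ accordingly.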