NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

SDXL failure of TensorRT 10.2 when running SDXL with lora on GPU A100 #4090

Open Ijustakid opened 4 weeks ago

Ijustakid commented 4 weeks ago

Description

I manually downloaded sdxl-base-1.0 and two LoRAs (crayon_style_lora_sdxl and watercolor_style_lora_sdxl) from Hugging Face, then ran the demo:

python3 demo_txt2img_xl.py "Picture of a rustic Italian village with Olive trees and mountains" --version=xl-1.0 --lora-path "./crayon_style_lora_sdxl" "./watercolor_style_lora_sdxl" --lora-scale 0.3 0.7 --onnx-dir onnx-sdxl-lora --engine-dir engine-sdxl-lora --build-enable-refit

But I encountered an error during inference:

Traceback (most recent call last):
  File "demo_txt2img_xl.py", line 135, in <module>
    demo.loadEngines(
  File "demo_txt2img_xl.py", line 59, in loadEngines
    self.base.loadEngines(engine_dir, framework_model_dir, onnx_dir, **kwargs)
  File "/sfs_cv/yhy3/project/tensorrt/TensorRT/demo/Diffusion/stable_diffusion_pipeline.py", line 561, in loadEngines
    refit_weights = get_refit_weights(model.state_dict(), onnx_opt_path[model_name], weights_name_mapping, weights_shape_mapping)
  File "/sfs_cv/yhy3/project/tensorrt/TensorRT/demo/Diffusion/utilities.py", line 540, in get_refit_weights
    initializer_name = weight_name_mapping[wt_name]
KeyError: 'down_blocks.1.attentions.0.proj_in.base_layer.weight'
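One plausible reading of the KeyError above is that the UNet's state_dict still carries PEFT's LoRA wrapper naming (an extra `.base_layer` segment, plus separate adapter tensors), while the weights map was built from unwrapped parameter names. The sketch below is not code from the TensorRT demo: `normalize_peft_key`, `is_lora_adapter_key`, and `find_unmapped_keys` are hypothetical helpers, and the `.lora_A.` / `.lora_B.` key pattern and the initializer name are assumptions used only to illustrate how the mismatch could be checked.

```python
# Hypothetical diagnostic helpers; none of these names exist in the TensorRT demo.

def normalize_peft_key(name: str) -> str:
    """Drop the '.base_layer' segment that a LoRA wrapper inserts into a parameter name."""
    return name.replace(".base_layer.", ".")


def is_lora_adapter_key(name: str) -> bool:
    """Assumed pattern for adapter-only tensors, which have no ONNX initializer."""
    return ".lora_A." in name or ".lora_B." in name


def find_unmapped_keys(state_dict_keys, weight_name_mapping):
    """Return state_dict keys that still have no mapping entry after normalization."""
    missing = []
    for key in state_dict_keys:
        if is_lora_adapter_key(key):
            continue  # adapter tensors are not expected in the weights map
        if normalize_peft_key(key) not in weight_name_mapping:
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Made-up mapping entry; the real one is read from weights_map.json.
    mapping = {"down_blocks.1.attentions.0.proj_in.weight": "example_initializer"}
    keys = [
        "down_blocks.1.attentions.0.proj_in.base_layer.weight",
        "down_blocks.1.attentions.0.proj_in.lora_A.crayon.weight",
    ]
    print(find_unmapped_keys(keys, mapping))
```

If a check like this returns an empty list for the real state_dict and weights map, the wrapper naming is the only difference, which would support the idea that the LoRA layers were not fully merged and unloaded before the refit weights were collected.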

When I write the pipeline manually with diffusers, inference works and the LoRA output is as expected. Pipeline:

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("./stable-diffusion-xl-base-1.0")
pipe.load_lora_weights("./crayon_style_lora_sdxl", adapter_name="crayon")
pipe.load_lora_weights("./watercolor_style_lora_sdxl", adapter_name="watercolor")
pipe.set_adapters(["crayon", "watercolor"], adapter_weights=[1.0, 0.1])
pipe.to("cuda")

prompt = "An astronaut riding a green horse"
images = pipe(prompt=prompt).images[0]
images.save("./output_image.png")

Environment

TensorRT Version: 10.2

NVIDIA GPU: A100

NVIDIA Driver Version: 535.161.08

CUDA Version: 12.2

CUDNN Version: ---

Operating System:

Python Version (if applicable): 3.8.19

Tensorflow Version (if applicable): 2.12.0

PyTorch Version (if applicable): 2.3.1

Baremetal or Container (if so, version):

Relevant Files

python3 demo_txt2img_xl.py "Picture of a rustic Italian village with Olive trees and mountains" --version=xl-1.0 --lora-path "./crayon_style_lora_sdxl" "./watercolor_style_lora_sdxl" --lora-scale 0.3 0.7 --onnx-dir onnx-sdxl-lora --engine-dir engine-sdxl-lora --build-enable-refit

[I] Initializing TensorRT accelerated StableDiffusionXL txt2img pipeline
[I] Autoselected scheduler: Euler
[I] Load CLIPTokenizer model from: pytorch_model/xl-1.0/XL_BASE/tokenizer
[I] Load CLIPTokenizer model from: pytorch_model/xl-1.0/XL_BASE/tokenizer_2
Processing unet LoRA: ./crayon_style_lora_sdxl
Processing unet LoRA: ./watercolor_style_lora_sdxl
Loading TensorRT engine: engine-sdxl-lora/clip.trt10.2.0.plan
[I] Loading bytes from engine-sdxl-lora/clip.trt10.2.0.plan
Loading TensorRT engine: engine-sdxl-lora/clip2.trt10.2.0.plan
[I] Loading bytes from engine-sdxl-lora/clip2.trt10.2.0.plan
Loading TensorRT engine: engine-sdxl-lora/unetxl.refit.trt10.2.0.plan
[I] Loading bytes from engine-sdxl-lora/unetxl.refit.trt10.2.0.plan
[I] Loading weights map: onnx-sdxl-lora/unetxl.opt/weights_map.json
[I] Saving refit weights: engine-sdxl-lora/unetxl.opt/refit-87f04396213ce44b3a2ac69ca31ba887-0.30-1fccad9db8bbe447ba29de2acfd3f068-0.70.json
[I] Fusing LoRA: ./crayon_style_lora_sdxl, scale 0.3
[I] Fusing LoRA: ./watercolor_style_lora_sdxl, scale 0.7
/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/peft/tuners/lora/model.py:416: UserWarning: Adapter cannot be set when the model is merged. Unmerging the model first.
  warnings.warn("Adapter cannot be set when the model is merged. Unmerging the model first.")
/sfs_cv/yhy3/project/tensorrt/TensorRT/demo/Diffusion/utilities.py:533: RuntimeWarning: overflow encountered in cast
  initializer_data = numpy_helper.to_array(initializer, base_dir=onnx_opt_dir).astype(np.float16)
Traceback (most recent call last):
  File "demo_txt2img_xl.py", line 135, in <module>
    demo.loadEngines(
  File "demo_txt2img_xl.py", line 59, in loadEngines
    self.base.loadEngines(engine_dir, framework_model_dir, onnx_dir, **kwargs)
  File "/sfs_cv/yhy3/project/tensorrt/TensorRT/demo/Diffusion/stable_diffusion_pipeline.py", line 561, in loadEngines
    refit_weights = get_refit_weights(model.state_dict(), onnx_opt_path[model_name], weights_name_mapping, weights_shape_mapping)
  File "/sfs_cv/yhy3/project/tensorrt/TensorRT/demo/Diffusion/utilities.py", line 540, in get_refit_weights
    initializer_name = weight_name_mapping[wt_name]
KeyError: 'down_blocks.1.attentions.0.proj_in.base_layer.weight'
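The "overflow encountered in cast" warning in the log above indicates that at least one initializer value was outside the range float16 can represent when it was cast. Whether that matters for this failure is not established in the thread. As a minimal sketch, assuming the values are available as plain Python floats, the hypothetical helper `count_fp16_overflow` below counts how many finite values exceed the largest finite float16 magnitude (65504).

```python
import math

# Largest finite value representable in IEEE 754 half precision.
FLOAT16_MAX = 65504.0


def count_fp16_overflow(values):
    """Count finite values whose magnitude exceeds the largest finite float16.

    This is a conservative check: values only slightly above FLOAT16_MAX may
    still round down to it instead of overflowing to infinity.
    """
    return sum(1 for v in values if math.isfinite(v) and abs(v) > FLOAT16_MAX)


if __name__ == "__main__":
    sample = [0.5, -3.0, 7.0e4, -1.0e5]  # made-up values for illustration
    print(count_fp16_overflow(sample))
```

Running a check like this on each initializer before the cast would show which tensors trigger the warning, which may be useful detail to include in the bug report.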

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

akhilg-nv commented 2 weeks ago

As with the other question, could you provide full information on how to reproduce this issue? The container and environment details are important so that we can look into the error. Does the sample work if you omit the --build-enable-refit flag?