NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

SDXL int8 Failure of TensorRT 10 when running txt2img_xl.py on GPU A100 #37

Closed 13301338176 closed 1 month ago

13301338176 commented 1 month ago

Docker: nvcr.io/nvidia/pytorch:24.06-py3
pip uninstall nvidia-modelopt
pip install nvidia-modelopt==0.13.0

command: python demo_txt2img_xl.py "enchanted winter forest, soft diffuse light on a snow-filled day, serene nature scene, the forest is illuminated by the snow" --negative-prompt "normal quality, low quality, worst quality, low res, blurry, nsfw, nude" --version xl-1.0 --scheduler Euler --denoising-steps 30 --seed 2946901 --int8 --quantization-level 3

Errors:
Warning: only per-channel smoothing is supported, skip down_blocks.0.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip down_blocks.1.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip down_blocks.1.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip down_blocks.2.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip down_blocks.2.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.0.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.0.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.0.resnets.2.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.1.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.1.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.1.resnets.2.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.2.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.2.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.2.resnets.2.time_emb_proj
Warning: only per-channel smoothing is supported, skip mid_block.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip mid_block.resnets.1.time_emb_proj
Smoothed 722 modules
[I] Generating quantized ONNX model: onnx-sdxl/unetxl-int8.l3.0.bs2.s30.c32.p1.0.a0.8.opt/model.onnx
[I] Load UNet pytorch model from: pytorch_model/xl-1.0/XL_BASE/unet/diffusion_pytorch_model.fp16.safetensors
Inserted 2942 quantizers
Traceback (most recent call last):
  File "/local/mnt/workspace/haijunz/TensorRT/demo/Diffusion/demo_txt2img_xl.py", line 135, in <module>
    demo.loadEngines(
  File "/local/mnt/workspace/haijunz/TensorRT/demo/Diffusion/demo_txt2img_xl.py", line 59, in loadEngines
    self.base.loadEngines(engine_dir, framework_model_dir, onnx_dir, **kwargs)
  File "/local/mnt/workspace/haijunz/TensorRT/demo/Diffusion/stable_diffusion_pipeline.py", line 501, in loadEngines
    quantize_lvl(model, quantization_level)
  File "/local/mnt/workspace/haijunz/TensorRT/demo/Diffusion/utils_modelopt.py", line 135, in quantize_lvl
    module.bmm2_output_quantizer.disable()
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/opt/dynamic.py", line 810, in __getattr__
    attr = super().__getattr__(name)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1708, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'QuantAttention' object has no attribute 'bmm2_output_quantizer'
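
The traceback ends in an attribute lookup on a quantizer that the installed module does not expose. The sketch below illustrates the general pattern of guarding such a lookup. It is not the demo's code and does not use modelopt: `Quantizer`, `AttentionWithoutBmm2`, `other_quantizer`, and `disable_if_present` are hypothetical stand-ins, and only the name `bmm2_output_quantizer` comes from the log above. Skipping a missing quantizer silently could change the quantization result, so treat this as a diagnostic aid rather than a fix.

```python
class Quantizer:
    """Hypothetical stand-in for a quantizer; it only tracks an enabled flag."""

    def __init__(self):
        self.enabled = True

    def disable(self):
        self.enabled = False


class AttentionWithoutBmm2:
    """Hypothetical module that lacks the bmm2_output_quantizer attribute."""

    def __init__(self):
        # Illustrative attribute name, not taken from any real library.
        self.other_quantizer = Quantizer()


def disable_if_present(module, names):
    """Disable each named quantizer that exists; return the names that were missing."""
    missing = []
    for name in names:
        quantizer = getattr(module, name, None)
        if quantizer is None:
            # Record the gap instead of raising AttributeError.
            missing.append(name)
        else:
            quantizer.disable()
    return missing


module = AttentionWithoutBmm2()
missing = disable_if_present(module, ["other_quantizer", "bmm2_output_quantizer"])
print("missing quantizers:", missing)
```

Reporting the missing names makes a mismatch between the script and the installed library visible before the pipeline fails partway through.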

Edwardf0t1 commented 1 month ago

Thanks for reporting the issue.

To run the SDXL int8 pipeline, please follow the steps in the README.

13301338176 commented 1 month ago

We raised this issue because following the README produced this error. Why push back?

Edwardf0t1 commented 1 month ago

The command you used (below) is not in the README, which is why we suggested following the steps there.

python demo_txt2img_xl.py "enchanted winter forest, soft diffuse light on a snow-filled day, serene nature scene, the forest is illuminated by the snow" --negative-prompt "normal quality, low quality, worst quality, low res, blurry, nsfw, nude" --version xl-1.0 --scheduler Euler --denoising-steps 30 --seed 2946901 --int8 --quantization-level 3

If you prefer to run the pipeline in this repo instead, try downgrading modelopt to version 0.11.0.
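
Before re-running, it may help to confirm which modelopt version is actually installed in the container. The following is a minimal sketch using the standard library's `importlib.metadata`. It assumes the distribution name `nvidia-modelopt` from the pip commands earlier in this thread; the helpers `installed_version` and `matches` are hypothetical names introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches(installed, expected):
    """True if the installed version equals expected, ignoring any '+local' suffix."""
    return installed is not None and installed.split("+")[0] == expected


found = installed_version("nvidia-modelopt")
print("nvidia-modelopt:", found, "| is 0.11.0:", matches(found, "0.11.0"))
```

If the check reports a version other than 0.11.0, the downgrade suggested above has not taken effect in the environment the script runs in.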

Edwardf0t1 commented 1 month ago

Closing the issue as there has been no further response from the user.