SDXL int8 Failure of TensorRT 10 when running txt2img_xl.py on GPU A100

13301338176 commented 4 months ago

Docker: nvcr.io/nvidia/pytorch:24.06-py3
pip uninstall nvidia-modelopt
pip install nvidia-modelopt==0.13.0

command: python demo_txt2img_xl.py "enchanted winter forest, soft diffuse light on a snow-filled day, serene nature scene, the forest is illuminated by the snow" --negative-prompt "normal quality, low quality, worst quality, low res, blurry, nsfw, nude" --version xl-1.0 --scheduler Euler --denoising-steps 30 --seed 2946901 --int8 --quantization-level 3

Errors:

Warning: only per-channel smoothing is supported, skip down_blocks.0.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip down_blocks.1.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip down_blocks.1.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip down_blocks.2.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip down_blocks.2.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.0.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.0.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.0.resnets.2.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.1.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.1.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.1.resnets.2.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.2.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.2.resnets.1.time_emb_proj
Warning: only per-channel smoothing is supported, skip up_blocks.2.resnets.2.time_emb_proj
Warning: only per-channel smoothing is supported, skip mid_block.resnets.0.time_emb_proj
Warning: only per-channel smoothing is supported, skip mid_block.resnets.1.time_emb_proj
Smoothed 722 modules
[I] Generating quantized ONNX model: onnx-sdxl/unetxl-int8.l3.0.bs2.s30.c32.p1.0.a0.8.opt/model.onnx
[I] Load UNet pytorch model from: pytorch_model/xl-1.0/XL_BASE/unet/diffusion_pytorch_model.fp16.safetensors
Inserted 2942 quantizers
Traceback (most recent call last):
  File "/local/mnt/workspace/haijunz/TensorRT/demo/Diffusion/demo_txt2img_xl.py", line 135, in <module>
    demo.loadEngines(
  File "/local/mnt/workspace/haijunz/TensorRT/demo/Diffusion/demo_txt2img_xl.py", line 59, in loadEngines
    self.base.loadEngines(engine_dir, framework_model_dir, onnx_dir, **kwargs)
  File "/local/mnt/workspace/haijunz/TensorRT/demo/Diffusion/stable_diffusion_pipeline.py", line 501, in loadEngines
    quantize_lvl(model, quantization_level)
  File "/local/mnt/workspace/haijunz/TensorRT/demo/Diffusion/utils_modelopt.py", line 135, in quantize_lvl
    module.bmm2_output_quantizer.disable()
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/opt/dynamic.py", line 810, in __getattr__
    attr = super().__getattr__(name)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1708, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'QuantAttention' object has no attribute 'bmm2_output_quantizer'

lix19937 commented 4 months ago

File "/local/mnt/workspace/haijunz/TensorRT/demo/Diffusion/utils_modelopt.py", line 135, in quantize_lvl
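For anyone reading along: the traceback shows that line 135 of utils_modelopt.py calls `module.bmm2_output_quantizer.disable()` on a `QuantAttention` object that does not have that attribute. The sketch below only illustrates the general pattern of guarding such an access with `getattr`. It is not a confirmed fix for this issue; `FakeQuantizer`, `FakeQuantAttention`, and `disable_quantizer_if_present` are hypothetical stand-ins written for this example and are not part of nvidia-modelopt or the TensorRT demo.

```python
class FakeQuantizer:
    """Hypothetical stand-in for a quantizer; it only tracks an enabled flag."""

    def __init__(self):
        self.enabled = True

    def disable(self):
        self.enabled = False


class FakeQuantAttention:
    """Hypothetical stand-in for a module that may lack bmm2_output_quantizer."""

    def __init__(self, with_bmm2=False):
        if with_bmm2:
            self.bmm2_output_quantizer = FakeQuantizer()


def disable_quantizer_if_present(module, name):
    """Disable module.<name> if it exists; return True when something was disabled."""
    quantizer = getattr(module, name, None)
    if quantizer is None:
        # The attribute is missing (as in the traceback), so skip it
        # instead of raising AttributeError.
        return False
    quantizer.disable()
    return True


if __name__ == "__main__":
    old_style = FakeQuantAttention(with_bmm2=True)
    new_style = FakeQuantAttention(with_bmm2=False)
    print(disable_quantizer_if_present(old_style, "bmm2_output_quantizer"))
    print(disable_quantizer_if_present(new_style, "bmm2_output_quantizer"))
```

Skipping a missing quantizer avoids the crash, but whether the resulting quantization configuration still matches what the demo intends at level 3 is not something this thread establishes.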

13301338176 commented 4 months ago

I don't understand. Is there no solution? We didn't modify a single line of the code; we just ran it.

631068264 commented 4 months ago

Same error
@13301338176 maybe you can check whether your nvidia-modelopt version is 0.11.2. Also, you should remove the ONNX int8 directory if you downgrade the package.
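The two checks suggested above can be sketched in Python. This is a minimal sketch under stated assumptions: the root directory `onnx-sdxl` and the int8 naming are taken from the log in the original post, while the glob pattern `*int8*` and the helper names `installed_version` and `remove_int8_onnx_dirs` are hypothetical choices made for this example. The thread does not confirm which nvidia-modelopt version resolves the error.

```python
import shutil
from importlib import metadata
from pathlib import Path


def installed_version(package="nvidia-modelopt"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def remove_int8_onnx_dirs(onnx_root, pattern="*int8*", dry_run=True):
    """Find cached int8 ONNX directories under onnx_root; delete them unless dry_run."""
    root = Path(onnx_root)
    if not root.is_dir():
        return []
    # Assumption: stale quantized models live in subdirectories whose
    # names contain "int8", as in the path printed in the log.
    matches = sorted(p for p in root.glob(pattern) if p.is_dir())
    if not dry_run:
        for path in matches:
            shutil.rmtree(path)
    return matches


if __name__ == "__main__":
    print("nvidia-modelopt:", installed_version())
    # Dry run by default: only report what would be removed.
    for path in remove_int8_onnx_dirs("onnx-sdxl"):
        print("would remove:", path)
```

The function defaults to a dry run so that nothing is deleted until the listed directories have been reviewed; pass `dry_run=False` to actually remove them.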

NVIDIA / TensorRT, issue #3984