NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

SDXL failure of TensorRT 10.2 when running SDXL & INT8 quantization on GPU A100 #4089

Open Ijustakid opened 3 months ago

Ijustakid commented 3 months ago

Description

When I use your demo/Diffusion/demo_txt2img_xl.py for INT8 inference, it reports the following error:

Invoked with: %338 : Tensor = onnx::Constant(), scope: transformers.models.clip.modeling_clip.CLIPTextModel::/transformers.models.clip.modeling_clip.CLIPTextTransformer::text_model/transformers.models.clip.modeling_clip.CLIPEncoder::encoder/transformers.models.clip.modeling_clip.CLIPEncoderLayer::layers.0/transformers.models.clip.modeling_clip.CLIPSdpaAttention::self_attn , 'value', 0.125 (Occurred when translating scaled_dot_product_attention).

Environment

TensorRT Version: 10.2

NVIDIA GPU: A100

NVIDIA Driver Version: 535.161.08

CUDA Version: 12.2

CUDNN Version: --

Operating System: ubuntu 20.04

Python Version (if applicable): 3.8.19

Tensorflow Version (if applicable): 2.12.0

PyTorch Version (if applicable): 2.3.1

Baremetal or Container (if so, version):

Relevant Files

cmd: python3 demo_txt2img_xl.py "a photo of an astronaut riding a horse on mars" --version xl-1.0 --onnx-dir onnx-sdxl --engine-dir engine-sdxl --int

2024-08-19 17:19:34.445825: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-08-19 17:19:35.351517: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-19 17:19:40.233890: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[I] Initializing TensorRT accelerated StableDiffusionXL txt2img pipeline
[I] Autoselected scheduler: Euler
[I] Load CLIPTokenizer model from: pytorch_model/xl-1.0/XL_BASE/tokenizer
[I] Load CLIPTokenizer model from: pytorch_model/xl-1.0/XL_BASE/tokenizer_2
[I] Exporting ONNX model: onnx-sdxl/clip/model.onnx
[I] Load CLIPTextModel model from: pytorch_model/xl-1.0/XL_BASE/text_encoder
/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/utils.py:1547: OnnxExporterWarning: Exporting to ONNX opset version 19 is not supported. by 'torch.onnx.export()'. The highest opset version supported is 17. To use a newer opset version, consider 'torch.onnx.dynamo_export()'. Note that dynamo_export() is in preview. Please report errors with dynamo_export() as Github issues to https://github.com/pytorch/pytorch/issues.
  warnings.warn(
/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/transformers/modeling_attn_mask_utils.py:86: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1 or self.sliding_window is not None:
/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/transformers/modeling_attn_mask_utils.py:162: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
Traceback (most recent call last):
  File "demo_txt2img_xl.py", line 135, in <module>
    demo.loadEngines(
  File "demo_txt2img_xl.py", line 59, in loadEngines
    self.base.loadEngines(engine_dir, framework_model_dir, onnx_dir, **kwargs)
  File "/sfs_cv/yhy3/project/tensorrt/TensorRT/demo/Diffusion/stable_diffusion_pipeline.py", line 457, in loadEngines
    obj.export_onnx(onnx_path[model_name], onnx_opt_path[model_name], onnx_opset, opt_image_height, opt_image_width, enable_lora_merge=do_lora_merge[model_name], static_shape=static_shape)
  File "/sfs_cv/yhy3/project/tensorrt/TensorRT/demo/Diffusion/models.py", line 413, in export_onnx
    export_onnx(self.get_model())
  File "/sfs_cv/yhy3/project/tensorrt/TensorRT/demo/Diffusion/models.py", line 398, in export_onnx
    torch.onnx.export(model,
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/utils.py", line 1612, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/utils.py", line 1138, in _model_to_graph
    graph = _optimize_graph(
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/utils.py", line 677, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/utils.py", line 1956, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/symbolic_helper.py", line 306, in wrapper
    return fn(g, *args, **kwargs)
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/symbolic_opset14.py", line 176, in scaled_dot_product_attention
    query_scaled = g.op("Mul", query, g.op("Sqrt", scale))
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py", line 87, in op
    return _add_op(self, opname, *raw_args, outputs=outputs, **kwargs)
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py", line 238, in _add_op
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py", line 238, in <listcomp>
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py", line 269, in _const_if_tensor
    return _add_op(graph_context, "onnx::Constant", value_z=arg)
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py", line 246, in _add_op
    node = _create_node(
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py", line 305, in _create_node
    _add_attribute(node, key, value, aten=aten)
  File "/sfs_cv/yhy3/conda/envs/trt_sd/lib/python3.8/site-packages/torch/onnx/_internal/jit_utils.py", line 356, in _add_attribute
    return getattr(node, f"{kind}_")(name, value)
TypeError: z_(): incompatible function arguments. The following argument types are supported:

  1. (self: torch._C.Node, arg0: str, arg1: torch.Tensor) -> torch._C.Node

Invoked with: %338 : Tensor = onnx::Constant(), scope: transformers.models.clip.modeling_clip.CLIPTextModel::/transformers.models.clip.modeling_clip.CLIPTextTransformer::text_model/transformers.models.clip.modeling_clip.CLIPEncoder::encoder/transformers.models.clip.modeling_clip.CLIPEncoderLayer::layers.0/transformers.models.clip.modeling_clip.CLIPSdpaAttention::self_attn , 'value', 0.125 (Occurred when translating scaled_dot_product_attention).
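From the last frames of the traceback, the failure seems to come from torch.onnx's scaled_dot_product_attention symbolic (symbolic_opset14.py), which passes the plain Python float scale 0.125 coming from CLIPSdpaAttention into g.op("Sqrt", scale); the resulting onnx::Constant attribute setter expects a tensor and rejects the float. As far as I can tell (this is my own guess, not something from the demo), exporting any module that calls F.scaled_dot_product_attention with an explicit float scale takes the same path. A minimal sketch of what I mean (a toy module, not the demo's code, untested on other setups):

import torch
import torch.nn.functional as F

class TinySdpa(torch.nn.Module):
    # Hypothetical stand-in for CLIPSdpaAttention: SDPA with an explicit float scale
    def forward(self, q, k, v):
        # 0.125 == 1/sqrt(64), the head_dim-derived scale shown in the trace above
        return F.scaled_dot_product_attention(q, k, v, scale=0.125)

q = k = v = torch.randn(1, 12, 77, 64)
# opset 17 is the highest torch.onnx.export() supports here; the demo requested 19
torch.onnx.export(TinySdpa(), (q, k, v), "sdpa_repro.onnx", opset_version=17)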

Please help me check it out. Thanks.
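In the meantime, a workaround I am considering (only a sketch, not the demo's code; it assumes the installed transformers accepts the attn_implementation argument, and the model path is the one from my log) is to load the text encoder with eager attention so the export never goes through the scaled_dot_product_attention symbolic:

import torch
from transformers import CLIPTextModel

# Load the CLIP text encoder with eager attention instead of CLIPSdpaAttention
# (assumes transformers >= 4.36, where attn_implementation is accepted).
text_encoder = CLIPTextModel.from_pretrained(
    "pytorch_model/xl-1.0/XL_BASE/text_encoder",
    attn_implementation="eager",
).eval()

# Dummy token ids only for tracing; 77 is CLIP's maximum sequence length.
input_ids = torch.zeros(1, 77, dtype=torch.int64)
torch.onnx.export(
    text_encoder,
    input_ids,
    "clip_eager.onnx",
    opset_version=17,
    input_names=["input_ids"],
)

If that exports cleanly, it would at least confirm the SDPA symbolic is the culprit rather than the INT8 path.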

akhilg-nv commented 3 months ago

Which container are you using to run the demo? The error occurs during the ONNX export step, and it may be caused by the package versions in your environment: I'm seeing warnings for packages that I have not seen before with this demo (e.g. the TensorFlow warning about oneDNN custom operations).

Could you try running with the latest TRT version in the container suggested in the demo README and let us know if you still run into the issue? If you are applying some customization, please share additional details so we can reproduce the issue.

sergesg commented 2 months ago

Maybe you could check the versions of tokenizers and transformers, and follow the README.
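For example, something like this quick check (just a sketch), comparing the output against what the demo pins in demo/Diffusion/requirements.txt:

from importlib.metadata import version

# Print the packages most relevant to the failing ONNX export step.
for pkg in ("torch", "transformers", "tokenizers", "onnx", "tensorrt"):
    try:
        print(pkg, version(pkg))
    except Exception:
        print(pkg, "not installed")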