NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
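For context, the Python API the blurb refers to looks roughly like the following in recent releases (the model name and exact signatures below are illustrative and may differ between versions; check the docs for your installed release):

```python
# Hedged sketch of the high-level tensorrt_llm LLM API quick start.
# TinyLlama is only an example model; any supported HF checkpoint works.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds a TensorRT engine under the hood
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```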
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Error trying to build the visual encoder for llava-v1.6-34b-hf using build_visual_engine.py #2100

Open alexemme opened 1 month ago

alexemme commented 1 month ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

Expected behavior

Conversion of the visual encoder to .engine format.

Actual behavior

Received an error:

[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024080600
Loading checkpoint shards: 100%|██████████| 15/15 [00:01<00:00, 11.09it/s]
[08/08/2024-10:27:28] [TRT] [I] Exporting onnx to tmp/trt_engines/llava-v1.6-34b-hf/vision_encoder/onnx/model.onnx
Traceback (most recent call last):
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 817, in <module>
    builder.build()
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 84, in build
    build_llava_engine(args)
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 374, in build_llava_engine
    export_onnx(wrapper, image, f'{args.output_dir}/onnx')
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 118, in export_onnx
    torch.onnx.export(model,
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1612, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1138, in _model_to_graph
    graph = _optimize_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 677, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1956, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_helper.py", line 306, in wrapper
    return fn(g, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_opset14.py", line 176, in scaled_dot_product_attention
    query_scaled = g.op("Mul", query, g.op("Sqrt", scale))
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 87, in op
    return _add_op(self, opname, *raw_args, outputs=outputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in _add_op
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in <listcomp>
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 269, in _const_if_tensor
    return _add_op(graph_context, "onnx::Constant", value_z=arg)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 246, in _add_op
    node = _create_node(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 305, in _create_node
    _add_attribute(node, key, value, aten=aten)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 356, in _add_attribute
    return getattr(node, f"{kind}_")(name, value)
TypeError: z_(): incompatible function arguments. The following argument types are supported:

  1. (self: torch._C.Node, arg0: str, arg1: torch.Tensor) -> torch._C.Node

Invoked with: %482 : Tensor = onnx::Constant(), scope: __main__.build_llava_engine.<locals>.LlavaNextVisionWrapper::/transformers.models.clip.modeling_clip.CLIPVisionTransformer::vision_tower/transformers.models.clip.modeling_clip.CLIPEncoder::encoder/transformers.models.clip.modeling_clip.CLIPEncoderLayer::layers.0/transformers.models.clip.modeling_clip.CLIPSdpaAttention::self_attn, 'value', 0.125 (Occurred when translating scaled_dot_product_attention).
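Reading the traceback, the opset-14 symbolic for scaled_dot_product_attention tries to wrap the Python float scale (0.125, coming from CLIPSdpaAttention) into an ONNX constant that must be a tensor, and that is where the z_() TypeError is raised. As a rough standalone check (not the build_visual_engine.py code path; openai/clip-vit-large-patch14-336 is only an assumed stand-in for the LLaVA vision tower, and the resolution/opset are illustrative), exporting a CLIP vision model with the HF "eager" attention implementation avoids that symbolic entirely:

```python
# Standalone sketch, not the TensorRT-LLM script: export a CLIP vision tower to ONNX
# with eager attention so the trace never reaches the SDPA symbolic that fails above.
import torch
from transformers import CLIPVisionModel


class VisionWrapper(torch.nn.Module):
    """Thin wrapper so the traced graph returns a plain tensor, not a ModelOutput."""

    def __init__(self, vision_model):
        super().__init__()
        self.vision_model = vision_model

    def forward(self, pixel_values):
        return self.vision_model(pixel_values).last_hidden_state


vision = CLIPVisionModel.from_pretrained(
    "openai/clip-vit-large-patch14-336",   # assumed stand-in for the LLaVA vision tower
    attn_implementation="eager",           # avoid CLIPSdpaAttention -> scaled_dot_product_attention
).eval()

pixel_values = torch.randn(1, 3, 336, 336)

torch.onnx.export(
    VisionWrapper(vision),
    (pixel_values,),
    "vision_encoder.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state"],
    dynamic_axes={"pixel_values": {0: "batch"}},
    opset_version=17,
)
```

If this exports cleanly while the SDPA path fails, the problem would sit between the installed torch ONNX exporter and the transformers SDPA attention, rather than in the checkpoint itself.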

Additional notes

The environment was set up exactly as described in the documentation, using a Docker container with Ubuntu (https://nvidia.github.io/TensorRT-LLM/installation/linux.html).

The build of the LLM part completed successfully without any issues.

I suspect the problem may be due to a mismatch between the libraries installed by the TensorRT-LLM installation procedure, but I have no idea how to resolve it. Could anyone provide some guidance on where to start? Thank you.
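For reference, a quick way to dump the library versions that matter for the ONNX export path (a sanity check of my own, not part of the documented procedure):

```python
# Run inside the container to compare against the versions the examples expect.
import tensorrt
import tensorrt_llm
import torch
import transformers

print("tensorrt     :", tensorrt.__version__)
print("tensorrt_llm :", tensorrt_llm.__version__)
print("torch        :", torch.__version__)
print("transformers :", transformers.__version__)
```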

lzyrapx commented 1 month ago

I'm facing the same issue. Any updates on this, @alexemme? Thank you.

lzyrapx commented 1 month ago

@pathorn @ttim @Superjomn can somebody help me please?

alexemme commented 1 month ago

I'm facing the same issue. Any updates on this, @alexemme? Thank you.

No updates

amukkara commented 1 week ago

@alexemme Can you try this with the latest preview package?

I tested this on the latest package, and the engine build runs without error for both llava-v1.6-mistral-7b-hf and llava-v1.6-34b-hf.

An earlier version might have had a package mismatch.