NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
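For context, the Python API the blurb refers to looks roughly like the following in recent releases (the model name and exact signatures below are illustrative and may differ between versions; check the docs for your installed release):

```python
# Hedged sketch of the high-level tensorrt_llm LLM API quick start.
# TinyLlama is only an example model; any supported HF checkpoint works.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds a TensorRT engine under the hood
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```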
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Error trying to build the visual encoder for llava-v1.6-34b-hf using build_visual_engine.py #2100

Open alexemme opened 1 month ago

alexemme commented 1 month ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

Expected behavior

Conversion of the visual encoder to .engine format.

Actual behavior

Received an error:

[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024080600
Loading checkpoint shards: 100%|██████████| 15/15 [00:01<00:00, 11.09it/s]
[08/08/2024-10:27:28] [TRT] [I] Exporting onnx to tmp/trt_engines/llava-v1.6-34b-hf/vision_encoder/onnx/model.onnx
Traceback (most recent call last):
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 817, in <module>
    builder.build()
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 84, in build
    build_llava_engine(args)
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 374, in build_llava_engine
    export_onnx(wrapper, image, f'{args.output_dir}/onnx')
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 118, in export_onnx
    torch.onnx.export(model,
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1612, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1138, in _model_to_graph
    graph = _optimize_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 677, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1956, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_helper.py", line 306, in wrapper
    return fn(g, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_opset14.py", line 176, in scaled_dot_product_attention
    query_scaled = g.op("Mul", query, g.op("Sqrt", scale))
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 87, in op
    return _add_op(self, opname, *raw_args, outputs=outputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in _add_op
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in <listcomp>
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 269, in _const_if_tensor
    return _add_op(graph_context, "onnx::Constant", value_z=arg)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 246, in _add_op
    node = _create_node(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 305, in _create_node
    _add_attribute(node, key, value, aten=aten)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 356, in _add_attribute
    return getattr(node, f"{kind}_")(name, value)
TypeError: z_(): incompatible function arguments. The following argument types are supported:

  1. (self: torch._C.Node, arg0: str, arg1: torch.Tensor) -> torch._C.Node

Invoked with: %482 : Tensor = onnx::Constant(), scope: __main__.build_llava_engine.<locals>.LlavaNextVisionWrapper::/transformers.models.clip.modeling_clip.CLIPVisionTransformer::vision_tower/transformers.models.clip.modeling_clip.CLIPEncoder::encoder/transformers.models.clip.modeling_clip.CLIPEncoderLayer::layers.0/transformers.models.clip.modeling_clip.CLIPSdpaAttention::self_attn, 'value', 0.125 (Occurred when translating scaled_dot_product_attention).
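Reading the traceback, the opset-14 symbolic for scaled_dot_product_attention tries to wrap the Python float scale (0.125, coming from CLIPSdpaAttention) into an ONNX constant that must be a tensor, and that is where the z_() TypeError is raised. As a rough standalone check (not the build_visual_engine.py code path; openai/clip-vit-large-patch14-336 is only an assumed stand-in for the LLaVA vision tower, and the resolution/opset are illustrative), exporting a CLIP vision model with the HF "eager" attention implementation avoids that symbolic entirely:

```python
# Standalone sketch, not the TensorRT-LLM script: export a CLIP vision tower to ONNX
# with eager attention so the trace never reaches the SDPA symbolic that fails above.
import torch
from transformers import CLIPVisionModel


class VisionWrapper(torch.nn.Module):
    """Thin wrapper so the traced graph returns a plain tensor, not a ModelOutput."""

    def __init__(self, vision_model):
        super().__init__()
        self.vision_model = vision_model

    def forward(self, pixel_values):
        return self.vision_model(pixel_values).last_hidden_state


vision = CLIPVisionModel.from_pretrained(
    "openai/clip-vit-large-patch14-336",   # assumed stand-in for the LLaVA vision tower
    attn_implementation="eager",           # avoid CLIPSdpaAttention -> scaled_dot_product_attention
).eval()

pixel_values = torch.randn(1, 3, 336, 336)

torch.onnx.export(
    VisionWrapper(vision),
    (pixel_values,),
    "vision_encoder.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state"],
    dynamic_axes={"pixel_values": {0: "batch"}},
    opset_version=17,
)
```

If this exports cleanly while the SDPA path fails, the problem would sit between the installed torch ONNX exporter and the transformers SDPA attention, rather than in the checkpoint itself.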

Additional notes

The environment was set up exactly as described in the documentation, using a Docker container with Ubuntu (https://nvidia.github.io/TensorRT-LLM/installation/linux.html).

The build of the LLM part completed successfully without any issues.

I suspect the problem may be due to a mismatch between the libraries installed by the TensorRT-LLM installation procedure, but I have no idea how to resolve it. Could anyone provide some guidance on where to start? Thank you.
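For reference, a quick way to dump the library versions that matter for the ONNX export path (a sanity check of my own, not part of the documented procedure):

```python
# Run inside the container to compare against the versions the examples expect.
import tensorrt
import tensorrt_llm
import torch
import transformers

print("tensorrt     :", tensorrt.__version__)
print("tensorrt_llm :", tensorrt_llm.__version__)
print("torch        :", torch.__version__)
print("transformers :", transformers.__version__)
```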

lzyrapx commented 1 month ago

I'm facing the same issue. Any updates on this, @alexemme? Thank you.

lzyrapx commented 1 month ago

@pathorn @ttim @Superjomn can somebody help me please?

alexemme commented 1 month ago

I'm facing the same issue. Any updates on this, @alexemme? Thank you.

No updates

amukkara commented 1 week ago

@alexemme Can you try this with the latest preview package?

I tested this on the latest package, and the engine build runs without error for both llava-v1.6-mistral-7b-hf and llava-v1.6-34b-hf.

An earlier version might have had a package mismatch.