Open alexemme opened 1 month ago
I'm facing the same issues. Any updates on this @alexemme ? Thank you.
@pathorn @ttim @Superjomn can somebody help me please?
No updates
@alexemme Can you try this on the latest preview package?
Tested this on the latest package: the engine build runs without error for both llava-v1.6-mistral-7b-hf and llava-v1.6-34b-hf. An earlier version might have had a package mismatch.
System Info
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
Conversion of the visual encoder to .engine format.
actual behavior
Received an error:
```
[TensorRT-LLM] TensorRT-LLM version: 0.12.0.dev2024080600
Loading checkpoint shards: 100%|██████████| 15/15 [00:01<00:00, 11.09it/s]
[08/08/2024-10:27:28] [TRT] [I] Exporting onnx to tmp/trt_engines/llava-v1.6-34b-hf/vision_encoder/onnx/model.onnx
Traceback (most recent call last):
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 817, in <module>
    builder.build()
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 84, in build
    build_llava_engine(args)
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 374, in build_llava_engine
    export_onnx(wrapper, image, f'{args.output_dir}/onnx')
  File "/workspace/test/TensorRT-LLM/examples/multimodal/build_visual_engine.py", line 118, in export_onnx
    torch.onnx.export(model,
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1612, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1138, in _model_to_graph
    graph = _optimize_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 677, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1956, in _run_symbolic_function
    return symbolic_fn(graph_context, *inputs, **attrs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_helper.py", line 306, in wrapper
    return fn(g, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/symbolic_opset14.py", line 176, in scaled_dot_product_attention
    query_scaled = g.op("Mul", query, g.op("Sqrt", scale))
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 87, in op
    return _add_op(self, opname, raw_args, outputs=outputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in _add_op
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 238, in <listcomp>
    inputs = [_const_if_tensor(graph_context, arg) for arg in args]
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 269, in _const_if_tensor
    return _add_op(graph_context, "onnx::Constant", value_z=arg)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 246, in _add_op
    node = _create_node(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 305, in _create_node
    _add_attribute(node, key, value, aten=aten)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py", line 356, in _add_attribute
    return getattr(node, f"{kind}_")(name, value)
TypeError: z_(): incompatible function arguments. The following argument types are supported:
Invoked with: %482 : Tensor = onnx::Constant(), scope: __main__.build_llava_engine.<locals>.LlavaNextVisionWrapper::/transformers.models.clip.modeling_clip.CLIPVisionTransformer::vision_tower/transformers.models.clip.modeling_clip.CLIPEncoder::encoder/transformers.models.clip.modeling_clip.CLIPEncoderLayer::layers.0/transformers.models.clip.modeling_clip.CLIPSdpaAttention::self_attn, 'value', 0.125
(Occurred when translating scaled_dot_product_attention).
```
additional notes
The environment was set up exactly as described in the documentation, using a Docker container with Ubuntu (https://nvidia.github.io/TensorRT-LLM/installation/linux.html).
The build of the LLM part completed successfully without any issues.
I suspect the problem may be a mismatch among the libraries installed by the TensorRT-LLM installation procedure, but I have no idea how to resolve it. Could anyone provide some guidance on where to start? Thank you.
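To check for the suspected mismatch, one quick step is to print the installed versions of the packages on the export path and compare them against the versions pinned by the TensorRT-LLM release being used. A small stdlib-only sketch (the package list is my assumption about which components matter here):

```python
from importlib.metadata import PackageNotFoundError, version

# Report the installed version of each package involved in the
# ONNX export path, so they can be compared against the versions
# the TensorRT-LLM docs / requirements files expect.
for pkg in ("torch", "transformers", "onnx", "tensorrt_llm"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```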