Open enamulhoque1 opened 3 weeks ago
The model_type specified is incorrect. Changing it to --model_type llava_next should work.
Unrelated, but you seem to be using the older 0.10 version. I suggest installing the latest package from https://nvidia.github.io/TensorRT-LLM/installation/linux.html
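For reference, the reproduction command from this issue with only the model type changed could be assembled as below. This is a minimal sketch: the helper name build_visual_engine_cmd is hypothetical, the script path and flags are copied from the reproduction step, and "llava-next-model" is a placeholder for ${MODEL_NAME}. It has not been verified against a 0.10 install.

```python
import shlex

# Script path copied from the reproduction step in this issue.
SCRIPT = "/opt/tensorrtllm_backend/tensorrt_llm/examples/multimodal/build_visual_engine.py"


def build_visual_engine_cmd(model_name, model_type="llava_next", max_batch_size=5):
    """Return the argv list for the visual-engine build (hypothetical helper)."""
    return [
        "python", SCRIPT,
        "--model_type", model_type,
        "--model_path", f"/models/{model_name}",
        "--max_batch_size", str(max_batch_size),
        "--output_dir", f"/models/llava/vit_trt_engines/{model_name}",
    ]


if __name__ == "__main__":
    # Print the command rather than running it; hand the list to
    # subprocess.run(...) to actually execute the build.
    print(shlex.join(build_visual_engine_cmd("llava-next-model")))
```

Only the value after --model_type differs from the original command; everything else is unchanged.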
System Info
GPU: NVIDIA A100-SXM4-80GB
NVIDIA-SMI 535.183.01, Driver Version: 535.183.01, CUDA Version: 12.2
Who can help?
@byshiue @kaiyux
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal#llava-llava-next-and-vila
python /opt/tensorrtllm_backend/tensorrt_llm/examples/multimodal/build_visual_engine.py \
    --model_type llava \
    --model_path /models/${MODEL_NAME} \
    --max_batch_size 5 \
    --output_dir /models/llava/vit_trt_engines/${MODEL_NAME}
Expected behavior
vit_trt_engines should be generated
actual behavior
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 3, 3, 336, 336]
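The size in this message has five dimensions, while conv2d accepts only 3D (C, H, W) or 4D (N, C, H, W) input. One plausible reading, not confirmed in this thread, is that the LLaVA-NeXT image processor returns pixel values laid out as (batch, num_patches, channels, height, width) and the llava export path passes them to the CLIP patch embedding unchanged. The sketch below works on plain shape tuples only; both function names are hypothetical, and it merely illustrates the rank mismatch and how merging the two leading axes would yield a 4D shape.

```python
def conv2d_accepts(shape):
    """conv2d takes (C, H, W) or (N, C, H, W) input, i.e. rank 3 or 4."""
    return len(shape) in (3, 4)


def merge_patch_axis(shape):
    """Fold (batch, num_patches, C, H, W) into (batch * num_patches, C, H, W)."""
    if len(shape) != 5:
        raise ValueError(f"expected a 5D shape, got {shape}")
    batch, num_patches, channels, height, width = shape
    return (batch * num_patches, channels, height, width)


reported = (1, 3, 3, 336, 336)    # size from the RuntimeError above
print(conv2d_accepts(reported))   # False
merged = merge_patch_axis(reported)
print(merged)                     # (3, 3, 336, 336)
print(conv2d_accepts(merged))     # True
```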
Full logs below:
[TensorRT-LLM] TensorRT-LLM version: 0.10.0
/usr/local/lib/python3.10/dist-packages/transformers/models/llava/configuration_llava.py:100: FutureWarning: The `vocab_size` argument is deprecated and will be removed in v4.42, since it can be inferred from the `text_config`. Passing this argument has no effect
  warnings.warn(
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:08<00:00, 1.84it/s]
[08/23/2024-20:51:14] [TRT] [I] Exporting onnx
Traceback (most recent call last):
  File "/opt/tensorrtllm_backend/tensorrt_llm/examples/multimodal/build_visual_engine.py", line 576, in <module>
    builder.build()
  File "/opt/tensorrtllm_backend/tensorrt_llm/examples/multimodal/build_visual_engine.py", line 79, in build
    build_llava_engine(args)
  File "/opt/tensorrtllm_backend/tensorrt_llm/examples/multimodal/build_visual_engine.py", line 309, in build_llava_engine
    export_visual_wrapper_onnx(wrapper, image, args.output_dir)
  File "/opt/tensorrtllm_backend/tensorrt_llm/examples/multimodal/build_visual_engine.py", line 106, in export_visual_wrapper_onnx
    torch.onnx.export(visual_wrapper,
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 516, in export
    _export(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1613, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1135, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 1011, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py", line 915, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py", line 1296, in _get_trace_graph
    outs = ONNXTracedModule(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py", line 138, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py", line 129, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/tensorrtllm_backend/tensorrt_llm/examples/multimodal/build_visual_engine.py", line 298, in forward
    all_hidden_states = self.tower(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py", line 926, in forward
    return self.vision_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py", line 850, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py", line 185, in forward
    patch_embeds = self.patch_embedding(pixel_values.to(dtype=target_dtype))  # shape = [*, width, grid, grid]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 3, 3, 336, 336]
additional notes
None