NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

chatGLM3-6B Build TensorRT engine(s) error #1093

Open wohushihaoren opened 7 months ago

wohushihaoren commented 7 months ago

System Info

Traceback (most recent call last):
  File "/home/powerop/.conda/envs/bamboo/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 489, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 413, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 385, in build_and_save
    engine = build(build_config,
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 276, in build
    return build_model(model, build_config)
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 193, in build_model
    model(**inputs)
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/module.py", line 40, in __call__
    output = self.forward(*args, **kwargs)
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 498, in forward
    hidden_states = self.transformer.forward(**kwargs)
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/models/chatglm/model.py", line 253, in forward
    layer_output = layer(
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/module.py", line 40, in __call__
    output = self.forward(*args, **kwargs)
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/models/chatglm/model.py", line 117, in forward
    attention_output = self.attention(
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/module.py", line 40, in __call__
    output = self.forward(*args, **kwargs)
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/layers/attention.py", line 648, in forward
    qkv = self.qkv(hidden_states, qkv_lora_params)
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/module.py", line 40, in __call__
    output = self.forward(*args, **kwargs)
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 139, in forward
    return self.multiply_gather(x,
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 115, in multiply_gather
    x = _gemm_plugin(x,
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 59, in _gemm_plugin
    layer = default_trtnet().add_plugin_v2(plug_inputs, gemm_plug)
TypeError: add_plugin_v2(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt_bindings.tensorrt.INetworkDefinition, inputs: List[tensorrt_bindings.tensorrt.ITensor], plugin: tensorrt_bindings.tensorrt.IPluginV2) -> tensorrt_bindings.tensorrt.IPluginV2Layer

Invoked with: <tensorrt_bindings.tensorrt.INetworkDefinition object at 0x7f99758bf530>, [<tensorrt_bindings.tensorrt.ITensor object at 0x7f9975a77db0>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f99758bf870>], None
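The tail of the traceback is the key clue: the third value in "Invoked with" is None, i.e. add_plugin_v2 received no plugin object at all. The following pure-Python mock (all class and function names are stand-ins, not the real tensorrt bindings) sketches how a pybind11-style signature check turns a None plugin into exactly this kind of TypeError:

```python
from typing import List


class ITensor:
    """Stand-in for tensorrt.ITensor (hypothetical mock)."""


class IPluginV2:
    """Stand-in for tensorrt.IPluginV2 (hypothetical mock)."""


def add_plugin_v2(inputs: List[ITensor], plugin: IPluginV2) -> str:
    # pybind11-bound methods reject arguments that match no overload;
    # this mock reproduces that behaviour when the plugin is None.
    if not isinstance(plugin, IPluginV2):
        raise TypeError(
            "add_plugin_v2(): incompatible function arguments. "
            f"Invoked with: {inputs!r}, {plugin!r}"
        )
    return "IPluginV2Layer"


try:
    # A None plugin (e.g. from a failed plugin-creator lookup) triggers
    # the same error shape as the log above.
    add_plugin_v2([ITensor(), ITensor()], None)
except TypeError as exc:
    print(exc)
```

In other words, the bindings are fine; the problem is upstream, where the gemm plugin object was never created.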

Who can help?

No response

Information

Tasks

Reproduction

Follow the chatglm example instructions to build, then run.

Expected behavior

The chatglm3-6b engine builds successfully.

actual behavior

The engine build fails with the TypeError shown above.

additional notes

I have successfully run python3 convert_checkpoint.py --model_dir chatglm3_6b --output_dir trt_ckpt/chatglm3_6b/fp16/1-gpu

However, when I run

# ChatGLM3-6B: single-gpu engine
trtllm-build --checkpoint_dir trt_ckpt/chatglm3_6b/fp16/1-gpu \
        --gemm_plugin float16 \
        --output_dir trt_engines/chatglm3_6b/fp16/1-gpu

an error occurs:

  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 139, in forward
    return self.multiply_gather(x,
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 115, in multiply_gather
    x = _gemm_plugin(x,
  File "/home/powerop/.conda/envs/bamboo/lib/python3.10/site-packages/tensorrt_llm/layers/linear.py", line 59, in _gemm_plugin
    layer = default_trtnet().add_plugin_v2(plug_inputs, gemm_plug)
TypeError: add_plugin_v2(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt_bindings.tensorrt.INetworkDefinition, inputs: List[tensorrt_bindings.tensorrt.ITensor], plugin: tensorrt_bindings.tensorrt.IPluginV2) -> tensorrt_bindings.tensorrt.IPluginV2Layer

Invoked with: <tensorrt_bindings.tensorrt.INetworkDefinition object at 0x7f99758bf530>, [<tensorrt_bindings.tensorrt.ITensor object at 0x7f9975a77db0>, <tensorrt_bindings.tensorrt.ITensor object at 0x7f99758bf870>], None

Please help me, thanks a lot.

zmy1116 commented 7 months ago

I was following the script to convert gemma and saw the same error with TensorRT-LLM version 0.9.0.dev2024022000.

jzymessi commented 6 months ago

I built chatglm-6b and saw the same error with TensorRT-LLM version 0.9.0.dev2024022700.

QiJune commented 6 months ago

@syuoni Could you please take a look? Thanks

syuoni commented 6 months ago

Hi @wohushihaoren , I tried to reproduce the issue with your commands on the current TensorRT-LLM main, but everything went well on my side. Could you please try with the current main branch?

According to the error information, gemm_plug is None (see https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/layers/linear.py#L63). It seems that something broke during the package build. Did you install TensorRT-LLM with pip install or build from source (https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/build_from_source.md)? If the latter, please try re-building first.
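A minimal sketch of that failure mode, using a hypothetical registry (illustrative names only, not the actual TensorRT-LLM internals): when the plugin-creator lookup silently returns None, the None flows into add_plugin_v2 and surfaces as the opaque TypeError above, whereas failing fast at the lookup gives a much clearer diagnosis.

```python
# All names here are illustrative, not the real TensorRT-LLM API.
PLUGIN_REGISTRY = {}  # empty registry simulates a broken installation


def get_plugin_creator(name: str, version: str):
    # Returns None when the plugin library was never loaded/registered.
    return PLUGIN_REGISTRY.get((name, version))


def create_gemm_plugin(dtype: str):
    creator = get_plugin_creator("Gemm", "1")
    if creator is None:
        # Failing fast here gives a clearer message than letting None
        # reach add_plugin_v2 and trigger the opaque TypeError.
        raise RuntimeError(
            "Gemm plugin creator not found; the plugin library is "
            "probably not loaded - try reinstalling or rebuilding."
        )
    return creator(dtype)


try:
    create_gemm_plugin("float16")
except RuntimeError as exc:
    print(exc)
```

On a healthy install the creator lookup succeeds and the plugin object is passed through; on a broken one, this check would point straight at the missing plugin library instead of at the bindings.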

Also, since the error is raised at gemm_plugin creation, could you try building the engine without gemm_plugin? i.e.,

trtllm-build --checkpoint_dir trt_ckpt/chatglm3_6b/fp16/1-gpu \
        --output_dir trt_engines/chatglm3_6b/fp16/1-gpu

QiJune commented 6 months ago

Please try to install the requirements.txt in the chatglm example first:

pip install -r examples/chatglm/requirements.txt

yz-tang commented 3 months ago

I had the same problem when using trtllm-build for Baichuan-13B-Chat.