NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

trtllm-build with --fast-build ignore transformer layers #2135

Open ZJLi2013 opened 1 month ago

ZJLi2013 commented 1 month ago

System Info

DGX H100

Who can help?

When building an engine with:

  trtllm-build --fast-build --model_config $model_cfg 

and then benchmarking with gptManagerBenchmark, it reports:

[08/21/2024-12:18:48] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:01:28
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.attention.qkv.weight
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.attention.dense.weight
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.mlp.router.weight
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.mlp.fc.weight
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.mlp.proj.weight
# and further in runtime : 
[TensorRT-LLM][ERROR] Encountered an error in forwardAsync function: Input tensor 'transformer.layers.0.attention.qkv.weight' not found; expected shape: (8192, 1280) (/src/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:202)

Is this expected behavior with --fast-build?

By the way, without --fast-build, the engine build and benchmark both look fine.

Thanks

Reproduction

  trtllm-build --fast-build --model_config $model_cfg 

Expected behavior

The --fast-build flag should also produce a working engine.

Actual behavior

With the --fast-build flag, the transformer layer weights are somehow ignored.

Additional notes

None.

VALLIS-NERIA commented 3 weeks ago

Hi, please check:

Do you find a file named rank0_managed_weights.safetensors or similar inside the engine directory?

Is there a field named manage_weights in the plugin_config section of config.json?
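A quick way to run both of those checks is a short script along these lines (the file names and config keys come from the comment above; the exact location of plugin_config inside config.json may vary between TRT-LLM versions, so the sketch probes two plausible layouts):

```python
import glob
import json
import os


def check_engine_dir(engine_dir):
    """Check an engine dir for the two signs of managed weights.

    1. A rank*_managed_weights file next to the engine.
    2. A `manage_weights` entry under plugin_config in config.json.
    Returns both findings so the caller can compare against a
    non-fast-build engine.
    """
    weight_files = glob.glob(
        os.path.join(engine_dir, "rank*managed_weights*")
    )
    with open(os.path.join(engine_dir, "config.json")) as f:
        config = json.load(f)
    # plugin_config is commonly nested under build_config; fall back
    # to a top-level key since the layout differs between versions.
    plugin_config = config.get("build_config", {}).get("plugin_config", {})
    if not plugin_config:
        plugin_config = config.get("plugin_config", {})
    return {
        "managed_weight_files": weight_files,
        "manage_weights": plugin_config.get("manage_weights"),
    }
```

If the file is missing or the field is absent/False, the engine was built without managed weights, which would line up with the "Failed to get weight" errors above.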

VALLIS-NERIA commented 2 weeks ago

It seems that you are building from a model config without weights rather than from a checkpoint. In that case, TRT-LLM generates random weights, which is not yet supported by fast_build.
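One way to tell which case you are in before building is to look for weight files next to the config. This is only a heuristic sketch: the assumption (based on the usual converted-checkpoint layout) is that a real checkpoint directory contains per-rank .safetensors files alongside config.json, while a bare model config is just a JSON file:

```python
import glob
import os


def looks_like_checkpoint(path):
    """Heuristic: distinguish a converted checkpoint dir from a bare
    model config.

    A single JSON file can only be a model config; a checkpoint
    directory is expected to also hold *.safetensors weight files.
    """
    if os.path.isfile(path):
        # A lone file (e.g. model_config.json) carries no weights.
        return False
    return bool(glob.glob(os.path.join(path, "*.safetensors")))
```

If this returns False for the path passed as --model_config, the build falls into the random-weights case described above.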