NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

trtllm-build with --fast-build ignore transformer layers #2135

Open ZJLi2013 opened 1 month ago

ZJLi2013 commented 1 month ago

System Info

DGX H100

Who can help?

When building an engine with:

  trtllm-build --fast-build --model_config $model_cfg 

and then benchmarking with gptManagerBenchmark, it reports:

[08/21/2024-12:18:48] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:01:28
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.attention.qkv.weight
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.attention.dense.weight
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.mlp.router.weight
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.mlp.fc.weight
[08/21/2024-12:18:48] [TRT-LLM] [E] Failed to get weight: transformer.layers.0.mlp.proj.weight
# and further in runtime : 
[TensorRT-LLM][ERROR] Encountered an error in forwardAsync function: Input tensor 'transformer.layers.0.attention.qkv.weight' not found; expected shape: (8192, 1280) (/src/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:202)

Is this expected behavior with --fast-build?

By the way, without --fast-build, the engine build and benchmark both look fine.

Thanks

Reproduction

  trtllm-build --fast-build --model_config $model_cfg 

Expected behavior

The --fast-build flag should also produce a working engine.

Actual behavior

With the --fast-build flag, the transformer layer weights are somehow ignored.

Additional notes

None.

VALLIS-NERIA commented 3 weeks ago

Hi, please check:

Do you find a file named rank0_managed_weights.safetensors or similar inside the engine directory?

Is there a field named manage_weights in the plugin_config section of config.json?
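A quick way to run both of those checks is a short script along these lines (the file names and config keys come from the comment above; the exact location of plugin_config inside config.json may vary between TRT-LLM versions, so the sketch probes two plausible layouts):

```python
import glob
import json
import os


def check_engine_dir(engine_dir):
    """Check an engine dir for the two signs of managed weights.

    1. A rank*_managed_weights file next to the engine.
    2. A `manage_weights` entry under plugin_config in config.json.
    Returns both findings so the caller can compare against a
    non-fast-build engine.
    """
    weight_files = glob.glob(
        os.path.join(engine_dir, "rank*managed_weights*")
    )
    with open(os.path.join(engine_dir, "config.json")) as f:
        config = json.load(f)
    # plugin_config is commonly nested under build_config; fall back
    # to a top-level key since the layout differs between versions.
    plugin_config = config.get("build_config", {}).get("plugin_config", {})
    if not plugin_config:
        plugin_config = config.get("plugin_config", {})
    return {
        "managed_weight_files": weight_files,
        "manage_weights": plugin_config.get("manage_weights"),
    }
```

If the file is missing or the field is absent/False, the engine was built without managed weights, which would line up with the "Failed to get weight" errors above.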

VALLIS-NERIA commented 2 weeks ago

It seems that you are building from a model config without weights rather than from a checkpoint. In that case, TRT-LLM generates random weights, which is not yet supported by fast_build.
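One way to tell which case you are in before building is to look for weight files next to the config. This is only a heuristic sketch: the assumption (based on the usual converted-checkpoint layout) is that a real checkpoint directory contains per-rank .safetensors files alongside config.json, while a bare model config is just a JSON file:

```python
import glob
import os


def looks_like_checkpoint(path):
    """Heuristic: distinguish a converted checkpoint dir from a bare
    model config.

    A single JSON file can only be a model config; a checkpoint
    directory is expected to also hold *.safetensors weight files.
    """
    if os.path.isfile(path):
        # A lone file (e.g. model_config.json) carries no weights.
        return False
    return bool(glob.glob(os.path.join(path, "*.safetensors")))
```

If this returns False for the path passed as --model_config, the build falls into the random-weights case described above.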