NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

build failed for bloom 176b on v0.6.1 branch #688

Open

NaNAGISaSA commented 11 months ago

I tried to build the BLOOM 176B model with both real and dummy weights, but both builds failed with an Integer Overflow error.

build command:

# real weights
python build.py --world_size 8 \
                --model_dir /workspace/volume/llm_models/bloom/bloom \
                --dtype float16 \
                --max_batch_size 8 \
                --max_input_len 2048 \
                --max_output_len 2048 \
                --enable_context_fmha_fp32_acc \
                --use_gemm_plugin float16 \
                --use_gpt_attention_plugin float16 \
                --use_layernorm_plugin float16 \
                --output_dir /workspace/volume/trt_models/bloom_176b/trt_engines/fp16/8-gpu/

# fake weights
python build.py --world_size 8 \
                --dtype float16 \
                --vocab_size 250880 \
                --n_layer 70 \
                --n_embd 14336 \
                --n_head 112 \
                --max_batch_size 8 \
                --max_input_len 2048 \
                --max_output_len 2048 \
                --enable_context_fmha_fp32_acc \
                --use_gemm_plugin float16 \
                --use_gpt_attention_plugin float16 \
                --use_layernorm_plugin float16 \
                --output_dir /workspace/volume/trt_models/bloom_176b/trt_engines/fp16/8-gpu/

error message:

[11/23/2023-08:59:43] [TRT-LLM] [I] Build TensorRT engine bloom_float16_tp8_rank0.engine
[11/23/2023-08:59:43] [TRT] [W] Unused Input: position_ids
[11/23/2023-08:59:44] [TRT] [E] 8: [helpers.h::numericCast::52] Error Code 8: Integer Overflow (cast failure due to overflow with value 3596615680 and upper bound 2147483647)
[11/23/2023-08:59:44] [TRT-LLM] [E] Engine building failed, please check the error log.
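The overflowing value in the log appears to match the element count of the word-embedding weight: with `--vocab_size 250880` and `--n_embd 14336`, the tensor has 250880 × 14336 = 3,596,615,680 elements, which exceeds INT32_MAX (2,147,483,647). A quick sanity check (a hypothetical sketch, not part of the original report):

```python
# Sketch: check whether the value in the TRT error log equals
# vocab_size * hidden_size for the BLOOM-176B configuration.
vocab_size = 250880  # --vocab_size from the build command
n_embd = 14336       # --n_embd (hidden size)

embedding_elems = vocab_size * n_embd
INT32_MAX = 2**31 - 1

print(embedding_elems)               # 3596615680, the value in the error log
print(embedding_elems > INT32_MAX)   # True: the tensor volume overflows int32
```

If this is indeed the cause, the overflow happens wherever the embedding tensor's volume is cast to a 32-bit integer inside the builder, independent of whether real or dummy weights are used.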
jdemouth-nvidia commented 11 months ago

Can you try with a more recent branch, please? We develop mostly in the `main` branch and our latest release branch is `rel`.

NaNAGISaSA commented 11 months ago

Hello @jdemouth-nvidia, I have tested on the v0.6.1 branch and the same error occurred:

[12/21/2023-02:42:10] [TRT-LLM] [I] Build TensorRT engine bloom_float16_tp8_rank0.engine
[12/21/2023-02:42:10] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[12/21/2023-02:42:10] [TRT] [W] Unused Input: position_ids
[12/21/2023-02:42:11] [TRT] [E] 8: [helpers.h::numericCast::52] Error Code 8: Integer Overflow (cast failure due to overflow with value 3596615680 and upper bound 2147483647)
[12/21/2023-02:42:11] [TRT-LLM] [E] Engine building failed, please check the error log.