NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

QWenForCausalLM/transformer/vocab_embedding/embedding/GATHER_O_output_0: tensor volume exceeds 2147483647, dimensions are [num tokens,8192] #2204

Open zymy-chen opened 3 weeks ago

zymy-chen commented 3 weeks ago

System Info

GPU Name: 8 * H20 TensorRT-LLM : 0.11.0 NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.4

Who can help?

No response

Information

Tasks

Reproduction

qwen2-72B, batch_size=30, input_len=8192, output_len=512. It works fine when running FP16, but FP8 fails with `tensor volume exceeds 2147483647`.
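The limit in the error message is INT32_MAX (2^31 − 1); older TensorRT versions index tensor volumes with 32-bit integers. A back-of-the-envelope sketch (numbers taken from this report; the actual runtime token count depends on padding removal and scheduling, and the helper below is mine, not part of TensorRT-LLM) shows how close this configuration sits to the limit:

```python
# Estimate the element count ("volume") of the [num_tokens, hidden_size]
# vocab_embedding GATHER output and compare it with TensorRT's INT32 limit.

INT32_MAX = 2**31 - 1  # 2147483647, the value in the error message

def embedding_output_volume(batch_size: int, input_len: int, hidden_size: int) -> int:
    """Worst-case element count of the [num_tokens, hidden_size] output,
    assuming every sequence is full length, so with remove_input_padding
    num_tokens = batch_size * input_len."""
    num_tokens = batch_size * input_len
    return num_tokens * hidden_size

volume = embedding_output_volume(batch_size=30, input_len=8192, hidden_size=8192)
print(f"volume = {volume:,} ({volume / INT32_MAX:.1%} of INT32_MAX)")
```

With these numbers the volume is 2,013,265,920, already about 94% of the limit, so any intermediate tensor even slightly larger than the embedding output can push past it.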

Expected behavior


actual behavior

trtllm-build --checkpoint_dir ./tllm_checkpoint_fp8 --output_dir ./8-gpu/ --gemm_plugin float16 --max_batch_size 64 --max_input_len 256 --max_output_len 256 --remove_input_padding enable --gpt_attention_plugin float16 --context_fmha enable --workers 8 --tp_size 8 --paged_kv_cache enable --use_paged_context_fmha enable --use_fused_mlp

additional notes

None

jershi425 commented 1 week ago

Hi @zymy-chen, this is a known issue for TRT versions older than v10.1. Could you please check which TRT version you are using?
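The installed TRT version can be read from `tensorrt.__version__`. A minimal sketch of the comparison against the v10.1 threshold (the helper name is mine, not part of TensorRT):

```python
def trt_version_at_least(version: str, minimum: tuple[int, ...]) -> bool:
    """Compare a dotted version string (e.g. tensorrt.__version__)
    against a minimum (major, minor) tuple."""
    parts = tuple(int(p) for p in version.split(".")[:len(minimum)])
    return parts >= minimum

# Usage with the real package would be:
#   import tensorrt
#   trt_version_at_least(tensorrt.__version__, (10, 1))
print(trt_version_at_least("10.2.0", (10, 1)))  # True: at or above the fixed version
print(trt_version_at_least("10.0.1", (10, 1)))  # False: inside the known-issue range
```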

zymy-chen commented 1 week ago

> Hi @zymy-chen, this is a known issue for TRT versions older than v10.1. Could you please check which TRT version you are using?

TRT version is 10.2.0

lfr-0531 commented 1 week ago

Can you try to rerun using the latest version of TensorRT-LLM to see if this issue persists?

zymy-chen commented 3 days ago

> Can you try to rerun using the latest version of TensorRT-LLM to see if this issue persists?

I tried TensorRT-LLM v0.12.0, but the problem still exists.