zymy-chen opened 3 weeks ago
Hi @zymy-chen, this is a known issue for TRT version less than v10.1. Could you please check what is your TRT version?
TRT version is 10.2.0
Can you try to rerun using the latest version of TensorRT-LLM to see if this issue persists?
I tried TensorRT-LLM v0.12.0, but the problem still exists.
System Info
GPU Name: 8 * H20
TensorRT-LLM: 0.11.0
NVIDIA-SMI: 535.154.05
Driver Version: 535.154.05
CUDA Version: 12.4
Who can help?
No response
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
qwen2-72B, batch_size=30, input_len=8192, output_len=512. It works fine when running FP16, but the FP8 build fails with `tensor volume exceeds 2147483647`.
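For context, TensorRT caps the element count of any single tensor at INT32_MAX (2147483647), which is the number in the error. A minimal sketch of the arithmetic, assuming (hypothetically) that the overflowing tensor is a flattened activation of shape `[batch_size * input_len, dim]`, and using Qwen2-72B's hidden size 8192 and MLP intermediate size 29568; which tensor actually overflows in the FP8 path here is not confirmed:

```python
# TensorRT rejects any single tensor whose element count exceeds INT32_MAX.
INT32_MAX = 2**31 - 1  # 2147483647

def tensor_volume(*dims):
    """Total element count of a tensor with the given dimensions."""
    vol = 1
    for d in dims:
        vol *= d
    return vol

batch_size, input_len = 30, 8192
tokens = batch_size * input_len  # 245760 flattened tokens

hidden = 8192         # Qwen2-72B hidden size
intermediate = 29568  # Qwen2-72B MLP intermediate size (hypothetical culprit)

# Hidden-size activation stays just under the limit:
print(tensor_volume(tokens, hidden))        # 2013265920
# An unsharded intermediate activation would blow past it:
print(tensor_volume(tokens, intermediate))  # 7266631680
```

Note that with `--tp_size 8` the intermediate dimension is sharded to 3696 per GPU (volume 908,328,960, under the limit), so the sketch only illustrates how the INT32 cap can be hit if some tensor in the FP8 path is materialized at full shape; it is not a diagnosis of which tensor it is.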
Expected behavior
Actual behavior
trtllm-build --checkpoint_dir ./tllm_checkpoint_fp8 --output_dir ./8-gpu/ --gemm_plugin float16 --max_batch_size 64 --max_input_len 256 --max_output_len 256 --remove_input_padding enable --gpt_attention_plugin float16 --context_fmha enable --workers 8 --tp_size 8 --paged_kv_cache enable --use_paged_context_fmha enable --use_fused_mlp
Additional notes
None