NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

xverse-65b ERROR #1133

Closed by lwbmowgli 7 months ago

lwbmowgli commented 8 months ago

I built xverse-65b with the llama example and deployed it with Triton. Both steps succeeded, but inference fails with the following error. What is the cause?

[TensorRT-LLM][ERROR] Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: Tensor 'past_key_value_0' has invalid shape (1, 2, 8, 1536, 128) (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:150)

lwbmowgli commented 8 months ago

This is my build command:

python build_xverse.py --model_dir xverse-65b \
    --use_gpt_attention_plugin float16 \
    --use_weight_only \
    --weight_only_precision int4 \
    --max_batch_size 1 \
    --output_dir XVERSE-65B \
    --world_size 8 \
    --tp_size 8
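To make the reported shape easier to read next to the build flags, here is a minimal sketch that labels the five dimensions of `past_key_value_0` and compares them with limits the reader supplies. It assumes the contiguous KV cache layout `[batch, 2, num_kv_heads, seq_len, head_dim]`. The helper names and the `max_seq_len` values are hypothetical and are not part of TensorRT-LLM; the sketch does not diagnose this issue, it only shows which dimension to compare against which engine limit.

```python
from typing import List, NamedTuple, Sequence


class KVCacheShape(NamedTuple):
    """Labels for a 5-D KV cache shape (assumed contiguous layout)."""
    batch: int
    kv: int            # expected to be 2: one slot for keys, one for values
    num_kv_heads: int  # heads held by a single tensor-parallel rank
    seq_len: int
    head_dim: int


def describe_kv_shape(shape: Sequence[int]) -> KVCacheShape:
    """Attach names to the raw shape tuple printed in the error message."""
    if len(shape) != 5:
        raise ValueError(f"expected 5 dimensions, got {len(shape)}")
    return KVCacheShape(*shape)


def check_against_limits(shape: Sequence[int],
                         max_batch_size: int,
                         max_seq_len: int) -> List[str]:
    """Return human-readable mismatches between the shape and the given limits."""
    s = describe_kv_shape(shape)
    problems = []
    if s.kv != 2:
        problems.append(f"second dimension is {s.kv}, expected 2 (key and value)")
    if s.batch > max_batch_size:
        problems.append(f"batch {s.batch} exceeds max_batch_size {max_batch_size}")
    if s.seq_len > max_seq_len:
        problems.append(f"seq_len {s.seq_len} exceeds max_seq_len {max_seq_len}")
    return problems


if __name__ == "__main__":
    reported = (1, 2, 8, 1536, 128)  # shape copied from the error message
    print(describe_kv_shape(reported))
    # max_seq_len=1024 is a hypothetical limit, NOT taken from the issue.
    for line in check_against_limits(reported, max_batch_size=1, max_seq_len=1024):
        print("mismatch:", line)
```

The limits to pass in are whatever the engine was actually built with; the build command above sets only `--max_batch_size 1` explicitly, so the sequence-length limit would have to be read from the build script defaults or the engine configuration.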

byshiue commented 7 months ago

Please follow the issue template to organize your issue. Thank you for your cooperation.