lwbmowgli closed this issue 7 months ago
This is my build command:
python build_xverse.py --model_dir xverse-65b \
    --use_gpt_attention_plugin float16 \
    --use_weight_only \
    --weight_only_precision int4 \
    --max_batch_size 1 \
    --output_dir XVERSE-65B \
    --world_size 8 \
    --tp_size 8
Please follow the issue template to organize your issue. Thank you for your cooperation.
I successfully built xverse-65b using the llama example and deployed it with Triton, but an error occurred during inference. What is the reason?
[TensorRT-LLM][ERROR] Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: Tensor 'past_key_value_0' has invalid shape (1, 2, 8, 1536, 128) (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:150)
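To narrow this down, it may help to decode the reported shape. The sketch below assumes the contiguous (non-paged) KV cache layout `[batch, 2, kv_heads_per_rank, max_seq_len, head_size]`. It also assumes 64 KV heads and a head size of 128 for xverse-65b, which are my guesses and should be checked against the model's `config.json`. The helper name `contiguous_kv_cache_shape` is hypothetical, not part of TensorRT-LLM.

```python
from typing import Tuple


def contiguous_kv_cache_shape(batch_size: int, num_kv_heads: int, tp_size: int,
                              max_seq_len: int, head_size: int) -> Tuple[int, ...]:
    """Hypothetical helper: per-rank KV cache shape under the assumed layout."""
    if num_kv_heads % tp_size != 0:
        raise ValueError("num_kv_heads must be divisible by tp_size")
    return (batch_size, 2, num_kv_heads // tp_size, max_seq_len, head_size)


# Shape copied from the error message in this issue.
reported = (1, 2, 8, 1536, 128)
labels = ("batch", "k_and_v", "kv_heads_per_rank", "max_seq_len", "head_size")
decoded = dict(zip(labels, reported))

# ASSUMPTIONS: 64 KV heads and head size 128 (verify in config.json);
# tp_size and batch size come from the build command above;
# max_seq_len is simply taken from the error message.
expected = contiguous_kv_cache_shape(batch_size=1, num_kv_heads=64, tp_size=8,
                                     max_seq_len=1536, head_size=128)

# Names of the dimensions that differ between the reported and expected shapes.
mismatches = [name for name, got, want in zip(labels, reported, expected)
              if got != want]

print(decoded)
print("mismatched dimensions:", mismatches)
```

If no dimension differs under your real values, the head count and head size are probably fine, and the rejected dimension may be one fixed at build time, such as the maximum sequence length (1536 here) exceeding what the engine was built for. That is only a guess to check against your build and Triton configuration, not a confirmed diagnosis.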