Problem:
Running inference on Qwen-1.8B after exporting it with llm_export. Output with the default settings:
```
./llm_demo ../../../qwen_1.8b_weight_8bit_default_quant/llm.mnn 0 10
model path is ../../../qwen_1.8b_weight_8bit_default_quant/llm.mnn
### model name : Qwen_1.8b
The device support i8sdot:1, support fp16:1, support i8mm: 0
### precision, memory = 2, 2
Can't open file:.tempcache
Load Cache file error.
load tokenizer
load tokenizer Done
### disk embedding is 1
load ../../../qwen_1.8b_weight_8bit_default_quant/llm.mnn ... Done!
main, 195, cost time: 3614.658203 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1071 - Total: 1071, rate = 1.000000
main, 199, cost time: 664.346008 ms
Q: whoareyou
A: I am an artificial intelligence language model. I do not have a physical existence or identity, but I exist to assist and provide information to those who interact with me. My to the石竹他们是,石他们失控
```
After changing config.numthread to 1 in Llm::load in llm.cpp, the chat output is correct:
```
./llm_demo ../../../qwen_1.8b_weight_8bit_default_quant/llm.mnn 0 10
model path is ../../../qwen_1.8b_weight_8bit_default_quant/llm.mnn
### model name : Qwen_1.8b
The device support i8sdot:1, support fp16:1, support i8mm: 0
### precision, memory = 2, 2
Can't open file:.tempcache
Load Cache file error.
load tokenizer
load tokenizer Done
### disk embedding is 1
load ../../../qwen_1.8b_weight_8bit_default_quant/llm.mnn ... Done!
main, 195, cost time: 3571.860107 ms
Prepare for resize opt Begin
Prepare for resize opt End
Fix: 1071 - Total: 1071, rate = 1.000000
main, 199, cost time: 1849.044067 ms
Q: whoareyou
A: I am an artificial intelligence language model. I do not have a physical existence or identity, but I exist to assist and provide information to those who interact with me. My purpose is to assist with tasks such as answering questions, providing information, and generating text based on the input I receive.
```
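
To pinpoint where the two runs diverge, a small comparison script may help. This is only a sketch: the variable names are made up for illustration, and the two strings are copied from the answers in the logs above. It prints the length of the shared prefix and what each run produces right after it.

```python
import os

# Answers copied verbatim from the two runs in this report.
default_answer = (
    "I am an artificial intelligence language model. I do not have a physical "
    "existence or identity, but I exist to assist and provide information to "
    "those who interact with me. My to the石竹他们是,石他们失控"
)
single_thread_answer = (
    "I am an artificial intelligence language model. I do not have a physical "
    "existence or identity, but I exist to assist and provide information to "
    "those who interact with me. My purpose is to assist with tasks such as "
    "answering questions, providing information, and generating text based on "
    "the input I receive."
)

# os.path.commonprefix compares character by character, so it also works
# on plain strings, not only on file paths.
prefix = os.path.commonprefix([default_answer, single_thread_answer])

default_tail = default_answer[len(prefix):]
single_thread_tail = single_thread_answer[len(prefix):]

print(f"shared prefix length: {len(prefix)} characters")
print("default run continues with:      ", repr(default_tail))
print("single-thread run continues with:", repr(single_thread_tail[:40]))
```

Both runs agree up to "... interact with me. My " and only then drift apart, so the default run produces a long stretch of correct text before the output breaks down.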
Platform (include target platform as well if cross-compiling): Orin CPU (supports the sdot instruction)
Reproduction rate: fairly high
Github Version: MNN tag 2.9.1
Compiling method:
cmake -DMNN_SUPPORT_TRANSFORMER_FUSE=ON -DMNN_LOW_MEMORY=ON -DMNN_BUILD_LLM=ON ..