Development machine: Ubuntu 20.04, MNN 3.0.0
Models (Hugging Face): Qwen2.5-0.5B-Instruct and Qwen2.5-0.5B-Instruct-GPTQ-Int8

Export the ONNX model
$ python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export onnx --dst_path mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3
✅ Done load pretrained model pretrained_model/Qwen2.5-0.5B-Instruct [ 1.10 s]
2024-11-20 15:21:53.270750: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-11-20 15:21:53.285959: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1732087313.300938 1727776 cuda_dnn.cc:8322] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1732087313.305363 1727776 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-20 15:21:53.322212: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
✅ Done export tokenizer to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/tokenizer.txt[ 2.71 s]
✅ Done export embedding to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/embeddings_bf16.bin[ 0.12 s]
✅ Done export onnx model to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx[ 3.43 s]
✅ Done export model weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx.data[ 3.19 s]
✅ Done export config to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm_config.json[ 0.00 s]
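The oneDNN/cuFFT/cuDNN/cuBLAS messages above are emitted by TensorFlow at import time and merely interleave with the exporter's progress spinner; the export itself completed. As the first message notes, the oneDNN notice can be turned off by setting an environment variable before rerunning, e.g. (same arguments as above):

$ TF_ENABLE_ONEDNN_OPTS=0 python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export onnx --dst_path mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3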
Export the MNN model
$ mnn/build/MNNConvert --modelFile mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx --framework ONNX --MNNModel mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn --weightQuantBits 8 --weightQuantBlock 128 --weightQuantAsymmetric --saveExternalData --transformerFuse --allowCustomOp
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
Don't has bizCode, use MNNTest for default
Start to Convert Other Model Format To MNN Model..., target version: 3
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:46: ONNX Model ir version: 8
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:47: ONNX Model opset version: 15
Start to Optimize the MNN Net...
Fuse Attention as /Reshape_8_output_0
Fuse Attention as /Reshape_17_output_0
Fuse Attention as /Reshape_26_output_0
Fuse Attention as /Reshape_35_output_0
Fuse Attention as /Reshape_44_output_0
Fuse Attention as /Reshape_53_output_0
Fuse Attention as /Reshape_62_output_0
Fuse Attention as /Reshape_71_output_0
Fuse Attention as /Reshape_80_output_0
Fuse Attention as /Reshape_89_output_0
Fuse Attention as /Reshape_98_output_0
Fuse Attention as /Reshape_107_output_0
Fuse Attention as /Reshape_116_output_0
Fuse Attention as /Reshape_125_output_0
Fuse Attention as /Reshape_134_output_0
Fuse Attention as /Reshape_143_output_0
Fuse Attention as /Reshape_152_output_0
Fuse Attention as /Reshape_161_output_0
Fuse Attention as /Reshape_170_output_0
Fuse Attention as /Reshape_179_output_0
Fuse Attention as /Reshape_188_output_0
Fuse Attention as /Reshape_197_output_0
Fuse Attention as /Reshape_206_output_0
Fuse Attention as /Reshape_215_output_0
Remove past KV for presents
Save Weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn.weight
inputTensors : [ input_ids, position_ids, attention_mask, past_key_values, ]
outputTensors: [ logits, presents, ]
Converted Success!
Convert the LoRA
$ python mnn/tools/script/apply_lora.py --base mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/base.json --lora /work/task_alpha/alpha_lora/checkpoint-800 --scale 2 --out mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/lora_alpha.json
Traceback (most recent call last):
  File "/work/mnn/tools/script/apply_lora.py", line 156, in <module>
    main(args)
  File "/work/mnn/tools/script/apply_lora.py", line 146, in main
    base.apply(lora, args.out)
  File "/work/mnn/tools/script/apply_lora.py", line 94, in apply
    self.apply_lora(op, lora)
  File "/work/mnn/tools/script/apply_lora.py", line 70, in apply_lora
    tag = names[1].split('.')[1] + names[3]
IndexError: list index out of range
Debugging shows: name = ['', 'mlp', 'gate_proj', 'FakeLinear_output_0__matmul_converted']
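For reference, the failure reproduces in isolation with that value. The sketch below is standalone: only the tag line is copied from apply_lora.py, and the expectation of a 'layers.<N>' segment is my reading of the op-naming scheme, not anything confirmed by the script:

# names[1] is just 'mlp' for this op, not a dotted segment like 'layers.0'
names = ['', 'mlp', 'gate_proj', 'FakeLinear_output_0__matmul_converted']
tag = names[1].split('.')[1] + names[3]
# 'mlp'.split('.') -> ['mlp'], a single-element list, so the [1] index
# raises IndexError: list index out of range

A defensive guard along these lines avoids the crash, though whether skipping such ops is the right behavior for MNN's LoRA merge is an assumption:

parts = names[1].split('.')
if len(parts) > 1:      # e.g. names[1] == 'layers.0' -> layer index '0'
    tag = parts[1] + names[3]
else:
    tag = None          # op name outside the per-layer scheme; skip it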