QwenLM / qwen.cpp

C++ implementation of Qwen-LM

Qwen-7B-Q4_0 works well on Mac M1, but Qwen-7B-Q8_0 fails with a ggml-metal error #42

Open songkq opened 7 months ago

songkq commented 7 months ago

@simonJJJ Hi, could you please give some advice on this issue? Qwen-7B-Q4_0 works well on a Mac M1, but Qwen-7B-Q8_0 does not.

cmake -B build -DGGML_METAL=ON && cmake --build build -j

./main -m ../../ggml_bins/qwen7b-chat-8k-ggml-q4_0.bin --tiktoken ../../assets/qwen.tiktoken -v -p 介绍下三国演义
system info: | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
inference config: | max_length = 2048 | max_context_length = 512 | top_k = 0 | top_p = 0.5 | temperature = 0.95 | num_threads = 0 |
loaded qwen model from ../../ggml_bins/qwen7b-chat-8k-ggml-q4_0.bin within: 88.669 ms

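Romance of the Three Kingdoms is one of the Four Great Classical Novels of ancient China, written by Luo Guanzhong. It tells the stories of the famous Three Kingdoms period in Chinese history, from the end of the Eastern Han dynasty to the beginning of the Western Jin dynasty. The Three Kingdoms period is a very important era in Chinese history; it touches on politics, military affairs, culture, the economy and many other areas, and it produced many famous heroes, such as Cao Cao, Liu Bei and Sun Quan. Based on the historical events of the Three Kingdoms period, Romance of the Three Kingdoms uses a series of captivating stories to depict the political, military, cultural and economic conditions of the time, and also shows the thoughts, emotions and behavior of the people of that era.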

prompt time: 5496.2 ms / 24 tokens (229.008 ms/token)
output time: 3756.11 ms / 117 tokens (32.103 ms/token)
total time: 9252.31 ms

./main -m ../../ggml_bins/qwen7b-chat-8k-ggml-q8_0.bin --tiktoken ../../assets/qwen.tiktoken -v -p 介绍下三国演义
system info: | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
inference config: | max_length = 2048 | max_context_length = 512 | top_k = 0 | top_p = 0.5 | temperature = 0.95 | num_threads = 0 |
loaded qwen model from ../../ggml_bins/qwen7b-chat-8k-ggml-q8_0.bin within: 87.001 ms

GGML_ASSERT: /workspace/qwen.cpp/third_party/ggml/src/ggml-metal.m:1453: false
[1]    12416 abort      ./main -m ../../ggml_bins/qwen7b-chat-8k-ggml-q8_0.bin --tiktoken  -v -p
fann1993814 commented 7 months ago

Hi @songkq, maybe you could try my PR #41; it includes some experimental data. The /workspace/qwen.cpp/third_party/ggml/src/ggml-metal.m:1453: false assertion is most likely triggered by running out of memory (OOM).