QwenLM / Qwen2

Qwen2 is the large language model series developed by the Qwen team, Alibaba Cloud.

Is there a quantized model that fits within 24GB? Support for Mermaid diagrams and UML diagrams #684

Open DoiiarX opened 2 weeks ago

DoiiarX commented 2 weeks ago

Is there a model quantized to fit within 24GB?

jklj077 commented 2 weeks ago

Could you clarify the scale of the model you would like to use? You can always run the 7B model without quantization on 24GB of VRAM.
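
For reference, here is a minimal sketch of running the 7B model unquantized with Hugging Face transformers. The model ID "Qwen/Qwen2-7B-Instruct" follows Qwen's naming on the Hub; in bfloat16 the 7B weights take roughly 15GB, which leaves headroom for the KV cache on a 24GB card (the memory figure is an approximation):

```python
# Minimal sketch: Qwen2-7B-Instruct in bfloat16, no quantization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half-precision weights, not quantized
    device_map="auto",           # place layers on the available GPU
)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```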

DoiiarX commented 2 weeks ago

> Could you clarify the scale of the model you would like to use? You can always run the 7B model without quantization on 24GB of VRAM.

The qwen2-7b model is weaker in coding ability than the quantized llama3-70b-IQ2XS (22GB) model, and also weaker than deepseek-coder-v2:16b-lite-instruct-q8_0 (16GB).

However, I need stronger Chinese language support, so I hope Qwen2 can provide a quantized model in the 16-22GB range.

jklj077 commented 2 weeks ago

If you use llama.cpp, you can use -ngl or --n-gpu-layers to control how many model layers are offloaded to the GPU. That lets you run much more capable models than IQ2XS quantizations, which simply lose too much precision to quantization.
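
As an illustration, here is a sketch of the same idea through the llama-cpp-python bindings, whose n_gpu_layers argument mirrors the -ngl / --n-gpu-layers CLI flag. The GGUF path and layer count below are placeholders; tune them so that the offloaded layers plus KV cache fit your 24GB of VRAM, with the remaining layers running on the CPU:

```python
# Sketch of partial GPU offloading with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model-q4_k_m.gguf",  # placeholder: pick your GGUF quant
    n_gpu_layers=40,  # layers offloaded to the GPU; the rest stay on CPU
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Mermaid sequence diagram."}]
)
print(out["choices"][0]["message"]["content"])
```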

The thing is that the Qwen2 series doesn't exactly fit that scale (personal use on a single GPU with 24GB of memory). See also the comments at https://github.com/QwenLM/Qwen2/issues/482#issuecomment-2172125101 for the rationale and future plans.

If you are mainly interested in coding, take a look at CodeQwen1.5. If you are adventurous, you can also try Qwen1.5-32B.

jklj077 commented 2 weeks ago

Would you mind sharing some cases where you found Qwen2-7B underperformed?

DoiiarX commented 4 days ago

> Would you mind sharing some cases where you found Qwen2-7B underperformed?

I have a strong need to generate Mermaid diagrams and UML diagrams, and Qwen2 does not perform well on those tasks.