Inferential capability of qwen.cpp for Qwen-14b-chat is different compared with Qwen-14b-chat of CUDA

QwenLM / qwen.cpp

C++ implementation of Qwen-LM

Other

514 stars 42 forks

Inferential capability of qwen.cpp for Qwen-14b-chat is different compared with Qwen-14b-chat of CUDA #30

Open wertyac opened 9 months ago

wertyac commented 9 months ago

We ran Qwen-14b-chat-int4 on qwen.cpp and asked it the same question we asked the CUDA version. However, qwen.cpp returned the wrong answer, while the CUDA version answered correctly. So the model's capability appears to be degraded when it runs on qwen.cpp.