Open icemaple1251 opened 5 months ago
Q2 is a pretty small quantisation, have you tested your Q2 model in llama.cpp directly to check this isn't just a bad response caused by the quantisation?
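A quick way to do that check is to run the quantised file through llama.cpp's own CLI, bypassing LLamaSharp entirely. This is a minimal sketch, assuming a recent llama.cpp build where the CLI binary is named `llama-cli` (older builds call it `main`); the model path and prompt are placeholders.

```shell
# Run the Q2 model directly in llama.cpp to rule out LLamaSharp as the cause.
# -m = model file, -p = prompt, -n = max tokens to generate.
# Binary name assumes a recent llama.cpp build; older builds use ./main instead.
./llama-cli -m ./mixtral-8x7b-v0.1.Q2_K.gguf \
    -p "Your test question here" \
    -n 128
```

If the answers are just as bad here, the problem is the quantisation (or the model/prompt format), not LLamaSharp.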
I have not tested the Q2 model in llama.cpp directly, but I did try other models like "mixtral-8x7b-v0.1.Q8_0.gguf" and I still get wrong answers; some answers are even repeated several times. Are some models intended specifically for chat while others are not?
The mixtral model you mentioned is Q8, which is much more forgiving than Q2. The smaller the number, the more the model has been compressed, and the more likely it is to give bad answers.
It works well when I use LLama2-7b-Chat, but when I changed the model to a newer one, mixtral-8x7b-v0.1Q2_K, and asked the same question, the robot gave a wrong answer and even changed my original question.
Should I change some options or parameters somewhere when I switch to another model? Can anyone help me? Thanks.
![correct](https://github.com/SciSharp/LLamaSharp/assets/39147380/324d9f7c-0263-46cb-8e63-cda876006a1e)