@JohannesGaessler I just confirmed that #2373 was indeed the cause of the problems I observed with the quality of the llama.cpp output.
I guess I will close this issue now, but thank you very much for your feedback!
@lmg-anon Thanks for reporting this issue - very useful. Btw, it might also be worth checking whether https://github.com/ggerganov/llama.cpp/pull/2304 has any additional positive effects, though I'm not sure it is related to your specific use case.
I tested a specific Llama 2 7B model using llama.cpp and observed noticeable quality issues compared to the Llama 2 7B HF model with the original LoRA applied, as well as compared to an HF model merge created by the alpaca-lora `export_hf_checkpoint` script.
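For anyone who wants to reproduce the HF-side comparison, a minimal sketch along these lines should work (the model ID, adapter path, prompt, and generation settings below are placeholder assumptions, not the exact ones used here):

```python
# Minimal sketch of the HF-side reference: Llama 2 7B with a LoRA applied via peft.
# base_id, adapter_path, the prompt, and the sampling settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"   # assumed base model
adapter_path = "path/to/limarp-lora"   # placeholder for the original LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_path)  # apply the LoRA on top of the base weights
# model = model.merge_and_unload()  # optional: merged weights, similar in spirit to export_hf_checkpoint

prompt = "..."  # the same prompt that is passed to llama.cpp via -p
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```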
The issues I encountered were primarily double lines getting merged into one and the model getting confused about the LoRA's format, which resulted in low overall output quality.
Initially, I was unsure if the problem was due to an error on my part, but after coming across this discussion, I realized that others were facing the same problem when using llama.cpp. This leads me to believe that the issue likely lies with ggml/llama.cpp itself. Consequently, I have decided to open this issue to address the matter.
As a comparison:
Output expected from the 7B model
![image](https://github.com/ggerganov/llama.cpp/assets/139719567/779d8d13-e434-402a-9491-426b79677519)

Output from llama.cpp (try 1)
Command line: `main_cublas.exe -m limarp-llama2-7b.ggmlv3.f16.bin -e -p "<…"`

Output from llama.cpp (try 2, recommended preset from model card)
Command line: `main_cublas.exe -m limarp-llama2-7b.ggmlv3.f16.bin -e -p "<…"`

The output can get even worse when you don't prime it with the `X's Persona`.

Output from llama.cpp (recommended preset from model card)
Command line: `main_cublas.exe -m limarp-llama2-7b.ggmlv3.f16.bin -e -p "<…"`