Closed: SwamiKannan closed this issue 1 month ago
How big is your sample size? I.e., in how many examples does Debug beat Release?
Also, have you tried running a perplexity calculation with the two builds to see if there is any significant difference?
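Something along these lines would do it. This is a minimal sketch assuming the repo's standard CMake layout with the MSVC multi-config generator; the model path and evaluation file are placeholders:

```bat
:: Build the perplexity tool in both configurations (MSVC multi-config generator)
cmake -B build
cmake --build build --config Debug   --target llama-perplexity
cmake --build build --config Release --target llama-perplexity

:: Run the identical perplexity calculation with each binary;
:: model path and wiki.test.raw are placeholders
build\bin\Debug\llama-perplexity.exe   -m models\your-model.gguf -f wiki.test.raw
build\bin\Release\llama-perplexity.exe -m models\your-model.gguf -f wiki.test.raw
```

If the two numbers agree to within noise, the Debug/Release gap is more likely sampling variance than a genuine numerical difference between the builds.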
Hi Georgi. Will check the perplexity calc and let you know. My observations are empirical: I must have tried about 15 prompts. Release got it right about 2-3 times, while Debug got them all right except maybe twice.
What happened?
I created a function-calling multi-agent framework. I am using llama-server.exe as an inference server, with Nous Research's Theta Q4, Q5 and Q6 models as the LLM. With all of these models, my function calls are generated correctly when I build and run the llama.cpp server in Debug mode, but in Release mode it falters a lot: it hallucinates function names and parameters, which leads to a lot of parsing errors. I understand that the Release build is far more efficient than the Debug build, so is there a way to get the Release build to match the accuracy of the Debug build? I have attached a sample log output, but the difference is consistent across multiple function calls.
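One way to make this comparison apples-to-apples: send the same request to each server build with greedy sampling and a fixed seed, so run-to-run sampling noise cannot masquerade as a build difference. A minimal sketch against llama-server's /completion endpoint; the prompt and port are placeholders:

```bat
:: Deterministic request: temperature 0 (greedy) plus a fixed seed,
:: so any remaining Debug/Release divergence points at the build itself.
:: Prompt and port are placeholders; adjust to your setup.
curl http://localhost:8080/completion ^
  -H "Content-Type: application/json" ^
  -d "{\"prompt\": \"<your function-calling prompt>\", \"n_predict\": 256, \"temperature\": 0, \"seed\": 42}"
```

Run this against the Debug server and the Release server in turn and diff the responses; if they still diverge deterministically, that suggests a real numerical difference rather than sampling.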
Name and Version
Windows: Debug and Release server builds, version 3673 (8ebe8dde), built with MSVC 19.29.30154.0 for x64
What operating system are you seeing the problem on?
Windows
Relevant log output