Closed: Rufflewind closed this issue 4 months ago.
Quality really degraded in the last week, especially with the llama3 models. I have pulled so many changes that I'm not sure when it happened, but I agree with you: I think going back a week or two will make these problems go away. I'm on Linux running on Nvidia.
I'm seeing something similar on Command R+, except with mixed/doubled apostrophes (e.g. "They'`re"). I assume it's the pre-tokenizer, as per the "missing pre-tokenizer type, using: 'default'" warning in the server log with the big bold "GENERATION QUALITY WILL BE DEGRADED! CONSIDER REGENERATING THE MODEL" below it.
Trying to generate a new gguf from the HF weights for command-r-plus has been failing since #6920, but regenerating might help with llama-3.
Kafkaesque.
I'm not a developer, but I have noticed the same thing in LM Studio after upgrading to the latest version, which included an updated llama.cpp. I'm running Q8 llama 3 70b models on an M3 Max.
Plenty of apostrophe errors, ranging from an added space between the apostrophe and the "s" (example: "Mary' s glass of water" instead of "Mary's glass of water") to an omitted "s" in the same context (example: "Mary' glass of water" instead of "Mary's glass of water"). I have also noticed double apostrophes here and there.
I haven't changed my prompts, model settings, or model files -- and this didn't occur with prior versions of LM Studio that used an older llama.cpp, with llama-3 70b models.
Hope that helps diagnose the issue.
Is this issue still present with the latest master? Make sure to use a LLaMA 3 model that you have converted yourself with the convert-hf-to-gguf.py script from the latest llama.cpp.
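For anyone who wants to check whether their existing file was converted correctly, here is a minimal sketch, assuming the gguf Python package (pip install gguf, maintained in llama.cpp's gguf-py) and a hypothetical model path. It reads the tokenizer.ggml.pre metadata key, which is the field the "missing pre-tokenizer type, using: 'default'" server warning refers to; a file regenerated with a current convert-hf-to-gguf.py should have it set (e.g. llama-bpe for LLaMA 3).

```python
from gguf import GGUFReader  # pip install gguf (from llama.cpp's gguf-py)

reader = GGUFReader("command-r-plus.gguf")  # hypothetical path
field = reader.fields.get("tokenizer.ggml.pre")
if field is None:
    # corresponds to the server's "missing pre-tokenizer type, using: 'default'" warning
    print("tokenizer.ggml.pre is missing -- regenerate the GGUF with a current convert script")
else:
    # string fields store their payload in parts; data indexes the value bytes
    value = bytes(field.parts[field.data[0]]).decode("utf-8")
    print(f"pre-tokenizer type: {value}")
```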
Just tested this in KCPP 1.65. Backwards compatibility is a selling point for us, so we want the tokenizer behavior to be correct on old models too. I can confirm that my generations in 1.65, while giving different results for the same seed, match the apostrophes of 1.63; 1.64 differs and reproduces this bug.
For context: 1.63 had the old tokenizer and can't run new pretokenizer models. 1.64 was at the beginning of the Llama3 pretokenizer change and lacks CMDR pretokenizer support. 1.65 was released last weekend.
So from what I can see this is solved; others can confirm.
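If anyone wants to verify a specific build without eyeballing generations, a tokenizer round-trip is a quick check. A small sketch, assuming the llama-cpp-python bindings compiled against the build under test (the model path and probe string are made up):

```python
from llama_cpp import Llama  # assumption: llama-cpp-python built on the llama.cpp under test

llm = Llama(model_path="llama-3-70b.Q8_0.gguf", vocab_only=True)  # hypothetical file

probe = "Mary's glass of water. They're late."
tokens = llm.tokenize(probe.encode("utf-8"), add_bos=False)
roundtrip = llm.detokenize(tokens).decode("utf-8", errors="replace")

print(tokens)
print(repr(roundtrip))
# a correct (pre-)tokenizer should reproduce the apostrophes exactly
# (SPM-based models may differ by a leading space; these BPE models should not)
assert roundtrip == probe, "tokenizer round-trip mismatch"
```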
This issue was closed because it has been inactive for 14 days since being marked as stale.
I am seeing a lot of improperly formatted apostrophes. Some examples I have seen:
I don't remember ever seeing this issue about a week ago, so I suspect it might be a recently introduced bug. It happens fairly consistently, though at random points in the output.
To reproduce this, I just ask the assistant to generate lots of random story text, and it eventually hits the bug after about 1k tokens.
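To automate that, here is the kind of loop I use, sketched against the llama.cpp server's /completion endpoint (the port, prompt, and seed sweep are assumptions, and the regex is just a heuristic for the apostrophe patterns reported above):

```python
import re
import requests  # assumption: llama.cpp server running locally on :8080

# heuristic: space before trailing "s", backtick-mixed, or doubled apostrophes
BAD = re.compile(r"'\s+s\b|'`|`'|''")

for seed in range(20):
    r = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": "Write a long random story:", "n_predict": 1024, "seed": seed},
        timeout=600,
    )
    text = r.json()["content"]
    hits = BAD.findall(text)
    if hits:
        print(f"seed {seed}: {len(hits)} suspicious apostrophes, e.g. {hits[:3]}")
```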
Specs
Example
Request:
Response: