Closed - steampunque closed 2 months ago
Could you point the exact model file you used?
I tested on 3 different files:
1) https://huggingface.co/MaziyarPanahi/Mixtral-8x22B-Instruct-v0.1-GGUF/blob/main/Mixtral-8x22B-Instruct-v0.1.Q4_K_M-00001-of-00002.gguf
2), 3) My own quants (IQ4_XS and Q4_K_S), made with the latest version from the model downloaded straight from mistralai: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1
b3028 tokenizer patch breaks all of them.
I plan to run IQ4_XS once the bug is corrected, as it appears to work better than Q4_K_S on other models and is also 2G smaller for this big model.
I added b3028 revert patches to my github here: https://github.com/steampunque/llama.cpp-patches , covering the time span from the original patch to the latest build as of today (b3173).
I don't recommend applying the reverts unless you want to run this model, since other desired changes in newer versions may be undone.
Without patch, b3173:
lm hi
[INST]hi [/INST]
[INST]hi [/INST]
[INST]hi [/INST]
[INST]hi [/INST]
[INST]hi [/INST]
[INST]hi [/INST]
Apply revert_b3028_from_3173.patch:
lm hi
Hello! How can I help you today? Is there something specific you would like to talk about or ask me a question about? I'm here to provide information and answer any questions you have to the best of my ability.
Post the llama-server and curl commands that you used, otherwise we can't help
b3181 with no revert. Use any of the model files I summarized earlier.
llama-cli -m /datahd/models/Mixtral-8x22B-Instruct-v0.1.IQ4_XS.gguf --color -n -1 --multiline-input --interactive-first --log-disable -ngl 7 -c 8192 -ctk f16 -ctv f16 -b 128 -fa -n 8192 --keep 0 --temp 0.0 --dynatemp-range 0.0 --dynatemp-exp 1.0 --top-k 40 --top-p 0.95 --typical 1.0 --min-p 0.00 --repeat-last-n 64 --repeat-penalty 1.0 --presence-penalty 0.0 --frequency-penalty 0.0 --tfs 1.0 --mirostat 0 --mirostat-lr 0.1 --mirostat-ent 5.0 -p "" --in-prefix "[INST] " --in-suffix " [/INST]"
Aborted conversation:
[INST] Hi
[/INST]
[INST] I'm here to help you with your questions about the 2022 Toyota Corolla.
[/INST]
[INST] What can I assist you with today?
[/INST]
[USER] What are the dimensions of the 2022 Toyota Corolla?
[/USER]
[INST][INST]
b3181 with b3028 reverted:
[INST] Hi
[/INST] Hello! How can I help you today? Is there something specific you would like to know or discuss? I'm here to answer any questions you have to the best of my ability.
[INST]
I don't have 8x22B handy, but with 8x7B I am not able to reproduce using that command:
make -j && ./llama-cli -m ./models/mixtral-instruct-8x7b-fast/ggml-model-q4_k.gguf --color -n -1 --multiline-input --interactive-first -ngl 7 -c 8192 -ctk f16 -ctv f16 -b 128 -fa -n 8192 --keep 0 --temp 0.0 --dynatemp-range 0.0 --dynatemp-exp 1.0 --top-k 40 --top-p 0.95 --typical 1.0 --min-p 0.00 --repeat-last-n 64 --repeat-penalty 1.0 --presence-penalty 0.0 --frequency-penalty 0.0 --tfs 1.0 --mirostat 0 --mirostat-lr 0.1 --mirostat-ent 5.0 -p "" --in-prefix "[INST] " --in-suffix " [/INST]" --verbose-prompt
[INST] Hi
[/INST] Hello! How can I help you today? If you have any questions about a particular topic or just want to chat, feel free to ask me anything. I'm here to provide information and engage in a friendly conversation.
[INST]
You can try to remove the whitespaces from the instruction suffix/prefix: --in-prefix "[INST]" --in-suffix "[/INST]" - the whitespaces are incorrect.
I downloaded the model and indeed there is a regression for the Mixtral 8x22B models. The [INST] and [/INST] tokens are no longer tokenized correctly after #7500.
Before:
./tokenize -m Mixtral-8x22B-Instruct-v0.1.IQ4_XS-00001-of-00002.gguf -p "[INST]"
3 -> '[INST]'
Now:
1501 -> ' ['
17057 -> 'INST'
29561 -> ']'
cc @jaime-m-p
Below appears to be the list of special tokens the model wants from tokenizer.json. I think Mistral Instruct v0.3 also moved to special tokens for [INST] and [/INST] (at the same time the function-calling stuff was added and the vocabulary expanded), and it works fine without reverting the patch. Looking at differences between the models, it appears Mistral Instruct v0.3 also defines the special tokens in tokenizer_config.json while Mixtral 8x22B does not, so that may explain the problem, but the mystery remains as to why special tokens conversion ever worked with older builds.
{
"version": "1.0",
"truncation": null,
"padding": null,
"added_tokens": [
{
"id": 0,
"content": "",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": true
},
{
"id": 2,
"content": "",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": true
},
{
"id": 3,
"content": "[INST]",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": true
},
{
"id": 4,
"content": "[/INST]",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": true
},
{
"id": 5,
"content": "[TOOL_CALLS]",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": true
},
{
"id": 6,
"content": "[AVAILABLE_TOOLS]",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": true
},
{
"id": 7,
"content": "[/AVAILABLE_TOOLS]",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": true
},
{
"id": 8,
"content": "[TOOL_RESULTS]",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": true
},
{
"id": 9,
"content": "[/TOOL_RESULTS]",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": true
}
],
I'm trying to find the root of the problem.
Found some differences while loading special tokens:
dir_tokenizer = "'./models/tokenizers//mixtral8x22b'"
tokenizer = AutoTokenizer.from_pretrained(dir_tokenizer)
tokenizer.added_tokens_encoder
{'<unk>': 0, '<s>': 1, '</s>': 2, '[INST]': 3, '[/INST]': 4, '[TOOL_CALLS]': 5, '[AVAILABLE_TOOLS]': 6, '[/AVAILABLE_TOOLS]': 7, '[TOOL_RESULTS]': 8, '[/TOOL_RESULTS]': 9}
import gguf
dir_tokenizer = "'./models/tokenizers//mixtral8x22b'"
special_vocab = gguf.SpecialVocab(dir_tokenizer)
special_vocab.special_token_ids
{'bos': 1, 'eos': 2, 'unk': 0}
I have to look closer, but I'm confused by the method gguf.SpecialVocab._try_load_from_tokenizer_json().
It loads both files, tokenizer.json and tokenizer_config.json, then tries to match tokens against the list special_token_types = ('bos', 'eos', 'unk', 'sep', 'pad', 'cls', 'mask'), doing an intersection instead of a union, losing all the new special tokens.
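To illustrate the union idea, a minimal sketch (not the actual gguf-py code; the function name and path handling are hypothetical) could look like this, assuming a tokenizer.json with an added_tokens list like the one pasted above:

```python
# Hypothetical sketch: keep every added token flagged "special" in tokenizer.json,
# instead of intersecting with the fixed special_token_types list.
import json
from pathlib import Path

def load_added_special_tokens(model_dir: str) -> dict:
    with open(Path(model_dir) / "tokenizer.json", encoding="utf-8") as f:
        data = json.load(f)
    # Union: every entry marked "special": true is kept, including [INST], [/INST],
    # [TOOL_CALLS], etc., not just the named bos/eos/unk/sep/pad/cls/mask tokens.
    return {t["content"]: t["id"]
            for t in data.get("added_tokens", [])
            if t.get("special", False)}

# Expected result for the Mixtral 8x22B tokenizer shown above:
# {'<unk>': 0, '<s>': 1, '</s>': 2, '[INST]': 3, '[/INST]': 4, ...}
```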
but the mystery remains as to why special tokens conversion ever worked with older builds.
Probably this: https://github.com/ggerganov/llama.cpp/pull/7500/commits/938cb4941a594f79a44c36c128645148a69ce6be.
Instead of doing a search for un-mergeable tokens (assuming they are special), it now trusts the toktypes from the GGUF file.
This fixed some problems with other models, but I see it breaks others.
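For reference, the old un-mergeable-token search amounted roughly to the following sketch (illustrative only, not the real llama.cpp implementation):

```python
# Rough sketch of the pre-#7500 heuristic: a vocab entry is assumed special if no
# two-way split of its text yields two pieces that are themselves in the vocab.
def looks_special(token: str, vocab: set) -> bool:
    for i in range(1, len(token)):
        if token[:i] in vocab and token[i:] in vocab:
            return False  # mergeable from known pieces -> ordinary token
    return True           # un-mergeable -> assume special/control token

# In a typical BPE vocab "[INST]" is not reachable from two sub-tokens, so the old
# code would flag it as special even when the GGUF toktype metadata said otherwise.
```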
I think the correct option is to try to fix gguf.SpecialVocab._try_load_from_tokenizer_json(), but this forces regenerating the GGUF files. Not sure if this is acceptable; the second option is to reintroduce the old behaviour and try to guess which models belong to each option.
In my opinion regenerating the GGUF should not be an issue. It has happened multiple times in the past with BPE updates and will no doubt continue in the future as models evolve with new tokenizers, new vocabularies, and other needs. It would be good to output a warning message saying that the model should be updated, if it is at all possible to detect the condition. It might also be possible to create a utility that just updates the metadata, to avoid the pain of re-converting and re-quantizing the larger models. This particular issue might invalidate a lot of models that rely on added special tokens, though.
If gguf.SpecialVocab._try_load_from_tokenizer_json() is the problem, we should fix it. We can require regenerating such models. It would be nice to print a warning for old GGUFs that have the problem, but I'm not sure we will be able to detect outdated ones.
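As a rough idea of an external check, the sketch below shells out to the tokenize example (with the same flags used earlier in this thread) and warns when [INST] no longer comes back as a single piece; it assumes the "id -> 'piece'" output format shown above:

```python
# Hedged sketch: detect a GGUF converted with the broken special-token handling by
# checking whether "[INST]" tokenizes to a single piece (e.g. "3 -> '[INST]'").
import subprocess
import sys

def marker_is_single_token(model_path: str, marker: str = "[INST]") -> bool:
    out = subprocess.run(["./tokenize", "-m", model_path, "-p", marker],
                         capture_output=True, text=True, check=True).stdout
    # An intact GGUF prints the whole marker as one piece; a broken one splits it
    # into pieces like ' [', 'INST', ']'.
    return f"-> '{marker}'" in out

if __name__ == "__main__":
    if not marker_is_single_token(sys.argv[1]):
        print("warning: [INST] splits into multiple tokens - this GGUF was probably "
              "converted with the broken special-token handling and should be regenerated")
```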
I reproduced this in main.
Input: hi
./main -m Mixtral-8x22B-Instruct-v0.1.i1-Q4_K_M.gguf --no-mmap -fa -ins --in-prefix "[INST] " --in-suffix "[/INST] " -s 0
b3027 - no problems
> [INST] hi
[/INST] Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help!
> [INST]
b3028-b3086 - strange prefix/suffix tokens
> [INST] hi
[/INST] [RESP] Hello! How can I help you today? [/RESP]
### Instruction:
> [INST]
./main -m Mixtral-8x22B-Instruct-v0.1.i1-Q4_K_M.gguf --no-mmap -fa -if --in-prefix "[INST] " --in-suffix "[/INST] " -s 0
b3087-b3140 - random mention of IELTS
[INST] hi
[/INST] hi there! How can I help you today? If you have any questions about the IELTS exam or need some advice on how to prepare for it, feel free to ask. I'm here to help you out.
[INST]
It's still broken as of b3233. It is necessary to revert the b3027->b3028 update if you want to run the model.
This also broke Yi-1.5-34b.
./main -m Yi-1.5-34B-Chat-16K-Q4_K_M.gguf --no-mmap -fa -cml -s 0 --special
Input: hi
b3027
<|im_start|> system
<|im_end|>
<|im_start|> user
> hi
Hello! How can I assist you today?<|im_end|>
>
b3028
<|im_start|>system
<|im_end|>
> hi
Hello! How can I assist you today? <|im_end|>
<|im_start|>user
>
In b3028, <|im_end|> and <|im_start|>user are generated character by character, showing that they are no longer treated as special tokens.
Update: MistralAI updated the tokenizer configs for 8x22B a couple of days ago, so I ran a new convert/quant and all looks OK now:
SPECIAL=1 tokenize "[INST][/INST][TOOL_CALLS][AVAILABLE_TOOLS][/AVAILABLE_TOOLS][TOOL_RESULTS][/TOOL_RESULTS]"
[1,2,3,4,5,6,7,8,9]
lm Hello.
Hello! How can I help you today?
I tested with 3 versions: b3266 with the b3028 revert patch, b3266 with no revert patch, and the latest b3324 with no revert patch, and all were good on the new convert. So it looks like the Mistral guys fixed the problem on their side; however, it will require re-converting from their new update as of a couple of days ago.
I will leave this issue open for now since the bug breaks other models too, but mixtral 8x22b is now good due to the mistralai 8x22b update.
Will there be proper support for special tokens, such that the model would see the difference between, for example, the special token [INST] and the string "[INST]" occurring somewhere in the text?
There is already support - see the parse_special flag of llama_tokenize
Right. What I meant is support in examples.
There will be, but can't say when - does not seem very high prio IMO
This issue was closed because it has been inactive for 14 days since being marked as stale.
What happened?
Mixtral 8x22b model running with server.
b3027: good
lm hi
Hello! How can I help you today? Is there something specific you would like to talk about or ask about? I'm here to provide information and answer questions to the best of my ability.
b3028: garbage
lm hi
π
[INST] I'm here to help you with your questions about the [/INST] π€
[INST] I can provide information on a variety of topics, such as [/INST] π
[INST] - [/INST] π«
Name and Version
b3027 for good run, b3028 for broken run
What operating system are you seeing the problem on?
Linux
Relevant log output