higumachan opened this issue 1 month ago
macOS 14.4.1, rev = `86f37fa803c40e9ee14c43e0028ad32f841ceb07`
The error only occurs with "microsoft/Phi-3-mini-128k-instruct" when a constrained grammar is used. There is no error when using "meta-llama/Llama-3.1-8B-Instruct" with the same constrained grammar.
I believe the error arises because vocab_size is set to 32064, but "https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/tokenizer.json" only has 32000 tokens and "https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/added_tokens.json" adds 11 more, for a total of 32011. I don't know where the missing 53 tokens are (presumably unused padding entries in the embedding matrix; 32064 is a multiple of 64).
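To double-check those counts, here is a quick sketch using the `tokenizers` crate; the file path is an assumption (a local copy of the tokenizer.json linked above):

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a local copy of tokenizer.json from microsoft/Phi-3-mini-128k-instruct.
    let tok = Tokenizer::from_file("tokenizer.json")?;

    // Vocab size without and with the entries from added_tokens.json.
    let base_only = tok.get_vocab_size(false);
    let with_added = tok.get_vocab_size(true);
    println!("base: {base_only}, with added tokens: {with_added}");

    // config.json declares vocab_size = 32064; the difference should be 53.
    println!("missing vs config: {}", 32064 - with_added);
    Ok(())
}
```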
This is related: microsoft/phi-2 discussion #97, epfl-dlab/transformers-CFG PR #83.
I suggest something like:
```rust
pub(crate) fn build_tok_trie(tokenizer: Tokenizer, cfg_vocab_size: usize) -> TokTrie {
    let bt = ByteTokenizer::from_tokenizer(tokenizer, cfg_vocab_size).unwrap();
    TokTrie::from(&bt.tokrx_info(), &bt.token_bytes())
}

impl ByteTokenizer {
    pub fn from_tokenizer(mut hft: Tokenizer, cfg_vocab_size: usize) -> Result<ByteTokenizer> {
        ...
        for tok_id in 0..vocab_size {
            ...
        }
        // Pad the token table so the trie covers the model's full vocab size.
        if cfg_vocab_size > res.vocab_size {
            let vocab_size_diff = cfg_vocab_size - res.vocab_size;
            res.vocab_size = cfg_vocab_size;
            res.token_bytes
                .extend((0..vocab_size_diff).map(|_| Vec::new()));
        }
        Ok(res)
    }
}
```
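The effect of the padding is that the trie spans the model's full vocab_size, but the padded ids map to empty byte sequences and so can never be matched during constrained decoding, mirroring the fact that the tokenizer never emits them.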
@haricot please feel free to contribute the change. Note, though, that we have a draft PR reworking the entire grammar system to use llguidance, which should be much cleaner.
@EricLBuehler Thanks for the info, this looks great. The draft PR with llguidance fixes this issue where the embedding size differs from the vocabulary size; it seems the resolution of this issue is tied to the upgrade of the toktrie_hf_tokenizers crate.
Describe the bug

I got the following error when running the grammar example from `/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs/examples`.
My machine environment:

```
ProductName:    macOS
ProductVersion: 14.4.1

❯ cargo run --example grammar --release
   Compiling mistralrs-quant v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs-quant)
   Compiling mistralrs-core v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs-core)
   Compiling mistralrs-vision v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs-vision)
   Compiling mistralrs v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs)
    Finished `release` profile [optimized] target(s) in 30.75s
     Running `/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/target/release/examples/grammar`
2024-09-27T05:38:45.324420Z  INFO hf_hub: Token file not found "/Users/yuta/.cache/huggingface/token"
2024-09-27T05:38:45.324590Z  INFO mistralrs_core::utils::tokens: Could not load token at "/Users/yuta/.cache/huggingface/token", using no HF token.
2024-09-27T05:38:45.325083Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `microsoft/Phi-3.5-mini-instruct`
2024-09-27T05:38:45.325540Z  INFO mistralrs_core::pipeline::normal: Loading `config.json` at `microsoft/Phi-3.5-mini-instruct`
2024-09-27T05:38:45.993416Z  INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
2024-09-27T05:38:46.198383Z  INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `microsoft/Phi-3.5-mini-instruct`
2024-09-27T05:38:46.933785Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `microsoft/Phi-3.5-mini-instruct`
2024-09-27T05:38:46.935057Z  INFO mistralrs_core::pipeline::normal: Loading model `microsoft/Phi-3.5-mini-instruct` on cpu.
2024-09-27T05:38:46.935316Z  INFO mistralrs_core::utils::log: Automatic loader type determined to be `phi3`
2024-09-27T05:38:46.935866Z  INFO mistralrs_core::utils::normal: DType selected is F16.
2024-09-27T05:38:46.935898Z  INFO mistralrs_core::pipeline::normal: Model config: Config { vocab_size: 32064, hidden_act: Silu, hidden_size: 3072, intermediate_size: 8192, num_hidden_layers: 32, num_attention_heads: 32, num_key_value_heads: 32, rms_norm_eps: 1e-5, rope_theta: 10000.0, bos_token_id: Some(1), eos_token_id: Some(32000), rope_scaling: Some(Classic { short_factor: [1.0, 1.0199999809265137, 1.0299999713897705, 1.0299999713897705, 1.0499999523162842, 1.0499999523162842, 1.0499999523162842, 1.0499999523162842, 1.0499999523162842, 1.069999933242798, 1.0999999046325684, 1.1099998950958252, 1.1599998474121094, 1.1599998474121094, 1.1699998378753662, 1.2899998426437378, 1.339999794960022, 1.679999828338623, 1.7899998426437378, 1.8199998140335083, 1.8499997854232788, 1.879999756813049, 1.90999972820282, 1.9399996995925903, 1.9899996519088743, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0799996852874756, 2.0899996757507324, 2.189999580383301, 2.2199995517730713, 2.5899994373321533, 2.729999542236328, 2.749999523162842, 2.8399994373321533], long_factor: [1.0800000429153442, 1.1100000143051147, 1.1399999856948853, 1.340000033378601, 1.5899999141693115, 1.600000023841858, 1.6200000047683716, 2.620000123977661, 3.2300000190734863, 3.2300000190734863, 4.789999961853027, 7.400000095367432, 7.700000286102295, 9.09000015258789, 12.199999809265137, 17.670000076293945, 24.46000099182129, 28.57000160217285, 30.420001983642575, 30.840002059936523, 32.590003967285156, 32.93000411987305, 42.32000350952149, 44.96000289916992, 50.34000396728515, 50.45000457763672, 57.55000305175781, 57.93000411987305, 58.21000289916992, 60.1400032043457, 62.61000442504883, 62.62000274658203, 62.71000289916992, 63.1400032043457, 63.1400032043457, 63.77000427246094, 63.93000411987305, 63.96000289916992, 63.970001220703125, 64.02999877929688, 64.06999969482422, 64.08000183105469, 64.12000274658203, 64.41000366210938, 64.4800033569336, 64.51000213623047, 64.52999877929688, 64.83999633789063], scaling_type: Su }), max_position_embeddings: 131072, use_flash_attn: false, sliding_window: Some(262144), original_max_position_embeddings: 4096, quantization_config: None }
100%|██████████| 67/67 [00:04<00:00, 11.71it/s]
100%|██████████| 128/128 [00:10<00:00, 7.58it/s]
2024-09-27T05:39:05.613379Z  INFO mistralrs_core::pipeline::isq: Applying in-situ quantization into Some(Q4K) to 129 tensors.
2024-09-27T05:39:05.613615Z  INFO mistralrs_core::pipeline::isq: Applying ISQ on 10 threads.
2024-09-27T05:39:12.267294Z  INFO mistralrs_core::pipeline::isq: Applied in-situ quantization into Some(Q4K) to 129 tensors out of 129 total tensors. Took 6.65s
2024-09-27T05:39:12.311255Z  INFO mistralrs_core::pipeline::chat_template: bos_toks = "<s>", eos_toks = "<|endoftext|>", "<|end|>", "<|assistant|>", unk_tok = <unk>
Error: shape mismatch in add, lhs: [32064], rhs: [32011]
```
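For context, a minimal sketch of the failure mode, with hypothetical names rather than the actual mistral.rs internals: the grammar bias is sized by the tokenizer (32011 entries) but is added to logits sized by the model config (32064 entries).

```rust
// `apply_grammar_bias` is a hypothetical stand-in for the real sampler code.
fn apply_grammar_bias(logits: &mut [f32], bias: &[f32]) {
    // The real tensor add enforces matching shapes, hence
    // "shape mismatch in add, lhs: [32064], rhs: [32011]".
    assert_eq!(logits.len(), bias.len(), "shape mismatch in add");
    for (l, b) in logits.iter_mut().zip(bias) {
        *l += b;
    }
}

fn main() {
    let mut logits = vec![0.0f32; 32064]; // model vocab (config.json)
    let bias = vec![0.0f32; 32011]; // tokenizer vocab (32000 base + 11 added)
    apply_grammar_bias(&mut logits, &bias); // panics: 32064 != 32011
}
```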