EricLBuehler / mistral.rs

Blazingly fast LLM inference.
MIT License

`grammar` example fails #797

Open · higumachan opened this issue 1 month ago

higumachan commented 1 month ago

Describe the bug

I got the following error.

Directory: /Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs/examples

My machine environment:

```
ProductName:    macOS
ProductVersion: 14.4.1

Hardware Overview:

  Model Name: MacBook Pro
  Model Identifier: MacBookPro18,4
  Model Number: Z15H0016ZJ/A
  Chip: Apple M1 Max
  Total Number of Cores: 10 (8 performance and 2 efficiency)
  Memory: 64 GB
```

```
❯ cargo run --example grammar --release
   Compiling mistralrs-quant v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs-quant)
   Compiling mistralrs-core v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs-core)
   Compiling mistralrs-vision v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs-vision)
   Compiling mistralrs v0.3.0 (/Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/mistralrs)
    Finished release profile [optimized] target(s) in 30.75s
     Running /Users/yuta/ghq/github.com/EricLBuehler/mistral.rs/target/release/examples/grammar
2024-09-27T05:38:45.324420Z INFO hf_hub: Token file not found "/Users/yuta/.cache/huggingface/token"
2024-09-27T05:38:45.324590Z INFO mistralrs_core::utils::tokens: Could not load token at "/Users/yuta/.cache/huggingface/token", using no HF token.
2024-09-27T05:38:45.325083Z INFO mistralrs_core::pipeline::normal: Loading tokenizer.json at microsoft/Phi-3.5-mini-instruct
2024-09-27T05:38:45.325540Z INFO mistralrs_core::pipeline::normal: Loading config.json at microsoft/Phi-3.5-mini-instruct
2024-09-27T05:38:45.993416Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
2024-09-27T05:38:46.198383Z INFO mistralrs_core::pipeline::normal: Loading generation_config.json at microsoft/Phi-3.5-mini-instruct
2024-09-27T05:38:46.933785Z INFO mistralrs_core::pipeline::normal: Loading tokenizer_config.json at microsoft/Phi-3.5-mini-instruct
2024-09-27T05:38:46.935057Z INFO mistralrs_core::pipeline::normal: Loading model microsoft/Phi-3.5-mini-instruct on cpu.
2024-09-27T05:38:46.935316Z INFO mistralrs_core::utils::log: Automatic loader type determined to be phi3
2024-09-27T05:38:46.935866Z INFO mistralrs_core::utils::normal: DType selected is F16.
2024-09-27T05:38:46.935898Z INFO mistralrs_core::pipeline::normal: Model config: Config { vocab_size: 32064, hidden_act: Silu, hidden_size: 3072, intermediate_size: 8192, num_hidden_layers: 32, num_attention_heads: 32, num_key_value_heads: 32, rms_norm_eps: 1e-5, rope_theta: 10000.0, bos_token_id: Some(1), eos_token_id: Some(32000), rope_scaling: Some(Classic { short_factor: [1.0, 1.0199999809265137, 1.0299999713897705, 1.0299999713897705, 1.0499999523162842, 1.0499999523162842, 1.0499999523162842, 1.0499999523162842, 1.0499999523162842, 1.069999933242798, 1.0999999046325684, 1.1099998950958252, 1.1599998474121094, 1.1599998474121094, 1.1699998378753662, 1.2899998426437378, 1.339999794960022, 1.679999828338623, 1.7899998426437378, 1.8199998140335083, 1.8499997854232788, 1.879999756813049, 1.90999972820282, 1.9399996995925903, 1.9899996519088743, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0199997425079346, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0299997329711914, 2.0799996852874756, 2.0899996757507324, 2.189999580383301, 2.2199995517730713, 2.5899994373321533, 2.729999542236328, 2.749999523162842, 2.8399994373321533], long_factor: [1.0800000429153442, 1.1100000143051147, 1.1399999856948853, 1.340000033378601, 1.5899999141693115, 1.600000023841858, 1.6200000047683716, 2.620000123977661, 3.2300000190734863, 3.2300000190734863, 4.789999961853027, 7.400000095367432, 7.700000286102295, 9.09000015258789, 12.199999809265137, 17.670000076293945, 24.46000099182129, 28.57000160217285, 30.420001983642575, 30.840002059936523, 32.590003967285156, 32.93000411987305, 42.32000350952149, 44.96000289916992, 50.34000396728515, 50.45000457763672, 57.55000305175781, 57.93000411987305, 58.21000289916992, 60.1400032043457, 62.61000442504883, 62.62000274658203, 62.71000289916992, 63.1400032043457, 63.1400032043457, 63.77000427246094, 63.93000411987305, 63.96000289916992, 63.970001220703125, 64.02999877929688, 64.06999969482422, 64.08000183105469, 64.12000274658203, 64.41000366210938, 64.4800033569336, 64.51000213623047, 64.52999877929688, 64.83999633789063], scaling_type: Su }), max_position_embeddings: 131072, use_flash_attn: false, sliding_window: Some(262144), original_max_position_embeddings: 4096, quantization_config: None }
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 67/67 [00:04<00:00, 11.71it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:10<00:00, 7.58it/s]
2024-09-27T05:39:05.613379Z INFO mistralrs_core::pipeline::isq: Applying in-situ quantization into Some(Q4K) to 129 tensors.
2024-09-27T05:39:05.613615Z INFO mistralrs_core::pipeline::isq: Applying ISQ on 10 threads.
2024-09-27T05:39:12.267294Z INFO mistralrs_core::pipeline::isq: Applied in-situ quantization into Some(Q4K) to 129 tensors out of 129 total tensors. Took 6.65s
2024-09-27T05:39:12.311255Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "", eos_toks = "<|endoftext|>", "<|end|>", "<|assistant|>", unk_tok =
Error: shape mismatch in add, lhs: [32064], rhs: [32011]
```



## Latest commit or version

1eb9cae2a4ec89d7cf8a5fc8d9f57b82f2f747fa
andrewlimmer commented 1 month ago

Mac OS 18, rev = `86f37fa803c40e9ee14c43e0028ad32f841ceb07`

The error only occurs with (1) "microsoft/Phi-3-mini-128k-instruct" and (2) a constrained grammar. There is no error when using "meta-llama/Llama-3.1-8B-Instruct" with a constrained grammar.

I believe the error arises because vocab_size is set to 32064 in config.json, but https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/tokenizer.json only has 32000 tokens, and https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/added_tokens.json adds 11 more, so 32064 - (32000 + 11) = 53. I don't know where the missing 53 tokens are.
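For illustration, a minimal sketch that reproduces this count with the HF `tokenizers` crate; this is not mistral.rs code, and the local file name and the config value are assumptions (tokenizer.json downloaded from the repo above, vocab_size taken from config.json):

```rust
// Sketch: compare the tokenizer's vocabulary with the vocab_size in config.json.
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumes tokenizer.json from microsoft/Phi-3-mini-128k-instruct is in the CWD.
    let tok = Tokenizer::from_file("tokenizer.json")?;

    let cfg_vocab_size = 32064usize; // vocab_size from config.json
    let base = tok.get_vocab_size(false); // 32000 tokens in tokenizer.json
    let with_added = tok.get_vocab_size(true); // 32011 incl. added_tokens.json

    println!("base: {base}, with added tokens: {with_added}");
    // 32064 - 32011 = 53 embedding rows with no corresponding token.
    println!("unmapped embedding rows: {}", cfg_vocab_size - with_added);
    Ok(())
}
```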

[Screenshot attached: 2024-10-01 at 2:16:56 PM]
haricot commented 1 month ago

This is related: microsoft/phi-2 discussion #97 and epfl-dlab/transformers-CFG PR #83.

I suggest something like:

```rust
pub(crate) fn build_tok_trie(tokenizer: Tokenizer, cfg_vocab_size: usize) -> TokTrie {
    let bt = ByteTokenizer::from_tokenizer(tokenizer, cfg_vocab_size).unwrap();
    TokTrie::from(&bt.tokrx_info(), &bt.token_bytes())
}

impl ByteTokenizer {
    pub fn from_tokenizer(mut hft: Tokenizer, cfg_vocab_size: usize) -> Result<ByteTokenizer> {
        ...
        for tok_id in 0..vocab_size {
            ...
        }
        // If config.json declares a larger vocab than the tokenizer provides
        // (e.g. 32064 vs. 32011 for Phi-3), pad with empty byte sequences so
        // the trie length matches the model's logits dimension.
        if cfg_vocab_size > res.vocab_size {
            let vocab_size_diff = cfg_vocab_size - res.vocab_size;
            res.vocab_size = cfg_vocab_size;
            res.token_bytes
                .extend((0..vocab_size_diff).map(|_| Vec::new()));
        }
        Ok(res)
    }
}
```
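If I understand the toktrie design correctly, padding `token_bytes` with empty byte sequences means the extra IDs can never be matched during constrained decoding, while the trie's vocabulary length now matches the model's logits dimension, which should avoid the shape mismatch in the log above.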
EricLBuehler commented 1 week ago

@haricot please feel free to contribute the change. We have a draft PR for reworking the entire grammar system to use llguidance, though, which should be much cleaner.

haricot commented 1 week ago

@EricLBuehler Thanks for the info, this looks great. The draft PR with llguidance fixes this issue when the embedding size differs from the vocabulary size; the resolution seems tied to the upgrade of the toktrie_hf_tokenizers crate.