EricLBuehler / mistral.rs

Blazingly fast LLM inference.
MIT License

Error: unsupported dtype BF16 for op matmul (Mistral-Large-Instruct-2407) #669

Closed: Remember20240719 closed this issue 4 weeks ago

Remember20240719 commented 1 month ago

Describe the bug

mistralrs-server crashes when running Mistral Large Instruct 2407 using ISQ.

Note: the error looks like https://github.com/EricLBuehler/mistral.rs/issues/437

Latest commit or version

Commit 249299bd32649517a6f24245166ea5f3c463a869

cargo run --release  -- --port 1234 --isq Q2K plain -m $D/DATA/models/Mistral-Large-Instruct-2407 -a mistral
    Finished release [optimized] target(s) in 0.27s
     Running `target/release/mistralrs-server --port 1234 --isq Q2K plain -m $D/models/Mistral-Large-Instruct-2407 -a mistral`
2024-08-06T02:36:11.295134Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-08-06T02:36:11.295159Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-08-06T02:36:11.295166Z  INFO mistralrs_server: Model kind is: normal (no quant, no adapters)
2024-08-06T02:36:11.295231Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `$D/models/Mistral-Large-Instruct-2407`
2024-08-06T02:36:11.295250Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` locally at `$D/models/Mistral-Large-Instruct-2407/tokenizer.json`
2024-08-06T02:36:11.295253Z  INFO mistralrs_core::pipeline::normal: Loading `config.json` at `$D/models/Mistral-Large-Instruct-2407`
2024-08-06T02:36:11.295259Z  INFO mistralrs_core::pipeline::normal: Loading `config.json` locally at `$D/models/Mistral-Large-Instruct-2407/config.json`
2024-08-06T02:36:11.296370Z  INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00005-of-00051.safetensors", "model-00007-of-00051.safetensors", "model-00002-of-00051.safetensors", "model-00008-of-00051.safetensors", "model-00004-of-00051.safetensors", "model-00003-of-00051.safetensors", "model-00006-of-00051.safetensors", "model-00001-of-00051.safetensors", "model-00009-of-00051.safetensors", "model-00010-of-00051.safetensors", "model-00013-of-00051.safetensors", "model-00011-of-00051.safetensors", "model-00012-of-00051.safetensors", "model-00015-of-00051.safetensors", "model-00014-of-00051.safetensors", "model-00018-of-00051.safetensors", "model-00017-of-00051.safetensors", "model-00020-of-00051.safetensors", "model-00016-of-00051.safetensors", "model-00021-of-00051.safetensors", "model-00019-of-00051.safetensors", "model-00022-of-00051.safetensors", "model-00023-of-00051.safetensors", "model-00024-of-00051.safetensors", "model-00025-of-00051.safetensors", "model-00026-of-00051.safetensors", "model-00029-of-00051.safetensors", "model-00028-of-00051.safetensors", "model-00032-of-00051.safetensors", "model-00030-of-00051.safetensors", "model-00033-of-00051.safetensors", "model-00031-of-00051.safetensors", "model-00027-of-00051.safetensors", "model-00034-of-00051.safetensors", "model-00035-of-00051.safetensors", "model-00037-of-00051.safetensors", "model-00036-of-00051.safetensors", "model-00039-of-00051.safetensors", "model-00038-of-00051.safetensors", "model-00040-of-00051.safetensors", "model-00042-of-00051.safetensors", "model-00044-of-00051.safetensors", "model-00045-of-00051.safetensors", "model-00043-of-00051.safetensors", "model-00041-of-00051.safetensors", "model-00046-of-00051.safetensors", "model-00051-of-00051.safetensors", "model-00049-of-00051.safetensors", "model-00050-of-00051.safetensors", "model-00048-of-00051.safetensors", "model-00047-of-00051.safetensors"]
2024-08-06T02:36:11.296401Z  INFO mistralrs_core::pipeline::paths: Loading `model-00005-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00005-of-00051.safetensors`
2024-08-06T02:36:11.296409Z  INFO mistralrs_core::pipeline::paths: Loading `model-00007-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00007-of-00051.safetensors`
2024-08-06T02:36:11.296415Z  INFO mistralrs_core::pipeline::paths: Loading `model-00002-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00002-of-00051.safetensors`
2024-08-06T02:36:11.296422Z  INFO mistralrs_core::pipeline::paths: Loading `model-00008-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00008-of-00051.safetensors`
2024-08-06T02:36:11.296429Z  INFO mistralrs_core::pipeline::paths: Loading `model-00004-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00004-of-00051.safetensors`
2024-08-06T02:36:11.296436Z  INFO mistralrs_core::pipeline::paths: Loading `model-00003-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00003-of-00051.safetensors`
2024-08-06T02:36:11.296442Z  INFO mistralrs_core::pipeline::paths: Loading `model-00006-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00006-of-00051.safetensors`
2024-08-06T02:36:11.296449Z  INFO mistralrs_core::pipeline::paths: Loading `model-00001-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00001-of-00051.safetensors`
2024-08-06T02:36:11.296456Z  INFO mistralrs_core::pipeline::paths: Loading `model-00009-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00009-of-00051.safetensors`
2024-08-06T02:36:11.296462Z  INFO mistralrs_core::pipeline::paths: Loading `model-00010-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00010-of-00051.safetensors`
2024-08-06T02:36:11.296469Z  INFO mistralrs_core::pipeline::paths: Loading `model-00013-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00013-of-00051.safetensors`
2024-08-06T02:36:11.296476Z  INFO mistralrs_core::pipeline::paths: Loading `model-00011-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00011-of-00051.safetensors`
2024-08-06T02:36:11.296482Z  INFO mistralrs_core::pipeline::paths: Loading `model-00012-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00012-of-00051.safetensors`
2024-08-06T02:36:11.296489Z  INFO mistralrs_core::pipeline::paths: Loading `model-00015-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00015-of-00051.safetensors`
2024-08-06T02:36:11.296495Z  INFO mistralrs_core::pipeline::paths: Loading `model-00014-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00014-of-00051.safetensors`
2024-08-06T02:36:11.296502Z  INFO mistralrs_core::pipeline::paths: Loading `model-00018-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00018-of-00051.safetensors`
2024-08-06T02:36:11.296508Z  INFO mistralrs_core::pipeline::paths: Loading `model-00017-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00017-of-00051.safetensors`
2024-08-06T02:36:11.296515Z  INFO mistralrs_core::pipeline::paths: Loading `model-00020-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00020-of-00051.safetensors`
2024-08-06T02:36:11.296522Z  INFO mistralrs_core::pipeline::paths: Loading `model-00016-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00016-of-00051.safetensors`
2024-08-06T02:36:11.296528Z  INFO mistralrs_core::pipeline::paths: Loading `model-00021-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00021-of-00051.safetensors`
2024-08-06T02:36:11.296535Z  INFO mistralrs_core::pipeline::paths: Loading `model-00019-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00019-of-00051.safetensors`
2024-08-06T02:36:11.296542Z  INFO mistralrs_core::pipeline::paths: Loading `model-00022-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00022-of-00051.safetensors`
2024-08-06T02:36:11.296548Z  INFO mistralrs_core::pipeline::paths: Loading `model-00023-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00023-of-00051.safetensors`
2024-08-06T02:36:11.296555Z  INFO mistralrs_core::pipeline::paths: Loading `model-00024-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00024-of-00051.safetensors`
2024-08-06T02:36:11.296562Z  INFO mistralrs_core::pipeline::paths: Loading `model-00025-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00025-of-00051.safetensors`
2024-08-06T02:36:11.296569Z  INFO mistralrs_core::pipeline::paths: Loading `model-00026-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00026-of-00051.safetensors`
2024-08-06T02:36:11.296575Z  INFO mistralrs_core::pipeline::paths: Loading `model-00029-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00029-of-00051.safetensors`
2024-08-06T02:36:11.296583Z  INFO mistralrs_core::pipeline::paths: Loading `model-00028-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00028-of-00051.safetensors`
2024-08-06T02:36:11.296590Z  INFO mistralrs_core::pipeline::paths: Loading `model-00032-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00032-of-00051.safetensors`
2024-08-06T02:36:11.296597Z  INFO mistralrs_core::pipeline::paths: Loading `model-00030-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00030-of-00051.safetensors`
2024-08-06T02:36:11.296604Z  INFO mistralrs_core::pipeline::paths: Loading `model-00033-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00033-of-00051.safetensors`
2024-08-06T02:36:11.296610Z  INFO mistralrs_core::pipeline::paths: Loading `model-00031-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00031-of-00051.safetensors`
2024-08-06T02:36:11.296617Z  INFO mistralrs_core::pipeline::paths: Loading `model-00027-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00027-of-00051.safetensors`
2024-08-06T02:36:11.296625Z  INFO mistralrs_core::pipeline::paths: Loading `model-00034-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00034-of-00051.safetensors`
2024-08-06T02:36:11.296632Z  INFO mistralrs_core::pipeline::paths: Loading `model-00035-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00035-of-00051.safetensors`
2024-08-06T02:36:11.296638Z  INFO mistralrs_core::pipeline::paths: Loading `model-00037-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00037-of-00051.safetensors`
2024-08-06T02:36:11.296645Z  INFO mistralrs_core::pipeline::paths: Loading `model-00036-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00036-of-00051.safetensors`
2024-08-06T02:36:11.296652Z  INFO mistralrs_core::pipeline::paths: Loading `model-00039-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00039-of-00051.safetensors`
2024-08-06T02:36:11.296658Z  INFO mistralrs_core::pipeline::paths: Loading `model-00038-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00038-of-00051.safetensors`
2024-08-06T02:36:11.296665Z  INFO mistralrs_core::pipeline::paths: Loading `model-00040-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00040-of-00051.safetensors`
2024-08-06T02:36:11.296672Z  INFO mistralrs_core::pipeline::paths: Loading `model-00042-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00042-of-00051.safetensors`
2024-08-06T02:36:11.296678Z  INFO mistralrs_core::pipeline::paths: Loading `model-00044-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00044-of-00051.safetensors`
2024-08-06T02:36:11.296685Z  INFO mistralrs_core::pipeline::paths: Loading `model-00045-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00045-of-00051.safetensors`
2024-08-06T02:36:11.296692Z  INFO mistralrs_core::pipeline::paths: Loading `model-00043-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00043-of-00051.safetensors`
2024-08-06T02:36:11.296698Z  INFO mistralrs_core::pipeline::paths: Loading `model-00041-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00041-of-00051.safetensors`
2024-08-06T02:36:11.296705Z  INFO mistralrs_core::pipeline::paths: Loading `model-00046-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00046-of-00051.safetensors`
2024-08-06T02:36:11.296711Z  INFO mistralrs_core::pipeline::paths: Loading `model-00051-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00051-of-00051.safetensors`
2024-08-06T02:36:11.296718Z  INFO mistralrs_core::pipeline::paths: Loading `model-00049-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00049-of-00051.safetensors`
2024-08-06T02:36:11.296725Z  INFO mistralrs_core::pipeline::paths: Loading `model-00050-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00050-of-00051.safetensors`
2024-08-06T02:36:11.296731Z  INFO mistralrs_core::pipeline::paths: Loading `model-00048-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00048-of-00051.safetensors`
2024-08-06T02:36:11.296738Z  INFO mistralrs_core::pipeline::paths: Loading `model-00047-of-00051.safetensors` locally at `$D/models/Mistral-Large-Instruct-2407/model-00047-of-00051.safetensors`
2024-08-06T02:36:11.296811Z  INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `$D/models/Mistral-Large-Instruct-2407`
2024-08-06T02:36:11.296818Z  INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` locally at `$D/models/Mistral-Large-Instruct-2407/generation_config.json`
2024-08-06T02:36:11.296879Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `$D/models/Mistral-Large-Instruct-2407`
2024-08-06T02:36:11.296886Z  INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` locally at `$D/models/Mistral-Large-Instruct-2407/tokenizer_config.json`
2024-08-06T02:36:11.297540Z  INFO mistralrs_core::utils::normal: DType selected is F16.
2024-08-06T02:36:11.297544Z  INFO mistralrs_core::pipeline::normal: Loading model `$D/models/Mistral-Large-Instruct-2407` on cpu.
2024-08-06T02:36:11.299448Z  INFO mistralrs_core::pipeline::normal: Model config: Config { vocab_size: 32768, hidden_size: 12288, intermediate_size: 28672, num_hidden_layers: 88, num_attention_heads: 96, num_key_value_heads: 8, hidden_act: Silu, max_position_embeddings: 131072, rms_norm_eps: 1e-5, rope_theta: 1000000.0, sliding_window: None, use_flash_attn: false, head_dim: None }
Error: unsupported dtype BF16 for op matmul
   0: candle_core::error::Error::bt
   1: <candle_core::cpu_backend::CpuStorage as candle_core::backend::BackendStorage>::matmul
   2: candle_core::tensor::Tensor::matmul
   3: <mistralrs_core::utils::normal::ModelDType as mistralrs_core::utils::normal::TryIntoDType>::try_into_dtype
   4: <mistralrs_core::pipeline::normal::NormalLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
   5: <mistralrs_core::pipeline::normal::NormalLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
   6: mistralrs_server::main::{{closure}}
   7: mistralrs_server::main
   8: std::sys_common::backtrace::__rust_begin_short_backtrace
   9: std::rt::lang_start::{{closure}}
  10: std::rt::lang_start_internal
  11: main
  12: __libc_start_call_main
  13: __libc_start_main@@GLIBC_2.34
  14: _start

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: <mistralrs_core::utils::normal::ModelDType as mistralrs_core::utils::normal::TryIntoDType>::try_into_dtype
   2: <mistralrs_core::pipeline::normal::NormalLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
   3: <mistralrs_core::pipeline::normal::NormalLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
   4: mistralrs_server::main::{{closure}}
   5: mistralrs_server::main
   6: std::sys_common::backtrace::__rust_begin_short_backtrace
   7: std::rt::lang_start::{{closure}}
   8: std::rt::lang_start_internal
   9: main
  10: __libc_start_call_main
  11: __libc_start_main@@GLIBC_2.34
  12: _start
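The backtrace points at `TryIntoDType::try_into_dtype` calling `Tensor::matmul`: dtype selection apparently probes the backend with a small matmul, and the CPU backend rejects BF16. Below is a minimal, hypothetical sketch of the fallback logic one would expect here (probe the requested dtype, fall back to F16 when it is unsupported). All names are illustrative stand-ins, not the actual candle/mistral.rs API.

```rust
// Illustrative dtype-selection sketch: if the backend cannot run a matmul in
// the requested dtype, fall back to F16 instead of erroring out.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    BF16,
    F16,
    F32,
}

// Stand-in for the capability probe; the real check is a trial matmul on the
// target device, which is what the backtrace shows failing for BF16 on CPU.
fn cpu_supports(dtype: DType) -> bool {
    !matches!(dtype, DType::BF16)
}

fn select_dtype(requested: DType) -> DType {
    if cpu_supports(requested) {
        requested
    } else {
        // BF16 matmul is unsupported on this CPU backend; degrade to F16.
        DType::F16
    }
}

fn main() {
    let chosen = select_dtype(DType::BF16);
    println!("{:?}", chosen); // prints "F16"
}
```

This matches the log above, where the loader reports "DType selected is F16." before the probe itself crashes; a fallback of this shape is presumably what #690/#676 addressed.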
WildEgo commented 1 month ago

I'm seeing a similar issue with a Docker setup (CPU image, v0.2) and the Phi-3-vision model:

2024-08-13T10:22:32.805904Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-08-13T10:22:32.805934Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-08-13T10:22:32.805952Z  INFO mistralrs_server: Model kind is: normal (no quant, no adapters)
2024-08-13T10:22:32.805998Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"    
2024-08-13T10:22:32.806078Z  INFO mistralrs_core::pipeline::vision: Loading `tokenizer.json` at `microsoft/Phi-3-vision-128k-instruct`
2024-08-13T10:22:33.614414Z  INFO mistralrs_core::pipeline::vision: Loading `config.json` at `microsoft/Phi-3-vision-128k-instruct`
2024-08-13T10:22:34.032699Z  INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
2024-08-13T10:25:11.706991Z  INFO mistralrs_core::pipeline::vision: Loading `preprocessor_config.json` at `microsoft/Phi-3-vision-128k-instruct`
2024-08-13T10:25:12.110866Z  INFO mistralrs_core::pipeline::vision: Loading `tokenizer_config.json` at `microsoft/Phi-3-vision-128k-instruct`
Error: unsupported dtype BF16 for op matmul
   0: candle_core::error::Error::bt
   1: <candle_core::cpu_backend::CpuStorage as candle_core::backend::BackendStorage>::matmul
   2: candle_core::tensor::Tensor::matmul
   3: <mistralrs_core::utils::normal::ModelDType as mistralrs_core::utils::normal::TryIntoDType>::try_into_dtype
   4: <mistralrs_core::pipeline::vision::VisionLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
   5: <mistralrs_core::pipeline::vision::VisionLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
   6: mistralrs_server::main::{{closure}}
   7: mistralrs_server::main
   8: std::sys_common::backtrace::__rust_begin_short_backtrace
   9: std::rt::lang_start::{{closure}}
  10: std::rt::lang_start_internal
  11: main
  12: <unknown>
  13: __libc_start_main
  14: _start

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: <mistralrs_core::utils::normal::ModelDType as mistralrs_core::utils::normal::TryIntoDType>::try_into_dtype
   2: <mistralrs_core::pipeline::vision::VisionLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
   3: <mistralrs_core::pipeline::vision::VisionLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
   4: mistralrs_server::main::{{closure}}
   5: mistralrs_server::main
   6: std::sys_common::backtrace::__rust_begin_short_backtrace
   7: std::rt::lang_start::{{closure}}
   8: std::rt::lang_start_internal
   9: main
  10: <unknown>
  11: __libc_start_main
  12: _start
EricLBuehler commented 1 month ago

@WildEgo @Remember20240719 I think this may be fixed now after #690 and #676.

Remember20240719 commented 1 month ago

Thanks! Now I'm getting a new error: "cannot find tensor info for output_norm.weight".

$ ./target/release/./mistralrs-server --port 1234 --throughput gguf --quantized-model-id $D/models3/bartowski/Mistral-Large-Instruct-2407-GGUF --quantized-filename Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf
2024-08-18T01:40:20.694685Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-08-18T01:40:20.694716Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-08-18T01:40:20.694733Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-08-18T01:40:20.694887Z  INFO mistralrs_core::pipeline::paths: Loading `Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf` locally at `$D/models3/bartowski/Mistral-Large-Instruct-2407-GGUF/Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf`
2024-08-18T01:40:20.694993Z  INFO mistralrs_core::pipeline::gguf: Loading model `$D/DATA/models3/bartowski/Mistral-Large-Instruct-2407-GGUF` on cpu.
2024-08-18T01:40:20.801701Z  INFO mistralrs_core::pipeline::gguf: Model config:
general.architecture: llama
general.basename: Mistral
general.file_type: 16
general.finetune: Instruct
general.languages: en, fr, de, es, it, pt, zh, ja, ru, ko
general.license: other
general.license.link: https://mistral.ai/licenses/MRL-0.1.md
general.license.name: mrl
general.name: Mistral Large Instruct 2407
general.quantization_version: 2
general.size_label: Large
general.type: model
general.version: 2407
llama.attention.head_count: 96
llama.attention.head_count_kv: 8
llama.attention.layer_norm_rms_epsilon: 0.00001
llama.block_count: 88
llama.context_length: 131072
llama.embedding_length: 12288
llama.feed_forward_length: 28672
llama.rope.dimension_count: 128
llama.rope.freq_base: 1000000
llama.vocab_size: 32768
quantize.imatrix.chunks_count: 148
quantize.imatrix.dataset: /training_dir/calibration_datav3.txt
quantize.imatrix.entries_count: 616
quantize.imatrix.file: /models_out/Mistral-Large-Instruct-2407-GGUF/Mistral-Large-Instruct-2407.imatrix
split.count: 3
split.no: 0
split.tensors.count: 795
2024-08-18T01:40:20.847842Z  INFO mistralrs_core::gguf::gguf_tokenizer: GGUF tokenizer model is `llama`, kind: `Unigram`, num tokens: 32768, num added tokens: 0, num merges: 0, num scores: 32768
2024-08-18T01:40:20.849113Z  INFO mistralrs_core::gguf::chat_template: Discovered and using GGUF chat template: `{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content'] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n    {%- endif %}\n    {%- if message['role'] == 'user' %}\n        {%- if loop.last and system_message is defined %}\n            {{- '[INST] ' + system_message + '\n\n' + message['content'] + '[/INST]' }}\n        {%- else %}\n            {{- '[INST] ' + message['content'] + '[/INST]' }}\n        {%- endif %}\n    {%- elif message['role'] == 'assistant' %}\n        {{- ' ' + message['content'] + eos_token}}\n    {%- else %}\n        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n    {%- endif %}\n{%- endfor %}\n`
Error: cannot find tensor info for output_norm.weight
   0: candle_core::error::Error::bt
   1: candle_core::quantized::gguf_file::Content::tensor
   2: <mistralrs_core::models::quantized_llama::ModelWeights as mistralrs_core::utils::model_config::FromGGUF>::from_gguf
   3: <mistralrs_core::pipeline::gguf::GGUFLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
   4: <mistralrs_core::pipeline::gguf::GGUFLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
   5: mistralrs_server::main::{{closure}}
   6: mistralrs_server::main
   7: std::sys_common::backtrace::__rust_begin_short_backtrace
   8: std::rt::lang_start::{{closure}}
   9: std::rt::lang_start_internal
  10: main
  11: __libc_start_call_main
  12: __libc_start_main@@GLIBC_2.34
  13: _start

Stack backtrace:
   0: <mistralrs_core::pipeline::gguf::GGUFLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
   1: <mistralrs_core::pipeline::gguf::GGUFLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
   2: mistralrs_server::main::{{closure}}
   3: mistralrs_server::main
   4: std::sys_common::backtrace::__rust_begin_short_backtrace
   5: std::rt::lang_start::{{closure}}
   6: std::rt::lang_start_internal
   7: main
   8: __libc_start_call_main
   9: __libc_start_main@@GLIBC_2.34
  10: _start

$ ls -1 $D/models3/bartowski/Mistral-Large-Instruct-2407-GGUF
Mistral-Large-Instruct-2407.imatrix
Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf
Mistral-Large-Instruct-2407-Q5_K_S-00002-of-00003.gguf
Mistral-Large-Instruct-2407-Q5_K_S-00003-of-00003.gguf
README.md

$ git rev-parse HEAD
13f565596c4c09b16e7c4d5412ef634d3104bd71
EricLBuehler commented 1 month ago

@Remember20240719 this is because output_norm.weight is not in the file you loaded. I just merged #692 which adds support for sharded GGUF models, so this command should work for you:


./target/release/./mistralrs-server --port 1234 --throughput gguf --quantized-model-id $D/models3/bartowski/Mistral-Large-Instruct-2407-GGUF --quantized-filename "Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf Mistral-Large-Instruct-2407-Q5_K_S-00002-of-00003.gguf Mistral-Large-Instruct-2407-Q5_K_S-00003-of-00003.gguf"
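
The underlying reason the first attempt failed is that each GGUF shard carries only a subset of the model's tensors, so `output_norm.weight` simply was not in shard 1; a sharded loader has to union the tensor tables from every file before resolving names. A minimal, hypothetical sketch of that merge (illustrative only, not the implementation from #692):

```rust
use std::collections::HashMap;

// Merge the per-shard tensor tables into one lookup map. Keys are tensor
// names; values stand in for tensor-info records (here, just the shard index).
fn merge_shards(shards: Vec<HashMap<String, usize>>) -> HashMap<String, usize> {
    let mut all = HashMap::new();
    for shard in shards {
        all.extend(shard); // each shard contributes its own tensors
    }
    all
}

fn main() {
    // Shard 1 holds early layers; the final norm lives in a later shard.
    let shard1 = HashMap::from([("blk.0.attn_q.weight".to_string(), 0)]);
    let shard3 = HashMap::from([("output_norm.weight".to_string(), 2)]);

    // Loading only shard 1 reproduces the reported error condition.
    let single = merge_shards(vec![shard1.clone()]);
    assert!(!single.contains_key("output_norm.weight"));

    // Merging all shards makes the tensor resolvable.
    let all = merge_shards(vec![shard1, shard3]);
    assert!(all.contains_key("output_norm.weight"));
    println!("ok");
}
```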
Remember20240719 commented 1 month ago

@EricLBuehler thank you! This now produces another error: "Error: Multiple contents have multiple split.count fields". Could it be an issue in those GGUF files?

EricLBuehler commented 1 month ago

@Remember20240719 no, this was a bug on our side. It should be fixed in #695; can you please git pull and try again?

Remember20240719 commented 4 weeks ago

@EricLBuehler It's working great! Thank you.

I see the option to "Close with comment", so I will close this issue. I hope that's fine with you.

EricLBuehler commented 4 weeks ago

Sounds good! Please let me know if you have any other issues.