Similar issue with a Docker setup (cpu 0.2) and the phi-3-vision model:
2024-08-13T10:22:32.805904Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-08-13T10:22:32.805934Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-08-13T10:22:32.805952Z INFO mistralrs_server: Model kind is: normal (no quant, no adapters)
2024-08-13T10:22:32.805998Z INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
2024-08-13T10:22:32.806078Z INFO mistralrs_core::pipeline::vision: Loading `tokenizer.json` at `microsoft/Phi-3-vision-128k-instruct`
2024-08-13T10:22:33.614414Z INFO mistralrs_core::pipeline::vision: Loading `config.json` at `microsoft/Phi-3-vision-128k-instruct`
2024-08-13T10:22:34.032699Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00001-of-00002.safetensors", "model-00002-of-00002.safetensors"]
2024-08-13T10:25:11.706991Z INFO mistralrs_core::pipeline::vision: Loading `preprocessor_config.json` at `microsoft/Phi-3-vision-128k-instruct`
2024-08-13T10:25:12.110866Z INFO mistralrs_core::pipeline::vision: Loading `tokenizer_config.json` at `microsoft/Phi-3-vision-128k-instruct`
Error: unsupported dtype BF16 for op matmul
0: candle_core::error::Error::bt
1: <candle_core::cpu_backend::CpuStorage as candle_core::backend::BackendStorage>::matmul
2: candle_core::tensor::Tensor::matmul
3: <mistralrs_core::utils::normal::ModelDType as mistralrs_core::utils::normal::TryIntoDType>::try_into_dtype
4: <mistralrs_core::pipeline::vision::VisionLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
5: <mistralrs_core::pipeline::vision::VisionLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
6: mistralrs_server::main::{{closure}}
7: mistralrs_server::main
8: std::sys_common::backtrace::__rust_begin_short_backtrace
9: std::rt::lang_start::{{closure}}
10: std::rt::lang_start_internal
11: main
12: <unknown>
13: __libc_start_main
14: _start
Stack backtrace:
0: anyhow::error::<impl anyhow::Error>::msg
1: <mistralrs_core::utils::normal::ModelDType as mistralrs_core::utils::normal::TryIntoDType>::try_into_dtype
2: <mistralrs_core::pipeline::vision::VisionLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
3: <mistralrs_core::pipeline::vision::VisionLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
4: mistralrs_server::main::{{closure}}
5: mistralrs_server::main
6: std::sys_common::backtrace::__rust_begin_short_backtrace
7: std::rt::lang_start::{{closure}}
8: std::rt::lang_start_internal
9: main
10: <unknown>
11: __libc_start_main
12: _start
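For context, the failure here is at the candle level rather than in mistral.rs itself: the CPU backend in the candle version used at the time has no BF16 matmul kernel, so any model loaded in BF16 on CPU hits this error. A minimal candle-level sketch of the same limitation (shapes and values are illustrative, not taken from this issue):

use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Two small BF16 tensors on the CPU device.
    let a = Tensor::zeros((2, 3), DType::BF16, &Device::Cpu)?;
    let b = Tensor::zeros((3, 2), DType::BF16, &Device::Cpu)?;
    // On candle builds without a CPU BF16 matmul kernel this returns
    // "unsupported dtype BF16 for op matmul" instead of a result tensor.
    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}

Casting the weights to F32 (or F16) before the matmul avoids the error on CPU.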
@WildEgo @Remember20240719 I think this may be fixed now after #690 and #676.
Thanks! Now I'm getting a new error: "cannot find tensor info for output_norm.weight".
$ ./target/release/./mistralrs-server --port 1234 --throughput gguf --quantized-model-id $D/models3/bartowski/Mistral-Large-Instruct-2407-GGUF --quantized-filename Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf
2024-08-18T01:40:20.694685Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-08-18T01:40:20.694716Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-08-18T01:40:20.694733Z INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-08-18T01:40:20.694887Z INFO mistralrs_core::pipeline::paths: Loading `Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf` locally at `$D/models3/bartowski/Mistral-Large-Instruct-2407-GGUF/Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf`
2024-08-18T01:40:20.694993Z INFO mistralrs_core::pipeline::gguf: Loading model `$D/DATA/models3/bartowski/Mistral-Large-Instruct-2407-GGUF` on cpu.
2024-08-18T01:40:20.801701Z INFO mistralrs_core::pipeline::gguf: Model config:
general.architecture: llama
general.basename: Mistral
general.file_type: 16
general.finetune: Instruct
general.languages: en, fr, de, es, it, pt, zh, ja, ru, ko
general.license: other
general.license.link: https://mistral.ai/licenses/MRL-0.1.md
general.license.name: mrl
general.name: Mistral Large Instruct 2407
general.quantization_version: 2
general.size_label: Large
general.type: model
general.version: 2407
llama.attention.head_count: 96
llama.attention.head_count_kv: 8
llama.attention.layer_norm_rms_epsilon: 0.00001
llama.block_count: 88
llama.context_length: 131072
llama.embedding_length: 12288
llama.feed_forward_length: 28672
llama.rope.dimension_count: 128
llama.rope.freq_base: 1000000
llama.vocab_size: 32768
quantize.imatrix.chunks_count: 148
quantize.imatrix.dataset: /training_dir/calibration_datav3.txt
quantize.imatrix.entries_count: 616
quantize.imatrix.file: /models_out/Mistral-Large-Instruct-2407-GGUF/Mistral-Large-Instruct-2407.imatrix
split.count: 3
split.no: 0
split.tensors.count: 795
2024-08-18T01:40:20.847842Z INFO mistralrs_core::gguf::gguf_tokenizer: GGUF tokenizer model is `llama`, kind: `Unigram`, num tokens: 32768, num added tokens: 0, num merges: 0, num scores: 32768
2024-08-18T01:40:20.849113Z INFO mistralrs_core::gguf::chat_template: Discovered and using GGUF chat template: `{%- if messages[0]['role'] == 'system' %}\n {%- set system_message = messages[0]['content'] %}\n {%- set loop_messages = messages[1:] %}\n{%- else %}\n {%- set loop_messages = messages %}\n{%- endif %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n {%- endif %}\n {%- if message['role'] == 'user' %}\n {%- if loop.last and system_message is defined %}\n {{- '[INST] ' + system_message + '\n\n' + message['content'] + '[/INST]' }}\n {%- else %}\n {{- '[INST] ' + message['content'] + '[/INST]' }}\n {%- endif %}\n {%- elif message['role'] == 'assistant' %}\n {{- ' ' + message['content'] + eos_token}}\n {%- else %}\n {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n {%- endif %}\n{%- endfor %}\n`
Error: cannot find tensor info for output_norm.weight
0: candle_core::error::Error::bt
1: candle_core::quantized::gguf_file::Content::tensor
2: <mistralrs_core::models::quantized_llama::ModelWeights as mistralrs_core::utils::model_config::FromGGUF>::from_gguf
3: <mistralrs_core::pipeline::gguf::GGUFLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
4: <mistralrs_core::pipeline::gguf::GGUFLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
5: mistralrs_server::main::{{closure}}
6: mistralrs_server::main
7: std::sys_common::backtrace::__rust_begin_short_backtrace
8: std::rt::lang_start::{{closure}}
9: std::rt::lang_start_internal
10: main
11: __libc_start_call_main
12: __libc_start_main@@GLIBC_2.34
13: _start
Stack backtrace:
0: <mistralrs_core::pipeline::gguf::GGUFLoader as mistralrs_core::pipeline::Loader>::load_model_from_path
1: <mistralrs_core::pipeline::gguf::GGUFLoader as mistralrs_core::pipeline::Loader>::load_model_from_hf
2: mistralrs_server::main::{{closure}}
3: mistralrs_server::main
4: std::sys_common::backtrace::__rust_begin_short_backtrace
5: std::rt::lang_start::{{closure}}
6: std::rt::lang_start_internal
7: main
8: __libc_start_call_main
9: __libc_start_main@@GLIBC_2.34
10: _start
$ ls -1 $D/models3/bartowski/Mistral-Large-Instruct-2407-GGUF
Mistral-Large-Instruct-2407.imatrix
Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf
Mistral-Large-Instruct-2407-Q5_K_S-00002-of-00003.gguf
Mistral-Large-Instruct-2407-Q5_K_S-00003-of-00003.gguf
README.md
$ git rev-parse HEAD
13f565596c4c09b16e7c4d5412ef634d3104bd71
@Remember20240719 this is because output_norm.weight is not in the file you loaded. I just merged #692, which adds support for sharded GGUF models, so this command should work for you:
./target/release/./mistralrs-server --port 1234 --throughput gguf --quantized-model-id $D/models3/bartowski/Mistral-Large-Instruct-2407-GGUF --quantized-filename "Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf Mistral-Large-Instruct-2407-Q5_K_S-00002-of-00003.gguf Mistral-Large-Instruct-2407-Q5_K_S-00003-of-00003.gguf"
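If you want to confirm which tensors each shard actually contains, a small candle-based check can list the tensor names stored in a shard (the path below is illustrative):

use candle_core::quantized::gguf_file;
use std::fs::File;

fn main() -> anyhow::Result<()> {
    // Point this at any one of the shard files; output_norm.weight is
    // apparently not in the first shard here, which is why loading
    // shard 1 alone fails.
    let path = "Mistral-Large-Instruct-2407-Q5_K_S-00001-of-00003.gguf";
    let mut file = File::open(path)?;
    let content = gguf_file::Content::read(&mut file)?;
    for name in content.tensor_infos.keys() {
        println!("{name}");
    }
    Ok(())
}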
@EricLBuehler thank you! This now produces another error: "Error: Multiple contents have multiple split.count fields".
Could it be an issue in those GGUF files?
@Remember20240719 no, this was a bug. It should be fixed in #695; can you please git pull and try again?
@EricLBuehler It's working great! Thank you.
I see the option to "Close with comment", so I will close this issue. I hope that's fine with you.
Sounds good! Please let me know if you have any other issues.
Describe the bug
mistralrs-server crashes when running Mistral Large Instruct 2407 using ISQ.
Note: the error looks similar to https://github.com/EricLBuehler/mistral.rs/issues/437
Latest commit or version
Commit 249299bd32649517a6f24245166ea5f3c463a869