ggerganov / llama.cpp

LLM inference in C/C++
MIT License

terminate when running deepseek models with gbnf grammars #4206

Closed 54rt1n closed 7 months ago

54rt1n commented 1 year ago

Prerequisites

Tested on build b1557.

Expected Behavior

The model should generate output as normal, constrained by the grammar file. This appears to impact only deepseek; llama variants and Yi run fine.

Current Behavior

```
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
```

Environment and Context

- CPU: AMD Ryzen 7 3700X 8-Core Processor
- GPU: 0a:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3070] (rev a1)
- OS: Linux 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC
- Driver: NVIDIA-SMI 535.104.05, Driver Version 535.104.05, CUDA Version 12.2

Failure Information (for bugs)


Steps to Reproduce


The command below reproduces the issue:

```
./main -n -1 -c 8192 -ngl 0 --repeat_penalty 1.2 --color -i --mirostat 2 \
    -m ../llama/gguf/deepseek-coder-6.7b-instruct.Q8_0.gguf \
    --grammar-file grammar/any_text.gbnf --prompt Test
```

any_text.gbnf:

```
root ::= ([^\n]+ "\n")+
```
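For context, this grammar accepts one or more lines, each made of at least one non-newline character followed by a newline, so it should permit essentially any non-empty line of text. During generation, main applies the grammar at every sampling step, roughly as in the sketch below. This is a minimal paraphrase against the llama.h grammar API of that era (b1557), not the exact main source; setup of the context, grammar, and candidate array is elided.

```cpp
#include "llama.h"

// Minimal sketch of grammar-constrained sampling. Assumes ctx, grammar,
// and candidates have already been initialized elsewhere.
static llama_token sample_with_grammar(llama_context * ctx,
                                       llama_grammar * grammar,
                                       llama_token_data_array * candidates) {
    // Zero out the probability of any candidate token whose text the
    // grammar cannot accept from its current state.
    llama_sample_grammar(ctx, candidates, grammar);

    // Pick a token from the remaining candidates.
    const llama_token id = llama_sample_token(ctx, candidates);

    // Advance the grammar state with the chosen token. This step has to
    // turn the token id back into its text piece, which is one place a
    // failed map lookup could surface as the std::out_of_range reported here.
    llama_grammar_accept_token(ctx, grammar, id);
    return id;
}
```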

Failure Logs

The error happens immediately:

```
...
llm_load_print_meta: n_yarn_orig_ctx  = 16384
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q8_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 6.67 GiB (8.50 BPW)
llm_load_print_meta: general.name   = deepseek-ai_deepseek-coder-6.7b-instruct
llm_load_print_meta: BOS token = 32013 '<|begin▁of▁sentence|>'
llm_load_print_meta: EOS token = 32021 '<|EOT|>'
llm_load_print_meta: PAD token = 32014 '<|end▁of▁sentence|>'
llm_load_print_meta: LF token  = 126 'Ä'
llm_load_tensors: ggml ctx size =    0.11 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  = 6830.87 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MiB
...................................................................................................
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: freq_base  = 100000.0
llama_new_context_with_model: freq_scale = 0.25
llama_new_context_with_model: kv self size  = 4096.00 MiB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 555.07 MiB
llama_new_context_with_model: VRAM scratch buffer: 552.00 MiB
llama_new_context_with_model: total VRAM used: 552.00 MiB (model: 0.00 MiB, context: 552.00 MiB)

system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling:
        repeat_last_n = 64, repeat_penalty = 1.200, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 2, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 8192, n_batch = 512, n_predict = -1, n_keep = 0

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

Testterminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
```
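For reference, `_Map_base::at` is libstdc++'s internal name for the bounds-checked `at()` of `std::unordered_map` / `std::map`, which throws `std::out_of_range` when the requested key is absent; left uncaught, the throw terminates the process with exactly the banner above. A self-contained illustration (generic C++, not llama.cpp code; `token_map` is a hypothetical stand-in):

```cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <unordered_map>

int main() {
    // Stand-in map; the real lookup that fails lives inside llama.cpp's
    // tokenizer/grammar code, not here.
    std::unordered_map<std::string, int> token_map = {{"known", 1}};

    // Uncaught, the .at() below reproduces the crash banner:
    //   terminate called after throwing an instance of 'std::out_of_range'
    //     what():  _Map_base::at
    try {
        std::cout << token_map.at("missing") << "\n";
    } catch (const std::out_of_range & e) {
        std::cerr << "caught: " << e.what() << "\n";
    }

    // A guarded lookup avoids the throw entirely:
    if (auto it = token_map.find("missing"); it != token_map.end()) {
        std::cout << it->second << "\n";
    }
    return 0;
}
```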
shroominic commented 1 year ago

I experienced the same issue.

maziyarpanahi commented 9 months ago

Me too. I just cannot seem to convert and quantize https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct

Tried on the latest main branch and it still fails with:

```
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Aborted (core dumped)
```
BattlehubCode commented 9 months ago

Same here, tried the latest main branch. main stops working after the first user prompt. I use:

```
./main -m "deepseek-coder-6.7b-instruct.Q5_K_S.gguf" --grammar-file "grammars/c.gbnf" \
    --prompt "You are an AI programming assistant, utilizing the DeepSeek Coder model, and you only answer questions related to computer science.\n" \
    --in-prefix "### Instruction:\n" --in-suffix "### Response:\n" \
    -r "<|EOT|>\n" -i --interactive-first
```

ggerganov commented 9 months ago

Deepseek models are not supported at this time. See #5464

github-actions[bot] commented 7 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.