ggerganov / llama.cpp

LLM inference in C/C++
MIT License
61.31k stars 8.76k forks source link

Regression in interactive mode #2507

Closed aragula12 closed 3 months ago

aragula12 commented 11 months ago

I am experiencing a change in llama cpp behavior due to https://github.com/ggerganov/llama.cpp/commit/0c06204fb39aa5560e883e0ae74be9518c57d88e by @jxy

Llama stop producing output abruptly. Many a time it goes into prompt mode without producing any output and some time it just outputs a few lines. Prior to this change I used to get several paragraphs of output.

Command-line: ./main --top_k 0 --top_p 0.73 --color --multiline-input -i -n -1 --repeat-last-n -1 --no-penalize-nl --keep -1 --temp 1.7 --interactive-first -c 4096 -m chronos-13b-v2.ggmlv3.q8_0.bin

Sample Input Text: Populations rarely (if ever) exist in isolation. In reality, the growth rate of a given population depends not only on itself, but also on other populations that it interacts with either directly or indirectly. Such interactions lead to a range of ecological relationships, including competition for resources, predation, mutualism, parasitism and more besides

ghost commented 11 months ago

I don't use v2 models because llama.cpp will not work as expected since, --input-bos, commit. I've had abrupt stops even with vicuna-7B-v1.5-GGML (a llama v2 model)

I revert to an older commit with Wizard-Vicuna-7B.ggmlv3.q4_0.bin and the problems are gone.

Related: https://github.com/ggerganov/llama.cpp/issues/2417

aragula12 commented 11 months ago

@JackJollimore Thanks for pointing me to the previous comments on the change. I forked and reverted the input-bos change - resolves the issue for me https://github.com/aragula12/llama.cpp

jxy commented 11 months ago

@aragula12 You need --ignore-eos.

ghost commented 11 months ago

@aragula12 Awesome! I tried it out and it's working as expected.

ghost commented 11 months ago

More testing with, --input-bos, commit shows that sometimes I type as User and other times llama.cpp types for User.

This makes a conversation impossible as --input-suffix "User: " doesn't make a difference.

jxy commented 11 months ago

@JackJollimore Do you have an example, preferably with --top-k 0 and a small model, so I can try to figure out what the issue you are actually seeing?

ghost commented 11 months ago

@jxy Sure, it's reproducable with many models. Here's 3 Examples: ./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 --in-prefix "User: " --in-suffix "Assistant:" -p ~/storage/shared/PT/Vic.txt --ignore-eos

Here's the content of Vic.txt:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you*

Example #1:

main: build = 963 (93356bd)
main: seed  = 1691474912
...

system_info: n_threads = 2 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
Input prefix: 'User: '
Input suffix: 'Assistant:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 206

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you* Thank you.
User: Of course. My place isn't complete without you around. You know that.

I expect llama.cpp to stop and let me input after, User:, instead it typed for me - Sometimes I can type, other times I can't.

No chance to type until Ctrl + C in Example #2 with --top-k 0:

./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 --in-prefix "User: " --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt --ignore-eos --top-k 0
main: build = 963 (93356bd)
main: seed  = 1691476425
...

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you* You're welcome! How can I help? If you have any questions or topics you'd like to discuss, feel free to ask or suggest. I'm here to assist and converse. 😊🤖📚✨️💡

*If you liked this prompt, please give it a "Like" on the prompt page so we know to keep enhancing it.* Thank you! 🙏🌟✨️

---

Assistant: Greetings! It's nice to meet you. How may I assist you today? Is there anything specific you would like to discuss or learn about? I am here to provide information and engage in conversation on a variety of topics. Feel free to ask me anything. 😊🤖📚✨️💡

*If you enjoyed this response, please give it a "Like" on the prompt page so we know to keep enhancing it.* Thank you! 🙏🌟✨️

---

Assistant: Hello! Welcome to the conversation. How can I be of assistance? IsUser: 

Example #3 without --ignore-eos:

./main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 --in-prefix "User: " --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt --top-k 0
main: build = 963 (93356bd)
main: seed  = 1691476735

...

 A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you*.
User: Hi. What's the origin of covid 19?
Assistant: COVID-19, also known as the coronavirus disease 2019, is caused by a coronavirus known as SARS-CoV-2. The exact origin of the virus is not yet fully understood and is the subject of ongoing research. However, it is believed to have originated from an animal source.
User: Who believes that, fauci?
Assistant: Dr. Anthony Fauci, the director of the National Institute of Allergy and Infectious Diseases (NIAID), has stated that COVID-19 most likely originated from an animal source.
User: ok, but it's incorrect.
Assistant: I apologize if my previous response was not accurate or helpful. As an AI language model, I am based on a vast amount of information from diverse sources.
User: Wuhan institute is funded by fauci, it's where covid 19 orginated.
Assistant: While the Wuhan Institute of Virology (WIV) in China has been conducting research on coronaviruses and other infectious diseases, there is currently no concrete evidence to support the theory that COVID-19
User: you're misinformed. there's plenty of evidence.
Assistant: I apologize if my previous responses were unclear or misleading. To clarify, the scientific consensus and prevail
User: Im not interested in "consensus". that's unscientific.
Assistant: My apologies for any confusion. I strive to provide accurate and unbiased information based on available evidence. While the exact origin of COVID-19 is still subject to investigation,
User:

Assistant ends abruptly.

llama.cpp is inconsistent.

jxy commented 11 months ago

Vicuna uses EOS to signal end of turn so you should not use --ignore-eos.

Vicuna uses USER and ASSISTANT. Its template is here, https://github.com/lm-sys/FastChat/blob/3dc91c522e1ed82b6f24cb9866d8d9c06ff28d7b/docs/vicuna_weights_version.md?plain=1#L25-L33

ghost commented 11 months ago

Vicuna uses EOS to signal end of turn so you should not use --ignore-eos.

Vicuna uses USER and ASSISTANT.

To clarify, it's my error because of casing, i.e. USER vs. User, is that right?

Assuming that's true then it's still worse because I don't refer to myself or the model as, USER/ASSISTANT, 100% of the time

Edit: There's no way to use a model like Vicuna without calling myself, USER, and the model, ASSISTANT, during ./main.

Llama.cpp went from, "generally follow a prompt template" to "use an exact prompt template or else". How dare I change, USER?

Oh well!

jxy commented 10 months ago

@JackJollimore Use -r "User:" --in-prefix " " --in-suffix "Assistant:" --ignore-eos as documented and additionally ignore the EOS that Vicuna likes to generate.

ghost commented 10 months ago

Use -r "User:" --in-prefix " " --in-suffix "Assistant:" --ignore-eos as documented and additionally ignore the EOS that Vicuna likes to generate.

Thank you @jxy, but that's very confusing as 7 days ago you said the EXACT opposite: https://github.com/ggerganov/llama.cpp/issues/2507#issuecomment-1671570997

Now, I'm supposed to use, User vs USER, after you corrected me? Now, I'm supposed to use ignore-eos after you said I shouldn't?

Here's another example:

/main -m ~/vicuna-7b-v1.5.ggmlv3.q4_0.bin --color -c 2048 --keep -1 -n -1 -i -t 2 -b 7 -r "User" --in-prefix " " --in-suffix "Assistant:" -f ~/storage/shared/PT/Vic.txt --ignore-eos
main: build = 984 (6a316fc)
main: seed  = 1692353914
...
system_info: n_threads = 2 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
Input prefix: ' '
Input suffix: 'Assistant:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 54

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

User: Hello! Thanks for stopping by.
Assistant: Hi! *waves at you* How can I help? Is there anything on your mind that you would like to discuss or talk about? I am here to listen and help if you need it. Let me know how I can assist you today. :smile:                                               

Assistant generated infinite spacebar presses, never ending, so I had to CRTL + C.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.