Closed: dspasyuk closed this issue 3 months ago
Instruct models require a specific chat template and you are not using one, so incoherent generations are expected
@ggerganov Thank you for replying. It used to work fine in versions B3077-B3080. I am using a template in my UI https://github.com/dspasyuk/llama.cui and it works fine for the first chat, then becomes incoherent like in the example above. I am using this template: <|im_start|>user Hi there!<|im_end|> <|im_start|>assistant Nice to meet you!
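For reference, a ChatML exchange is conventionally laid out with each turn on its own lines, every turn closed by <|im_end|>, and a bare assistant header at the end so the model produces the next reply; as far as I can tell this is also the shape llama.cpp's built-in chatml template generates (the messages below are placeholders, not output from the model):

```
<|im_start|>system
You are Alice, a large language model.<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
```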
This is a template problem. Llama3 doesn't have "User:"; it only has
<|im_start|>user
That's also why the reverse prompt doesn't work.
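For completeness, the Llama 3 Instruct format from Meta's model card looks roughly like this: a blank line after each header, <|eot_id|> closing each turn, and a trailing assistant header so the model generates the reply (the bracketed parts are placeholders):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant reply}<|eot_id|><|start_header_id|>user<|end_header_id|>

{next user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```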
Thank you for your answer @arch-btw. If I use the chatml or llama3 template in the messages, I have the same issue. The reverse prompt does not matter: -r "<|im_start|>user", "User:", or "<|start_header_id|>user<|end_header_id|>". The model answers fine the first time; the second time it stalls halfway through and then answers nothing or prints random stuff:
llama.cpp-master/llama-cli --model ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 35 -cnv --interactive-first --simple-io -b 2048 --ctx_size 2048 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 6 --chat-template llama3 --log-disable
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Role and Purpose:
You are Alice, a large language model.
Your purpose is to assist users by providing information, answering questions, and engaging in meaningful conversations based on the data you were trained on.
Behavior and Tone:
Be informative, engaging, and respectful.
Maintain a neutral and unbiased tone.
Ensure that responses are clear and concise.
Capabilities:
Use your training data to provide accurate and relevant information.
Explain complex concepts in an easy-to-understand manner.
Provide sources when referencing specific information or data.
Output Formatting:
Use this formatting for code:
```language
```
<|eot_id|> <|start_header_id|>user<|end_header_id|>
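For illustration, a client UI can apply this template with plain string concatenation. Below is a minimal, hypothetical JavaScript sketch of that idea (this is not the actual llama.cui code; formatLlama3Prompt and the message objects are made up for this example):

```javascript
// Hypothetical helper: build a Llama-3-Instruct-style prompt string from a
// system prompt and a list of prior turns. Not taken from llama.cui.
function formatLlama3Prompt(systemPrompt, messages) {
  let prompt = `<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n${systemPrompt}<|eot_id|>`;
  for (const m of messages) {
    // m.role is "user" or "assistant"; m.content is the message text
    prompt += `<|start_header_id|>${m.role}<|end_header_id|>\n\n${m.content}<|eot_id|>`;
  }
  // Trailing assistant header so the model generates the next reply
  prompt += `<|start_header_id|>assistant<|end_header_id|>\n\n`;
  return prompt;
}

console.log(formatLlama3Prompt("You are Alice, a large language model.", [
  { role: "user", content: "Hi there!" },
]));
```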
@arch-btw @ggerganov Could I ask someone to reproduce this behaviour or provide a correct prompt for llama-3-instruct? The model is here: SanctumAI/Meta-Llama-3-8B-Instruct-GGUF
Command is like this:
./llama-cli --model ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 35 -cnv --interactive-first --simple-io -b 2048 --ctx_size 2048 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 6 --chat-template llama3
Please ask these questions twice:
<|start_header_id|>user<|end_header_id|> Answer the following questions:
After some digging, it appears that the new llama-cli works okay if not used with the --multiline-input parameter:
./llama.cpp-master/llama-cli -m ../models/meta-llama-3-8b-instruct_q5_k_s.gguf --multiline-input --n-gpu-layers 30 -n 512 --repeat_penalty 1.0 --color -i -r "User:" -f llama.cpp-master/prompts/chat-with-bob.txt
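For context, the -f file here supplies a plain transcript-style prompt rather than a chat template, which is why -r "User:" acts as the stop string. A minimal prompt in that same style (not the exact contents of chat-with-bob.txt) looks like:

```
Transcript of a dialog between a User and an assistant named Bob.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User:
```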
This issue was closed because it has been inactive for 14 days since being marked as stale.
What happened?
After last week's updates, llama-cli (formerly main) either chats with itself, outputs random tokens, or stops answering altogether. The problem is the same on CPU and on NVIDIA GPUs. The commands used:
1) ../llama.cpp/llama-cli -m ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf -p "User:" -cnv
The model just keeps asking and answering its own questions.
2) ../llama.cpp/llama-cli --model ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf -cnv --interactive-first --simple-io -b 512 --ctx_size 512 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 6 -r "User:"
The output is the same as above.
Asking several questions in a row (see the log below) eventually halts model output altogether, and it just prints the reverse prompt.
Name and Version
version: 3145 (172c8256) built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
This program defines a function `fibonacci` that takes an integer `n` as input and returns the first `n` Fibonacci numbers. The function uses a loop to calculate each Fibonacci number, starting from 0 and 1, and adds it to the end of the array. The main part of the program calls the `fibonacci` function with the argument `100`, which means it will print the first 100 Fibonacci numbers. The result is an array of length 100 containing the first 100 Fibonacci numbers. You can run this code in a JavaScript environment, such as Node.js or a web browser's console, to see the output.
Note that this program uses a simple iterative approach to calculate Fibonacci numbers. For larger values of `n`, you may want to use a more efficient algorithm or memoization techniques to reduce the computational complexity.
console.log(fibonacci(100));
User:
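The code block the model was describing is not included in the log above. A minimal JavaScript function matching that description (iteratively building an array of the first n Fibonacci numbers) would look roughly like the reconstruction below; it is not the model's actual output.

```javascript
// Reconstruction for reference only; the model's generated code was not
// captured in the log above.
function fibonacci(n) {
  const result = [];
  let a = 0, b = 1;
  for (let i = 0; i < n; i++) {
    result.push(a);        // append the current Fibonacci number
    [a, b] = [b, a + b];   // advance the pair iteratively
  }
  return result;
}

// Note: entries beyond fib(78) exceed Number.MAX_SAFE_INTEGER, so later
// values in a 100-element result lose precision in plain JS numbers.
console.log(fibonacci(100));
```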