Open andreys42 opened 2 weeks ago
@andreys42 Unless you are using llama-cli's conversation mode (-cnv), you will need to use --in-prefix/--in-suffix or wrap your input in the Llama 3 prompt template.
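For reference, wrapping input in the Llama 3 instruct template can be sketched like this (a minimal sketch based on Meta's published special tokens; verify against the chat template actually embedded in your GGUF, since fine-tunes can differ):

```python
# Build a Llama 3 instruct prompt by hand (sketch; token names per
# Meta's published format, not taken from this thread).
def llama3_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

if __name__ == "__main__":
    print(llama3_prompt("Answer only with 0 or 1.", "Is 7 prime?"))
```

The trailing assistant header is what cues the model to generate its reply rather than continue the user turn.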
@dspasyuk thanks for the suggestion, --in-prefix/--in-suffix indeed makes sense, I will try it, thank you.
As for wrapping my input in the Llama 3 prompt template, I did that and mentioned it before; it made no difference for me...
You are probably using the wrong template. Send your request to the /completion endpoint, then open the /slots endpoint to see what was effectively sent. You can compare the good and bad prompts to see what was wrong.
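The check described above can be sketched as follows (host/port are assumptions; note that on recent builds the /slots endpoint must be enabled when starting the server):

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8080"  # assumed server address, adjust as needed

def build_payload(prompt: str, n_predict: int = 32) -> dict:
    # Raw /completion request body: the server does NOT apply a chat
    # template here, so `prompt` must already be fully formatted.
    return {"prompt": prompt, "n_predict": n_predict}

def post_json(url: str, body: dict) -> object:
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Send a request, then read back what the server effectively saw.
    post_json(f"{BASE}/completion", build_payload("Hello"))
    with urllib.request.urlopen(f"{BASE}/slots") as resp:
        print(json.dumps(json.load(resp), indent=2))
```

Comparing the prompt shown by /slots for a "good" (chatui) request against a "bad" (direct) one makes any template mismatch visible immediately.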
@andreys42 here is the setting I use in llama.cui that works well across major models:
../llama.cpp/llama-cli --model ../../models/meta-llama-3-8b-instruct-q5_k_s.gguf --n-gpu-layers 25 -cnv --simple-io -b 2048 --ctx_size 0 --temp 0 --top_k 10 --multiline-input --chat-template llama3 --log-disable
Here is the result:
Screencast from 2024-07-10 10:20:44 AM.webm
You can test it for yourself here: https://github.com/dspasyuk/llama.cui
What happened?
I'm testing the Meta-Llama-3-8B-Instruct-Q8_0 model using the llama.cpp HTTP server, both through the chatui interface and via direct requests with Python's requests library.

When I use chatui with the chatPromptTemplate option, everything works fine, and the model's output is predictable and desirable. However, when I make direct requests to the same server with the same model, the output is messy (lots of newline characters, repetition of the question, and so on) and most of the system instructions are ignored, though the general logic of the output is fine: when I ask it to answer only with 0 or 1, the model still tries to justify its decision in the output.

My attempts so far have been:

I've spent a lot of time trying to figure out the issue, but all of these approaches work much worse than the chatui way. I believe the problem lies in my understanding of how to format the input prompts, and I'm not familiar enough with the syntax documentation.
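One way to sidestep hand-formatting the prompt entirely is the server's OpenAI-compatible chat endpoint, which applies the model's chat template server-side. A minimal sketch, with an assumed host/port:

```python
import json
import urllib.request

def build_chat_request(system: str, user: str) -> dict:
    # The llama.cpp server formats these messages with the model's
    # chat template itself, so no manual Llama 3 tokens are needed.
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": 0,
    }

def ask(base: str, system: str, user: str) -> str:
    data = json.dumps(build_chat_request(system, user)).encode()
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("http://127.0.0.1:8080", "Answer only with 0 or 1.", "Is 7 prime?"))
```

This keeps the client code identical across models whose GGUFs embed different chat templates.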
Name and Version
latest libs, Meta-Llama-3-8B-Instruct-Q8_0
What operating system are you seeing the problem on?
No response
Relevant log output
No response