Is your feature request related to a problem? Please describe.
I am building the prompt myself and calling create_completion. llama.cpp is telling me it is adding yet another <bos> at the beginning, which could affect performance:
RuntimeWarning: Detected duplicate leading "<bos>" in prompt, this will likely reduce response quality, consider removing it
Describe the solution you'd like
Either llama.cpp should not add a <bos> token at the beginning, or there should be a switch to turn the automatic insertion off.
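In the meantime, one possible workaround is to tokenize the prompt yourself with add_bos=False and pass the token list to create_completion, which accepts pre-tokenized prompts, so the automatic <bos> insertion is skipped. A minimal sketch; the model path is a placeholder and the behavior on the pre-tokenized path is my assumption:

```python
from llama_cpp import Llama

# Placeholder path; point this at your own gemma2 GGUF file.
llm = Llama(model_path="gemma-2-it.Q4_K_M.gguf")

prompt = (
    "<bos><start_of_turn>user\n"
    "What kind of questions can I ask you?<end_of_turn><start_of_turn>model\n"
)

# Tokenize without letting the library prepend its own <bos>;
# special=True parses <bos>/<start_of_turn> as special tokens.
tokens = llm.tokenize(prompt.encode("utf-8"), add_bos=False, special=True)

# create_completion accepts a list of token ids in place of a string,
# so no second <bos> should be inserted.
output = llm.create_completion(tokens, max_tokens=128)
print(output["choices"][0]["text"])
```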
Additional context
This is a prompt with the gemma2 template that I give to the create_completion function:
<bos><start_of_turn>user
You are a helpful chat bot, answering questions.
<end_of_turn><start_of_turn>model
OK<end_of_turn><start_of_turn>user
What kind of questions can I ask you?<end_of_turn><start_of_turn>model
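Alternatively, a simpler workaround may be to leave the literal <bos> out of the template and let the tokenizer's automatic insertion supply it. A minimal sketch; the helper name is mine, and removeprefix assumes Python 3.9+:

```python
def strip_leading_bos(prompt: str) -> str:
    """Drop a literal leading <bos> so the tokenizer's automatic
    insertion does not create a duplicate."""
    return prompt.removeprefix("<bos>")

prompt = strip_leading_bos(
    "<bos><start_of_turn>user\n"
    "You are a helpful chat bot, answering questions.\n"
    "<end_of_turn><start_of_turn>model\n"
    "OK<end_of_turn><start_of_turn>user\n"
    "What kind of questions can I ask you?<end_of_turn><start_of_turn>model\n"
)
# output = llm.create_completion(prompt, max_tokens=128)
```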