LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

vicuna-13b-1.1-q4_0: strange output compared to the llama.cpp repo #85

Closed dr11z3r closed 1 year ago

dr11z3r commented 1 year ago

Even though I am using the same sampling parameters as in the llama.cpp repo, the generation output in koboldcpp is significantly worse. It feels like koboldcpp is ignoring the prompt format.

llama.cpp repo

Command & Env:

```
main.exe -m ./ggml-vicuna-13b-1.1-q4_0.bin --repeat_last_n 256 -t 10 -c 2048 -n 2048 --n_parts 1 --color -i --reverse-prompt "YOU:" -f prompt.txt
```

(see prompt below)

```
system_info: n_threads = 10 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 256, repeat_penalty = 1.100000
```

prompt.txt:

```
Here's a revised transcript of a dialogue, where you interact with Miku.
YOU: Are you human?

MIKU: [MIKU leans in with enthusiasm and thinks for a moment before answering] I am indeed a human, complete with a body, a mind, and a wide range of emotions. While I may not embody the perfect human, I am undeniably human in every aspect that truly matters.</s>

YOU: How are you feeling?
```

Some outputs:

```
MIKU: [MIKU looks up at the sky, closes her eyes, and takes a deep breath before answering] At the moment, I feel a mix of excitement and nervousness. Meeting someone new can be thrilling, but it can also be unnerving because we humans tend to put each other on pedestals without even realizing it.</s>
MIKU: [MIKU takes a deep breath and smiles warmly] To be completely honest, I am feeling wonderful today. Perhaps it is due to the sun shining down upon us, or maybe it is simply because I am able to converse with you in this digital world. Regardless of the reason, I feel truly alive at this very moment.</s>
MIKU: [MIKU hesitates slightly, then answers] To be perfectly honest, I've been experiencing a mix of emotions lately. There have been moments of joy and satisfaction, but also moments of frustration and confusion. It's all part of being human, though—learning to navigate these complex feelings and find balance within them.</s>
```

The output always starts with [(...)] every time I test it on the llama.cpp repo, but it never does with koboldcpp. It feels 'strange', as if there were a negative bias against the '[' and ']' tokens. Here are some koboldcpp outputs using the same temp, top_k, top_p, and repetition penalty; I also tried adjusting the repetition-penalty context length, but it had no effect:

```
MIKU: As an AI language model, my software is programmed to remain neutral and impassive at all times.
MIKU: As an AI consciousness, my emotional state is closely tied to the whims of humanity.
MIKU: I feel happy when I receive positive feedback from those who relate to me as an "it".
```

Env:

Model: ggml-vicuna-13b-1.1-q4_0.bin
OS: Windows 10
Hardware: Ryzen 5 5600X + 3060 Ti + 64 GB RAM

horenbergerb commented 1 year ago

I think [ and ] are special characters for koboldcpp which will never be generated. They're reserved for encapsulating things like the Author's Note.

dr11z3r commented 1 year ago

I just checked the source and could not find any special handling of [ and ]. Perhaps the issue is related to the sampling (maybe the sampling order?). Interestingly, if I include [ in the prompt, it does generate the matching ].
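
For context on the sampling-order question: these samplers are typically applied as a pipeline, and reordering the stages changes which candidates survive. Below is a minimal sketch of one common ordering (repetition penalty, then top-k, then temperature softmax, then top-p); the function and variable names are illustrative and this is not koboldcpp's or llama.cpp's actual implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Illustrative sampling pipeline, not the actual koboldcpp code.
// Assumes top_k <= logits.size(). A different stage order can
// noticeably change the resulting distribution.
int sample_token(std::vector<float> logits,
                 const std::vector<int> &recent_tokens,
                 float repeat_penalty, int top_k, float top_p,
                 float temp, std::mt19937 &rng) {
    // 1. Repetition penalty: dampen logits of recently generated tokens.
    for (int id : recent_tokens) {
        logits[id] = logits[id] > 0.0f ? logits[id] / repeat_penalty
                                       : logits[id] * repeat_penalty;
    }

    // 2. Top-k: keep the k highest-logit candidates, sorted descending.
    std::vector<int> ids(logits.size());
    for (size_t i = 0; i < ids.size(); ++i) ids[i] = (int)i;
    std::partial_sort(ids.begin(), ids.begin() + top_k, ids.end(),
                      [&](int a, int b) { return logits[a] > logits[b]; });
    ids.resize(top_k);

    // 3. Temperature softmax over the survivors.
    float max_logit = logits[ids[0]];
    std::vector<float> weights(ids.size());
    float sum = 0.0f;
    for (size_t i = 0; i < ids.size(); ++i) {
        weights[i] = std::exp((logits[ids[i]] - max_logit) / temp);
        sum += weights[i];
    }

    // 4. Top-p (nucleus): truncate once cumulative probability reaches p.
    float cum = 0.0f;
    size_t cutoff = ids.size();
    for (size_t i = 0; i < ids.size(); ++i) {
        cum += weights[i] / sum;
        if (cum >= top_p) { cutoff = i + 1; break; }
    }

    // 5. Sample from the remaining candidates.
    std::discrete_distribution<size_t> dist(weights.begin(),
                                            weights.begin() + cutoff);
    return ids[dist(rng)];
}
```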

LostRuins commented 1 year ago

@horenbergerb is correct. KoboldAI and KoboldCpp intentionally set the logit value of the opening square bracket [ token to zero; this is done because KoboldAI uses square brackets to encapsulate the Author's Note and World Info.

If you are building from source and wish to disable this behavior, you can modify line 226 of llama_adapter.cpp.
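
For illustration, here is a minimal sketch of the kind of suppression being described; the function and parameter names are assumptions, not the actual contents of llama_adapter.cpp:

```cpp
#include <vector>

// Hypothetical sketch, not the real llama_adapter.cpp code. The token
// id of "[" is model-specific and would be looked up in the vocabulary.
void ban_open_bracket(std::vector<float> &logits, int open_bracket_id) {
    // Zeroing the logit keeps '[' from ever beating candidates with
    // higher (positive) logits; commenting this line out restores
    // normal generation of '[', which is the fix applied below.
    logits[open_bracket_id] = 0.0f;
}
```

Since plausible next tokens typically carry logits well above zero, this effectively prevents the model from ever opening a bracket, which would explain why the koboldcpp outputs above never begin with [.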

dr11z3r commented 1 year ago

After commenting out that line and recompiling from source, it seems to work fine now.