LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

EOS token is triggered by the 's. #749

Open · modcos opened this issue 8 months ago

modcos commented 8 months ago

Perhaps my question is not specific to koboldcpp, but I hope to get an answer here. I'm testing models, predominantly 70B ones, and I'm seeing strange behavior in some of the responses they generate.

KoboldCpp - Version 1.61.1

llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'

Output:

Processing Prompt (25 / 25 tokens)
Generating (4 / 420 tokens)
(EOS token triggered!)
CtxLimit: 67/8192, Process:4.25s (170.1ms/T = 5.88T/s), Generate:1.84s (460.5ms/T = 2.17T/s), Total:6.10s (0.66T/s)
Output: Let'
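
One way to see where the apostrophe falls in the token stream is to tokenize the fragment directly. Below is a minimal sketch using the third-party llama-cpp-python package (an assumption; it is not part of KoboldCpp), with the model path as a placeholder:

```python
# Sketch: inspect how the truncated fragment tokenizes.
# Assumes the third-party llama-cpp-python package and a local GGUF file.
from llama_cpp import Llama

# vocab_only loads just the tokenizer/vocabulary, not the weights.
llm = Llama(model_path="spicyboros-70b-2.2.q6_k.gguf", vocab_only=True)

ids = llm.tokenize(b"Let's", add_bos=False)
for tok_id in ids:
    print(tok_id, llm.detokenize([tok_id]))

# If the model ranks token 2 ('</s>') highest right after the apostrophe,
# greedy decoding (top_k=1, temperature=0) stops exactly as logged above.
```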

In the second case, with "ban_eos_token": true, I get this result:

Processing Prompt (1 / 1 tokens)
Generating (125 / 160 tokens)
(EOS token triggered!)
CtxLimit: 204/8192, Process:0.85s (852.0ms/T = 1.17T/s), Generate:84.92s (679.3ms/T = 1.47T/s), Total:85.77s (1.46T/s)
Output:  Let' is solve this step by step according to the order of operations (PEMDAS/BODMAS):

Given expression: (10*20+2*35) / 3

First, perform multiplication inside parentheses:
(200 + 70) / 3

Next, add the numbers inside the first parentheses:
270 / 3

Finally, divide:
90

So, the result of (10*20+2*35) / 3 is 90.
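
(As an aside, the arithmetic in that answer is correct; a one-line check, just to confirm the quoted steps:)

```python
# Verify the model's worked example: (10*20 + 2*35) / 3 = 270 / 3 = 90.
print((10 * 20 + 2 * 35) / 3)  # 90.0
```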

Models:

spicyboros-70b-2.2.q6_k
opus-v0.5-70b.Q6_K
Models with LoRAs: limarp_v2, limarp_v3, airoboros_lmoe.

I just want to understand whether the problem lies in the models themselves.

Console parameters:

Input: {"min_p": 0, "max_length": 420, "length_penalty": 1, "epsilon_cutoff": 0, "typical_p": 1, "frequency_penalty": 0, "min_tokens": 0, "tfs": 1, "top_k": 1, "top_a": 0, "no_repeat_ngram_size": 0, "temperature": 0, "stop": ["\nUser:", "\n***", "\nuser:", "\n### User:", "\n### Assistant:"], "skip_special_tokens": true, "prompt": "AI Assistant who will professionally and efficiently answer any questions you may have or offer advice.\nAI Assistant: Hello, how can I help you?\n\nUser: Solve the example: (10*20+2*35) / 3 =\nAssistant:", "guidance_scale": 1, "stop_sequence": ["\nUser:", "\n***", "\nuser:", "\n### User:", "\n### Assistant:"], "grammar_string": "", "negative_prompt": "", "stopping_strings": ["\nUser:", "\n***", "\nuser:", "\n### User:", "\n### Assistant:"], "truncation_length": 8192, "do_sample": true, "mirostat_tau": 5, "mirostat_mode": 0, "mirostat_eta": 0.1, "penalty_alpha": 0, "encoder_repetition_penalty": 1, "repetition_penalty_range": 0, "min_length": 0, "custom_token_bans": "", "sampler_order": [6, 0, 1, 3, 4, 2, 5], "legacy_api": false, "api_type": "koboldcpp", "temperature_last": true, "add_bos_token": true, "eta_cutoff": 0, "max_context_length": 8192, "ban_eos_token": false, "seed": -1, "presence_penalty": 0, "early_stopping": true, "num_beams": 1, "top_p": 0}
LostRuins commented 8 months ago

Sounds like a badly trained model. The EOS token is often represented as </s>; it seems like whoever made this model did not configure it correctly.
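
To check that yourself, you can read the EOS id straight out of the GGUF metadata. Below is a minimal sketch using the gguf Python package from the llama.cpp repo (an assumption; the exact field-access pattern can differ between package versions, and the path is a placeholder):

```python
# Sketch: read the EOS token id recorded in a GGUF file's metadata.
# Assumes the gguf package (pip install gguf); path is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("spicyboros-70b-2.2.q6_k.gguf")
field = reader.get_field("tokenizer.ggml.eos_token_id")
if field is not None:
    # Scalar metadata values live in the field's last parts array.
    print("EOS token id:", int(field.parts[-1][0]))
else:
    print("No EOS token id recorded in this file")
```

If the id recorded there does not point at '</s>' in tokenizer.ggml.tokens, the model conversion, not koboldcpp, is the likely culprit.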