My question may not be specific to koboldcpp, but I hope to get an answer. I'm testing models, mostly 70B, and some of them behave strangely when generating responses.
In the second case, with "ban_eos_token: true" set, I get this result:
Processing Prompt (1 / 1 tokens)
Generating (125 / 160 tokens)
(EOS token triggered!)
CtxLimit: 204/8192, Process:0.85s (852.0ms/T = 1.17T/s), Generate:84.92s (679.3ms/T = 1.47T/s), Total:85.77s (1.46T/s)
Output: Let' is solve this step by step according to the order of operations (PEMDAS/BODMAS):
Given expression: (10*20+2*35) / 3
First, perform multiplication inside parentheses:
(200 + 70) / 3
Next, add the numbers inside the first parentheses:
270 / 3
Finally, divide:
90
So, the result of (10*20+2*35) / 3 is 90.
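As a sanity check on the log above, here is a minimal sketch that recomputes the arithmetic answer and the generation speed from the reported figures. It does not call KoboldCpp; the variable names are illustrative, and the numbers are copied by hand from the console output.

```python
# Values copied from the console log above; nothing here talks to KoboldCpp.
expression_value = (10 * 20 + 2 * 35) / 3

generated_tokens = 125      # "Generating (125 / 160 tokens)"
generate_seconds = 84.92    # "Generate:84.92s"

# Per-token latency and throughput derived from the two figures above.
ms_per_token = generate_seconds / generated_tokens * 1000
tokens_per_second = generated_tokens / generate_seconds

print(expression_value)
print(round(ms_per_token, 2))
print(round(tokens_per_second, 2))
```

The recomputed values agree with the log to within rounding: the answer of 90 is correct, and the speed is about 1.47 T/s. So the numbers in this output are consistent; only the stop condition (the EOS token firing at 125 of 160 tokens) and the wording of the first sentence look off.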
Models:
spicyboros-70b-2.2.q6_k
opus-v0.5-70b.Q6_K
Models with LoRAs: limarp_v2, limarp_v3, airoboros_lmoe.
I just want to understand if the problem lies in the models themselves?
KoboldCpp - Version 1.61.1