LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Feature request: DRY support #943

Open AphidGit opened 1 week ago

AphidGit commented 1 week ago

A new sampler named 'DRY' appears to be a much better way of handling repetition in model output than the crude repetition penalty exposed by koboldcpp.

https://github.com/oobabooga/text-generation-webui/pull/5677/commits/b79688423b058f55f2f14faac1ff333eecad4652

It works as follows:

- Specify options `dry_multiplier = 0.8`, `dry_allowed_length = 2`, `dry_base = 1.75`.
- Find the longest sequence ending at the current position that matches earlier output, stopping at any of the sequence-breaker tokens (quotes, asterisks, newlines). Say it is of length `seq_len`.
- Apply a penalty of `dry_multiplier * dry_base^(seq_len - dry_allowed_length)` to the token that would continue the repeat (only considering tokens in the repetition penalty range).

Because this penalty is exponential, longer verbatim repeated sequences are penalized heavily. The sequence-breaker tokens prevent the sampler from interfering with formatting, while the penalty-free allowed length keeps the sampler from damaging the grammar of responses.
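The steps above can be sketched roughly as follows. This is an illustrative reimplementation of the idea, not koboldcpp's or the linked PR's actual code; the function name, the dict-based logits, and the integer token IDs are all assumptions for the sake of a self-contained example:

```python
def dry_penalty(
    input_ids: list[int],
    logits: dict[int, float],
    sequence_breakers: set[int],
    multiplier: float = 0.8,
    base: float = 1.75,
    allowed_length: int = 2,
) -> dict[int, float]:
    """Penalize tokens that would extend a verbatim repeat of earlier output."""
    penalized = dict(logits)
    if not input_ids:
        return penalized
    last = input_ids[-1]
    # For each earlier occurrence of the last generated token, measure the
    # length of the match extending backwards from it, and record the token
    # that followed it (the token that would continue the repeat).
    match_lengths: dict[int, int] = {}
    for i in range(len(input_ids) - 1):
        if input_ids[i] != last:
            continue
        length = 1
        while True:
            j = i - length
            k = len(input_ids) - 1 - length
            if j < 0 or k < 0:
                break
            if input_ids[j] in sequence_breakers:
                break  # never match across a sequence breaker
            if input_ids[j] != input_ids[k]:
                break
            length += 1
        next_tok = input_ids[i + 1]
        match_lengths[next_tok] = max(match_lengths.get(next_tok, 0), length)
    # Exponential penalty once the match exceeds the allowed length.
    for tok, seq_len in match_lengths.items():
        if seq_len >= allowed_length:
            penalized[tok] = penalized.get(tok, 0.0) - multiplier * base ** (
                seq_len - allowed_length
            )
    return penalized
```

For example, on the token history `[1, 2, 3, 1, 2, 3, 1, 2]` the longest repeated suffix match has length 5, so continuing the repeat with token `3` is penalized by `0.8 * 1.75^(5 - 2) ≈ 4.29`, while unrelated tokens are untouched.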

MyPod commented 4 days ago

There is a PR for llama.cpp.