A new sampler named 'DRY' appears to be a much better way of handling repetition in model output than the crude repetition penalty exposed by koboldcpp.
Specify options dry_multiplier = 0.8, dry_allowed_length = 2, dry_base = 1.75
Look at the longest sequence at the tail of the context that also occurs earlier in it, stopping the match at any of the sequence-breaker tokens (quotes, asterisks, newlines).
Say it's of length seq_len.
Apply a penalty of dry_multiplier * dry_base^(seq_len - dry_allowed_length) to any token that would extend that repetition.
(Only consider tokens in the repetition penalty range).
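As a quick sanity check of the penalty formula with the settings above, here is a hypothetical helper (the name dry_penalty is mine, not from the PR):

```python
def dry_penalty(seq_len, multiplier=0.8, allowed_length=2, base=1.75):
    """Penalty for a token that would extend a repeated sequence of
    seq_len tokens. Assumed here to be zero below the allowed length."""
    if seq_len < allowed_length:
        return 0.0
    return multiplier * base ** (seq_len - allowed_length)

# seq_len = 2 -> 0.8, 3 -> 1.4, 4 -> 2.45: the penalty grows by a
# factor of dry_base (1.75) for every extra repeated token.
```

So a two-token repeat is barely discouraged, while long verbatim runs quickly accumulate a logit penalty large enough to break them.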
Because this penalty is exponential, longer verbatim repeated sequences are penalized heavily. The sequence-breaker tokens keep the sampler from interfering with formatting, while the penalty-free allowed length keeps it from interfering with the grammar of responses.
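Putting the steps together, a minimal sketch of the whole idea (illustrative names and a naive O(n^2) scan, not the actual koboldcpp or text-generation-webui implementation):

```python
SEQUENCE_BREAKERS = {'"', "*", "\n"}

def match_length(tokens, candidate, breakers=SEQUENCE_BREAKERS):
    """Longest n such that the last n tokens of the context also appeared
    earlier, immediately before an occurrence of `candidate`. Matching
    stops at any sequence-breaker token."""
    best = 0
    for i, tok in enumerate(tokens):
        if tok != candidate:
            continue
        n = 0
        # Walk backwards, comparing the context suffix with the tokens
        # that preceded this earlier occurrence of `candidate`.
        while (n < i
               and tokens[i - 1 - n] == tokens[-1 - n]
               and tokens[-1 - n] not in breakers):
            n += 1
        best = max(best, n)
    return best

def apply_dry(logits, tokens, multiplier=0.8, allowed_length=2, base=1.75):
    """Subtract the DRY penalty from the logit of every candidate token
    that would extend a repeated sequence of at least allowed_length."""
    out = dict(logits)
    for tok in out:
        n = match_length(tokens, tok)
        if n >= allowed_length:
            out[tok] -= multiplier * base ** (n - allowed_length)
    return out
```

For example, with context ["a", "b", "c", "a", "b"], the candidate "c" would extend the repeat "a b" (length 2), so its logit drops by 0.8, while unrelated tokens are untouched.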
Original PR: https://github.com/oobabooga/text-generation-webui/pull/5677/commits/b79688423b058f55f2f14faac1ff333eecad4652