LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
5.14k stars 353 forks

Feature Request: Implement XTC sampler (new) #1075

Closed 3dfactor closed 1 month ago

3dfactor commented 2 months ago

The Exclude Top Choices (XTC) sampling algorithm is a novel sampler that turns truncation on its head: instead of pruning the least likely tokens, under certain circumstances it removes the most likely tokens from consideration.

More precisely, with a given probability, it removes all tokens that meet a given threshold except the least likely one among them. This ensures that at least one "viable" choice remains, retaining coherence. Truncation samplers can be applied as usual, preventing garbage from being sampled. The result is coherent output (because truncation removes bad tokens) with unprecedented creativity (because XTC removes "boring" tokens).
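As a rough illustration of the rule described above, here is a minimal Python sketch. It is not the oobabooga or koboldcpp code: the function name `xtc_filter`, the parameter names `threshold` and `probability`, the inclusive `>=` comparison, and the renormalization step are all assumptions made for this example.

```python
import random


def xtc_filter(probs, threshold, probability, rng=random):
    """Illustrative XTC step over a list of token probabilities.

    All names here are hypothetical; this is a sketch of the idea,
    not the implementation from the linked PR.
    """
    # Only apply XTC with the given probability; otherwise pass through.
    if rng.random() >= probability:
        return list(probs)

    # Tokens that meet the threshold (inclusive comparison is an assumption).
    above = [i for i, p in enumerate(probs) if p >= threshold]

    # With fewer than two such tokens there is nothing to exclude.
    if len(above) < 2:
        return list(probs)

    # Keep the least likely token that meets the threshold, drop the others.
    keep = min(above, key=lambda i: probs[i])
    removed = set(above) - {keep}
    filtered = [0.0 if i in removed else p for i, p in enumerate(probs)]

    # Renormalize so the remaining probabilities sum to 1.
    total = sum(filtered)
    return [p / total for p in filtered]


if __name__ == "__main__":
    out = xtc_filter([0.5, 0.3, 0.15, 0.05], threshold=0.1, probability=1.0)
    print([round(p, 3) for p in out])  # [0.0, 0.0, 0.75, 0.25]
```

In this toy distribution the two most likely tokens are excluded, the 0.15 token (the least likely one meeting the threshold) survives, and the 0.05 token is left in place for a truncation sampler to handle.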

The oobabooga implementation can be found here, along with an eloquent description: https://github.com/oobabooga/text-generation-webui/pull/6335

LostRuins commented 2 months ago

Shouldn't be too hard. I have some comments for the creator of this sampler though, so I'm following up in your link.

LostRuins commented 2 months ago

Will be added in 1.74

https://github.com/LostRuins/koboldcpp/commit/5bf527a6aec241249793be17e4e3b7a0dbed59b2

p-e-w commented 2 months ago

@LostRuins

Please see my comments in that commit.

I would recommend waiting until the parameter discussion in the original PR has been resolved before releasing this in Kobold, to avoid potentially diverging implementations.

LostRuins commented 2 months ago

Yeap sure @p-e-w, it's not live yet. I saw your comments on https://github.com/LostRuins/koboldcpp/commit/5bf527a6aec241249793be17e4e3b7a0dbed59b2#r145630779 and will address them.

Particularly the part about keeping the tail: this line [image] gave me the impression that only one candidate should remain at the end. But now I think you're saying I should not touch any tokens below the xtc_threshold (ie. leave them as-is) correct?

So the final result is only a warping of the (n-1) out of n tokens above the threshold (if multiple exist) or nothing at all (if n<=1), no truncation exists in both cases.

p-e-w commented 2 months ago

@LostRuins

> But now I think you're saying I should not touch any tokens below the xtc_threshold (ie. leave them as-is) correct?
>
> So the final result is only a warping of the (n-1) out of n tokens above the threshold (if multiple exist) or nothing at all (if n<=1), no truncation exists in both cases.

Yes, that's correct. The text from the image you cut out is intended to supplement the two bar charts, where you can see that the only tokens that are removed (faded out) are the ones above the threshold. A more unambiguous version of the last line would be

...remove all tokens above the threshold, except the least probable one, from sampling
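To make the difference between the two readings concrete, here is a small Python sketch that lists which token indices survive under each one. Both function names are hypothetical, and `survivors_single_candidate` is only a guess at what "only one candidate should remain" would mean in code, not a reconstruction of the actual koboldcpp commit.

```python
def survivors_single_candidate(probs, threshold):
    """Earlier reading (assumed): only one candidate remains at the end."""
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return list(range(len(probs)))
    # Everything except the least likely above-threshold token is dropped,
    # including the tail below the threshold.
    return [min(above, key=lambda i: probs[i])]


def survivors_tail_kept(probs, threshold):
    """Clarified reading: tokens below the threshold are left as-is."""
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return list(range(len(probs)))
    keep = min(above, key=lambda i: probs[i])
    # Only above-threshold tokens other than `keep` are removed.
    return [i for i, p in enumerate(probs) if i == keep or p < threshold]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.04, 0.01]
    print(survivors_single_candidate(probs, 0.1))  # [2]
    print(survivors_tail_kept(probs, 0.1))         # [2, 3, 4]
```

Under the clarified reading, n-1 of the n tokens above the threshold are removed and the tail (indices 3 and 4 here) is untouched, so XTC itself performs no truncation.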

LostRuins commented 1 month ago

Closing as added in latest version.