Make token bans work in SillyTavern

LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.

https://github.com/lostruins/koboldcpp

GNU Affero General Public License v3.0


Make token bans work in SillyTavern #1153

Closed mayaeary closed 1 month ago

mayaeary commented 1 month ago

What's changed:

- Increased the maximum logit_bias and banned_tokens sizes, because I found that 24 entries are not enough for some very sloppy models
- Added a banned_token_ids parameter to genparams so that tokens can be banned by ID
- Adapted some names to work with SillyTavern (checked on 1.12.6)

Unfortunately, it seems that banned_phrases is broken now: it causes an access violation (model Rocinante-12B-v1.Q8_0), so I couldn't check whether that part works.

(Banned Phrase Detected: shivers down her spine - Add ID 2641 to banlist at index 57, and rewinding 5 tokens) ['ivers (2641)', ] exception: access violation reading 0x000001AAB2ACA000

LostRuins commented 1 month ago

I can probably increase the limit up to 32, but 64 is going to add quite a lot of overhead, so I would want to avoid that if possible.
If you want to ban a token by ID, you should use logit_bias instead; it already exists, so nothing new needs to be added.
I will match the banned_strings as you suggest for ST.

LostRuins commented 1 month ago

Does it still crash after I've merged your latest PR?

LostRuins commented 1 month ago

See https://github.com/LostRuins/koboldcpp/commit/d75cbd671d0c241fccde1eed26e598eab6519dd9

Also, is banned_token_ids a sillytavern feature? I can alias that into logit_bias = -1000 if it is used. It will have the same effect.
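
The alias described above could look roughly like the following. This is a minimal sketch, not koboldcpp's actual code: it assumes logit_bias is a dict mapping token ID strings to bias values, and the helper name alias_bans_to_logit_bias is hypothetical.

```python
BAN_BIAS = -1000.0  # bias value suggested in the thread for an effective ban


def alias_bans_to_logit_bias(genparams: dict) -> dict:
    """Return a copy of genparams with banned_token_ids folded into logit_bias.

    Hypothetical helper: the key names follow the thread, the dict layout of
    logit_bias (token ID string -> bias) is an assumption.
    """
    result = dict(genparams)
    bias = dict(result.get("logit_bias") or {})
    # Remove the alias key and give every listed token a strongly negative bias.
    for token_id in result.pop("banned_token_ids", None) or []:
        bias[str(int(token_id))] = BAN_BIAS
    result["logit_bias"] = bias
    return result
```

Existing logit_bias entries are kept, and a banned ID overrides any bias already set for the same token.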

mayaeary commented 1 month ago

> Does it still crash after I've merged your latest PR?

With that fix it works well now.

> going to add quite a lot of overhead

Where is that overhead coming from? As far as I understand the source, the tokens are transferred to the C++ side once per request, so 32 versus 64 does not make much difference. And when the bans are actually applied, the code iterates up to banned_token_ids.size() anyway. The limit is applied only when genparams are transferred from Python to C++. It's much more frustrating when you add a token ban or logit bias and desperately try to figure out why it's not working.

> Also, is banned_token_ids a sillytavern feature?

No, SillyTavern uses custom_token_bans, a string of comma-separated token IDs: 1,2,3,444,42,69.
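
For illustration, parsing such a string might look like the sketch below. It is not taken from the PR: the function name parse_custom_token_bans and the max_bans default are hypothetical, and skipping malformed entries is only one possible way to handle bad formatting.

```python
from typing import List


def parse_custom_token_bans(raw: str, max_bans: int = 64) -> List[int]:
    """Parse a comma-separated string of token IDs into a list of ints.

    Hypothetical helper: max_bans stands in for whatever limit the backend
    enforces; the real value is not confirmed by the thread.
    """
    ids: List[int] = []
    for part in raw.split(","):
        part = part.strip()
        # Skip empty or malformed entries (including negatives) instead of raising.
        if not (part.isascii() and part.isdigit()):
            continue
        token_id = int(part)
        if token_id not in ids:  # drop duplicates, keep first-seen order
            ids.append(token_id)
    # Entries beyond the limit are dropped silently here; a real
    # implementation might prefer to log a warning.
    return ids[:max_bans]
```

Called with the example string from the comment, parse_custom_token_bans("1,2,3,444,42,69") returns [1, 2, 3, 444, 42, 69].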

LostRuins commented 1 month ago

Looks good enough to merge for now; I might add some sanity checks for bad formatting later.