LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
5.35k stars, 364 forks

Make token bans work in SillyTavern #1153

Closed mayaeary closed 1 month ago

mayaeary commented 1 month ago

What's changed:

Unfortunately, it seems that banned_phrases is broken now: it causes an access violation (model Rocinante-12B-v1.Q8_0), so I couldn't check whether this works.

(Banned Phrase Detected: shivers down her spine - Add ID 2641 to banlist at index 57, and rewinding 5 tokens) ['ivers (2641)', ] exception: access violation reading 0x000001AAB2ACA000

LostRuins commented 1 month ago

Does it still crash after I've merged your latest PR?

LostRuins commented 1 month ago

See https://github.com/LostRuins/koboldcpp/commit/d75cbd671d0c241fccde1eed26e598eab6519dd9

Also, is banned_token_ids a SillyTavern feature? If it is used, I can alias it to logit_bias = -1000, which will have the same effect.
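A minimal sketch of the aliasing idea, for illustration only: each banned token ID is mapped to a strongly negative bias so that it is effectively never sampled. The helper name `bans_to_logit_bias` and the use of integer keys are assumptions made for this example, not KoboldCpp's actual code.

```python
def bans_to_logit_bias(banned_ids, bias=-1000.0):
    """Map every banned token ID to the same strongly negative bias.

    Hypothetical helper: the name and the key type are illustrative
    assumptions, not part of the KoboldCpp source.
    """
    return {int(tok): bias for tok in banned_ids}


# Token 2641 is the one reported in the crash log earlier in this thread.
logit_bias = bans_to_logit_bias([2641, 42])
print(logit_bias)
```

Duplicate IDs collapse into a single dictionary entry, so the same ban listed twice does no harm.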

mayaeary commented 1 month ago

> Does it still crash after I've merged your latest PR?

With that fix it works well now.

> going to add quite a lot of overhead

Where is that overhead coming from? As far as I understand the source, the tokens are transferred to the C++ side once per request, so 32 versus 64 makes little difference. And when the bans are actually applied, the loop only iterates up to banned_token_ids.size() anyway. The limit is applied only when genparams are passed from Python to C++. It's much more frustrating to add a token ban or logit bias and then desperately try to figure out why it's not working.
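To illustrate the frustration described above, here is a hedged sketch of a limit that is applied once, when the parameters are handed over, and that warns instead of truncating silently. The constant `MAX_BANNED_TOKENS`, its value, and the function name are hypothetical and do not come from the KoboldCpp source.

```python
import warnings

# Hypothetical cap; the thread only mentions 32 and 64 as candidate values.
MAX_BANNED_TOKENS = 64


def clamp_banned_tokens(ids, limit=MAX_BANNED_TOKENS):
    """Return at most `limit` IDs, warning the caller when any are dropped."""
    ids = list(ids)
    if len(ids) > limit:
        warnings.warn(
            f"{len(ids) - limit} banned token(s) dropped: limit is {limit}"
        )
        return ids[:limit]
    return ids
```

With a warning in place, a user whose ban list exceeds the cap can at least see why some bans have no effect.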

> Also, is banned_token_ids a sillytavern feature?

No, SillyTavern uses custom_token_bans, a string of comma-separated token IDs: 1,2,3,444,42,69.
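A rough sketch of how such a comma-separated string could be turned into a list of integer token IDs. The function name `parse_custom_token_bans` and the choice to skip malformed entries rather than raise are assumptions for illustration; the merged PR may handle this differently.

```python
def parse_custom_token_bans(raw):
    """Parse a string such as "1,2,3,444,42,69" into a list of ints.

    Hypothetical helper: entries that are empty or not plain ASCII
    digits are skipped instead of raising an error.
    """
    ids = []
    for part in raw.split(","):
        part = part.strip()
        if part.isascii() and part.isdigit():
            ids.append(int(part))
    return ids


print(parse_custom_token_bans("1,2,3,444,42,69"))
```

Skipping bad entries keeps a single stray comma or space from invalidating the whole list, at the cost of hiding typos from the user.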

LostRuins commented 1 month ago

Looks good enough to merge for now; I might add some sanity checks for bad formatting later.