LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Make it possible to enable square brackets and stopping on stop token #112

Closed h3ndrik closed 1 year ago

h3ndrik commented 1 year ago

For purposes other than KoboldAI (e.g. interfacing with the API directly from my Python scripts), it would be nice to have something like an argument in the API call, or a CLI flag, to enable or disable the suppression of square brackets.

Sure, I can edit the code and recompile... but something easier would be great. I think that would be cleaner code anyway: don't suppress it here, but sanitize/strip the input in the application that expects it in a certain way.
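The client-side sanitizing suggested above could be a simple regex pass. A minimal sketch (the function name is mine, and the pattern assumes the bracketed spans you want removed are non-nested):

```python
import re

def strip_bracketed(text: str) -> str:
    """Remove [bracketed] spans from generated text client-side,
    instead of relying on the backend banning the '[' token."""
    return re.sub(r"\[[^\]]*\]", "", text)
```

This keeps the backend untouched and lets each application decide what counts as unwanted output.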

I'm not sure whether a CLI flag for koboldcpp or an argument exposed via the API would be the better solution.

https://github.com/LostRuins/koboldcpp/blob/9129e937f92172264ed99065a1ac2a97c1e3a1be/gpttype_adapter.cpp#L515-L519

One more thing, while talking about that code: what happens with the 'end of text' token? I believe that is suppressed, too? Can we have access to it? I want to feed the output into langchain, and I get useful responses, but the LLM keeps padding its answer with random ramblings until the token limit is reached. And I don't know where the vocabulary is stored. In the ggml file? Because with the model I'm fiddling around with, I don't ever get something like <|endoftext|> or </s>. Can I overwrite the string representation of token 2 somewhere? And/or add it to the 'stop_sequence'?
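As a workaround for a model that rambles past its answer, a client can also enforce stop strings itself before handing the text to langchain. A minimal sketch (function name and default stop strings are my own, assuming the usual LLaMA/GPT end markers):

```python
def truncate_at_stop(text: str, stops=("</s>", "<|endoftext|>")) -> str:
    """Cut generated text at the first occurrence of any stop string,
    discarding everything the model produced after it."""
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

This only helps if the model actually emits a recognizable end marker, which is exactly the issue discussed here.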

LostRuins commented 1 year ago

The token IDs can be referenced against the official LLaMA vocab. For ggml it is embedded directly into the model file itself. I can't think of an easy way for a user to specify the list of banned tokens, though.
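For reference, the special token IDs in the stock LLaMA sentencepiece vocabulary are fixed; a fine-tune can change the string representation (as apparently happened with the model discussed below), but the IDs stay the same:

```python
# Special token IDs in the standard LLaMA sentencepiece vocabulary.
# Assumption: the model uses the stock vocab; fine-tunes may alter or
# strip the string representations while keeping these IDs.
LLAMA_SPECIAL_TOKENS = {
    0: "<unk>",  # unknown token
    1: "<s>",    # beginning-of-sequence
    2: "</s>",   # end-of-sequence, the 'end of text' token asked about above
}
```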

h3ndrik commented 1 year ago

Thank you. Okay, the gpt4-x-alpaca model I'm testing seems to have that token removed...

Concerning the banned tokens: would it be okay to introduce another boolean to the generation_inputs struct, next to stop_sequence? Something like suppress_tokens_for_koboldai, set via the ArgumentParser?

I could create a pull request, but I wanted to ask first before cluttering the code with things that aren't within koboldcpp's intended purpose.

LostRuins commented 1 year ago

Yeah, I'll add a flag for skipping banned tokens. It won't be part of the API, but a launch parameter for koboldcpp itself, set from the Python script.

LostRuins commented 1 year ago

Hi, this has been added in 1.13 as --unbantokens.
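With the server launched with --unbantokens, a client can then pass its own stop sequences through koboldcpp's KoboldAI-compatible generate endpoint. A minimal sketch of building the request body (the field names follow the KoboldAI API that koboldcpp emulates; verify them against your server version, and the helper name is mine):

```python
import json

def build_generate_payload(prompt: str, stops) -> dict:
    """Build the JSON body for a POST to the KoboldAI-compatible
    /api/v1/generate endpoint of a koboldcpp server started with
    --unbantokens, so stop_sequence can take effect."""
    return {
        "prompt": prompt,
        "max_length": 80,            # cap generation length
        "stop_sequence": list(stops),  # client-chosen stop strings
    }

# Usage (assumed default port): POST json.dumps(build_generate_payload(...))
# to http://localhost:5001/api/v1/generate with Content-Type: application/json.
```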

h3ndrik commented 1 year ago

Well, thank you very, very much.