LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Question about token size/generation #30

Closed · Enferlain closed this issue 1 year ago

Enferlain commented 1 year ago

Is there a way to have generation stop when the bot starts a new line? For example, with 200 tokens set, even if I disable multiline responses it still generates an entire multi-line conversation in the terminal, so I have to wait through the whole generation. I could set the token limit to something like 50, but then I'm also limiting the response length for future replies. Also, is there a way to get something like text streaming? Thanks!

LostRuins commented 1 year ago

Put `streaming=1` in the URL, e.g. http://localhost:5001?streaming=1, and it should work (use the non-aesthetic UI).
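The query-string flag above can also be built programmatically, which is handy if you launch the UI from a script. A minimal sketch (assuming the default port 5001 mentioned in the thread; the `open_streaming_ui` helper name is hypothetical):

```python
from urllib.parse import urlencode


def streaming_url(base: str = "http://localhost:5001") -> str:
    """Append the streaming=1 query parameter to the koboldcpp UI URL."""
    return f"{base}?{urlencode({'streaming': 1})}"


def open_streaming_ui() -> str:
    # Hypothetical convenience wrapper: returns the URL you would open
    # in a browser to get the streaming (non-aesthetic) UI.
    return streaming_url()


print(open_streaming_ui())  # http://localhost:5001?streaming=1
```

Opening that URL in a browser should load the classic UI with token streaming enabled, per the comment above.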

https://github.com/LostRuins/koboldcpp/issues/29