tamixy opened 5 months ago
Regarding Memory / Author's Note / World Info: switch to Story Mode and tick Allow Editing.
What you see is exactly what gets sent to the model (unless you have already run out of context, in which case something from the beginning is silently dropped).
Here you can play as much as you want to continuously check for model behavior based on formatting!
In Chat or Instruct modes, disable Placeholder Tags (and also Trim Sentences) and work in edit mode (not aesthetic) – this way you will always see the complete document!
(If you need Memory and you're sure your max context is high enough, just put the contents of the memory at the beginning of your actual history.) If you want to insert a new system prompt anywhere inside the story, just do it literally: what you see in koboldcpp's edit mode is RAW text (unlike in many other frontends); the color formatting is purely visual.
Well, the point is that I want to see what is being sent in situations involving memory, A/N, and the different modes like chat or instruct. For example, the chatcompletionsadapter has a string for "system_start", but I can't see it used anywhere. Should it go before memory? Is it silently being added? In which mode?
Basically, I would like to be able to customize the automatic formatting, but the ability to do so seems very limited, if it works at all, beyond the basic instruct-mode formatting.
I guess that, apart from wanting the system string added automatically, this is mostly an issue for chat mode, so I'll use sillytavern if I want chat mode; it seems barely implemented in koboldcpp (no chat-instruct mode, so it's pretty much unusable with new models). It would be nice to eventually only need this one UI, though.
What do you mean by "no chat-instruct mode"? Could you elaborate on how the format should be done?
Chat mode with instruct formatting.
Chat mode, which character cards load into, shows the character image on AI replies, but it expects simple plain-text formatting of "Character Name: dialog" with no special tokens in between. That doesn't work well for instruct-tuned models, and most new models are instruct-tuned.
Instruct mode is where replies have model-specific formatting between replies, like ### Response, <|eot_id|>, etc.
Chat-instruct would add this formatting to chat mode. Or add character pictures to instruct mode, since it doesn't seem like chat mode currently does much more than that.
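For illustration, the requested "chat-instruct" behavior amounts to something like this (a hedged sketch, not an existing koboldcpp feature; the Metharme tags and character names are just examples):

```python
# Sketch of the requested chat-instruct mode: chat-style "Name: dialog"
# turns wrapped in instruct-model tags. Metharme tags used purely
# as an example of model-specific instruct formatting.
def chat_instruct_turn(name, text, is_user):
    tag = "\n<|user|>" if is_user else "\n<|model|>"
    return tag + name + ": " + text

prompt = (chat_instruct_turn("Alice", "Hi!", True)
          + chat_instruct_turn("Bot", "Hello, Alice.", False))
```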
Also, this applies to normal instruct mode too, but there should be more formatting options than the current "input" and "output". There should be a string for system/memory at least.
Again, the chatcompletionsadapter json example on the wiki has system_start and system_end, but I can't see it doing anything, which is why I started wondering if formatting was being added silently in the first place. Can I get clarification on this? Is this option working at all?
To clarify, I am not asking for a full implementation of chat mode with all the features you can find on sillytavern. I'm only asking about formatting the system prompt.
Although, I would like to see an option to limit trimming context and inserting A/N at newlines, so they aren't in the middle of a paragraph.
For example, here is the raw text for Metharme formatting, with Start Seq. = \n<|user|> and End Seq. = \n<|model|>:
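Those two sequences produce raw text like this (a minimal Python sketch; the sample turns are made up for illustration):

```python
# Minimal sketch of how koboldcpp's Start/End sequences wrap each turn
# in the raw story text. The sample turns below are invented.
START_SEQ = "\n<|user|>"   # Metharme user tag
END_SEQ = "\n<|model|>"    # Metharme model tag

def build_prompt(turns, new_input):
    """Assemble raw text from past (user, model) turns plus a new input."""
    text = ""
    for user, model in turns:
        text += START_SEQ + user + END_SEQ + model
    # The new input gets the start tag; the trailing end tag cues the model.
    return text + START_SEQ + new_input + END_SEQ

prompt = build_prompt([("Hello!", "Hi there.")], "How are you?")
```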
As long as you don't need to edit anything – just sending the next line via the input box will add those tags automatically just fine.
Without further editing, this is how it renders:
– For me that's fine! System prompt is not outlined specifically, but so what?
Portraits here can be set in the Aesthetic UI Customization Panel menu.
While working with Llama-3 formatting, I enable Placeholder Tags to see just {{[INPUT]}} and {{[OUTPUT]}} instead of <|eot_id|><|start_header_id|>user<|end_header_id|> at every line.
@tamixy are you aware there is a toggle for "aesthetic instruct mode"?
I see; I didn't notice the cog beside aesthetic mode. Transferring from chat mode to instruct mode puts everything into a single block that's treated as system (which doesn't get a portrait), so I didn't realize this was just part of the aesthetic UI.
But anyway, that's not the point, I still haven't got any answer about system prompt formatting and chatcompletionsadapter json file. Obviously I can enter it manually, that's not my question.
Let me clarify just in case. The system prompt is what configures the AI's behavior; it's where you put things like "You are an assistant AI that...". It should always be present at the top of the input and, depending on the model, should start with a system string of its own. So it's pretty evident that it belongs at the start of memory, unless there is a separate system input; it's not something you should have at the start of regular context, where it would get cut off on reaching the context limit. Also, the system string should be added dynamically or automatically, either with an option or with something like {{[SYSTEM]}}, so that you can switch models and have the appropriate string set up.
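To make that suggestion concrete, a {{[SYSTEM]}} placeholder could work roughly like this (purely hypothetical: the tag, the table, and the function below do not exist in koboldcpp):

```python
# Hypothetical sketch of the suggested {{[SYSTEM]}} placeholder: per-model
# system strings substituted at prompt-build time, so switching models
# only swaps a table entry. None of this exists in koboldcpp today.
SYSTEM_TAGS = {
    "llama3": "<|start_header_id|>system<|end_header_id|>\n\n",
    "metharme": "<|system|>",
}

def expand_system_tag(text, model):
    return text.replace("{{[SYSTEM]}}", SYSTEM_TAGS[model])

memory = "{{[SYSTEM]}}You are a helpful assistant."
```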
So, can I get an answer to these questions:
transferring from chat mode to instruct mode puts everything into a single block that's considered system
That's because you have to adjust your Start and End Seq to include your characters' names! Their names followed by : would then split the text into individual turns.
stuff like "You are an assistant AI that...". It should always be present
Not really "always", but generally yes, if you want to tune the model's behavior. Most models out there will behave just fine even without any system prompt!
It's not something you should have at the start of regular context which would get cut off when reaching the context limit.
Reaching the limit is another can of worms: https://github.com/LostRuins/koboldcpp/issues/445#issuecomment-1787171366 With recent models and plenty of memory you should not reach the limit (16k, 64k – that's a lot!) in regular use. So I don't see the problem in having the system prompt right in front of your eyes.
But if you expect hitting the context limit – then yes:
Is there currently any formatting for system prompt?
It's called Memory
It is always kept at the beginning of what is sent to the model every time.
It is hidden from main view; it is visible in console listing and saves to JSON of the story. You can edit memory anytime (which would cause cache reprocessing).
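In other words, context assembly behaves roughly like this (a character-based simplification; actual trimming works on tokens, and this is my reading rather than koboldcpp's source):

```python
# Rough sketch of context assembly: Memory always survives at the front;
# when over budget, the oldest story text is silently dropped.
# (Simplified to characters; real trimming operates on tokens.)
def assemble_context(memory, story, max_chars):
    budget = max_chars - len(memory)
    if len(story) > budget:
        story = story[len(story) - budget:]  # drop the oldest part
    return memory + story
```

So Memory acts as the never-trimmed prefix, which is exactly why it is the natural place for a system prompt.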
so that you can switch model and have the appropriate string set-up.
To switch the model you have to restart the server. Given that, I don't see a problem in either:
What is the chatcompletionsadapter json file supposed to do? Does it work?
Wiki reads:
What is --chatcompletionsadapter
You can pass an optional ChatCompletions Adapter JSON file to force custom instruct tags when launching the KoboldCpp server. This is useful when using the OpenAI compatible Chat Completions API with third party clients. The adapter file takes the following JSON format.
{
"system_start":"str",
"system_end":"str",
"user_start":"str",
"user_end":"str",
"assistant_start":"str",
"assistant_end":"str"
}
I never used it, but this is what I see:
For example, if "user_start"="User: " and "user_end"="\n" (and similar for the assistant and system), you would get a Chat-like formatting to make a chat-expecting model to work via OpenAI API exposed by koboldcpp.
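My reading of how those six strings would wrap an OpenAI-style message list (a sketch, not the actual koboldcpp implementation):

```python
# Sketch (not koboldcpp source): applying the six adapter strings to an
# OpenAI-style message list to produce plain chat formatting.
ADAPTER = {
    "system_start": "", "system_end": "\n",
    "user_start": "User: ", "user_end": "\n",
    "assistant_start": "Assistant: ", "assistant_end": "\n",
}

def apply_adapter(messages, adapter):
    out = ""
    for msg in messages:
        role = msg["role"]  # "system", "user", or "assistant"
        out += adapter[role + "_start"] + msg["content"] + adapter[role + "_end"]
    # End with the assistant start string to cue the model's reply.
    return out + adapter["assistant_start"]

rendered = apply_adapter(
    [{"role": "system", "content": "Be brief."},
     {"role": "user", "content": "Hi"}],
    ADAPTER,
)
```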
What you should have understood already:
The difference is, instead of 6 strings you have only 2, but that is enough: (SYSTEM is your system prompt, USER is your turn, MODEL is generated; newlines are for clarity)
"system_start" + SYSTEM + "system_end"
"user_start" + USER + "user_end"
"assistant_start" + MODEL + "assistant_end"
"user_start" + USER + "user_end"
"assistant_start"
↓
SYSTEM
"Start Seq." + USER + "End Seq." + MODEL
"Start Seq." + USER + "End Seq"
This is nearly the same thing, when "Start Seq" is equal to "assistant_end" + "user_start"
and "End Seq" is equal to "user_end" + "assistant_start"
The only problem is the first "system_end", but since you would have to put the system prompt into the memory anyway, you should fake the first turn too.
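The mapping above can be sanity-checked in a few lines (the tag strings are placeholders, not real model tokens):

```python
# Sanity check of the claimed equivalence:
#   "Start Seq" = assistant_end + user_start
#   "End Seq"   = user_end + assistant_start
# Tag strings below are illustrative placeholders.
a = {"user_start": "<U>", "user_end": "</U>",
     "assistant_start": "<A>", "assistant_end": "</A>"}
start_seq = a["assistant_end"] + a["user_start"]
end_seq = a["user_end"] + a["assistant_start"]

turns = [("hi", "hello"), ("how?", None)]  # None = reply the model will generate

six = ""  # the six-string (adapter) rendering
for user, model in turns:
    six += a["user_start"] + user + a["user_end"] + a["assistant_start"]
    if model is not None:
        six += model + a["assistant_end"]

two = ""  # the two-string (Start/End Seq) rendering
for user, model in turns:
    two += start_seq + user + end_seq + (model or "")

# They differ only by a stray leading assistant_end (plus the missing
# system wrap, which is the "first system_end" problem noted above).
assert two == a["assistant_end"] + six
```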
Maybe @LostRuins could add a fourth mode named "Assistant Mode" where you enter those 6 strings explicitly instead of juggling 2 splits… It could also have 3 portraits and buttons to add system prompts! More or less, this would supersede Chat mode, because chat would then just be a named assistant… Or how about enhancing chat mode with custom prefixes and suffixes instead?
So, just to put it concisely:
You can easily add the system prompt and other needed template formatting into the Settings – Start and End Seq. windows. Example for Llama-3: Start seq. <|eot_id|><|start_header_id|>system<|end_header_id|>You are a large adult cat with black fur. Always act as a real-life cat. Cats can't talk.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n
End seq. <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nMeow.
Start seq. <|eot_id|><|start_header_id|>system<|end_header_id|>
This is wrong. It would be put between all turns, not before the whole story.
Then how does one properly insert a system message with the correct tags into the prompt? The model card provided by Meta describes the prompt format as
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
I can't find any other way to inject a system message with proper tags into the prompt – <|start_header_id|>system<|end_header_id|>{{ system_prompt }}<|eot_id|> – except adding it to the Start seq. Yes, it will be repeated each turn, but at least the model strictly follows these directives.
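For reference, Meta's template above assembles like this (a sketch; the turn contents are made up), with the system block appearing exactly once at the top rather than being repeated in every Start seq.:

```python
# Sketch of Meta's Llama-3 template: the system block appears once,
# at the top, before the alternating user/assistant turns.
def llama3_prompt(system_prompt, turns):
    out = ("<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
           + system_prompt + "<|eot_id|>")
    for user, assistant in turns:
        out += ("<|start_header_id|>user<|end_header_id|>\n\n"
                + user + "<|eot_id|>"
                + "<|start_header_id|>assistant<|end_header_id|>\n\n")
        if assistant is not None:  # None = reply the model will generate
            out += assistant + "<|eot_id|>"
    return out

p = llama3_prompt("Be brief.", [("Hi", None)])
```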
Then how properly insert system message with correct tags into prompt?
This is what we all were discussing in this thread from the beginning. Refer to my older messages here.
I really like the UI on this, even better than sillytavern, especially being able to switch between chat, instruct, and story and easily editing the whole thing.
However, I'm having difficulty figuring out formatting. For example, setting up llama3 properly, having it add the system string at the start (before perma-memory) and everything else.
For one, I want to know whether the console outputs the exact prompt being sent. I'm not sure if there is additional formatting being done automatically that can't be seen in the console. I think the console should definitely output the exact, final prompt being sent, if it doesn't already. An option in the UI to display the final, complete prompt would be great too.
I have turned on debug mode, but on top of it being very hard to use (there should be an option to pick and choose which parts of debug to turn on), the debug input display with tokens seems to be broken: it is missing large chunks of the prompt, often starting partway through perma-memory and with other sections missing.
Then I tried to set up the "chatcompletionsadapter" option as shown on the wiki page, but I can't see it doing anything at all. I can't tell if it's supposed to apply to chat mode, instruct mode, or both; whether it's applying but not displayed in the console; or whether it's not applied for some reason. Very unclear.