tamixy opened 5 months ago
Regarding Memory / Author's Note / World Info: switch to Story Mode and tick Allow Editing.
What you see is exactly what gets sent to the model (unless you have already run out of context, in which case something from the beginning is silently dropped).
Here you can play as much as you want to continuously check for model behavior based on formatting!
In Chat or Instruct modes, disable Placeholder Tags (and also Trim Sentences) and work in edit mode (not aesthetic) – this way you will always see the complete document!
(If you need Memory and you're sure your max context is high enough, just put the contents of the memory at the beginning of your actual history.) If you want to insert a new system prompt anywhere inside the story, just do it literally: what you see in koboldcpp's edit mode is RAW text (unlike in many other frontends); the color formatting is purely visual.
Well, the point is that I want to see what is being sent in situations involving memory, A/N, and the different modes like chat or instruct. For example, the chatcompletionsadapter has a string for "system_start", but I can't see it used anywhere. Should it go before memory? Is it silently being added? In which mode?
Basically, I would like to be able to customize the automatic formatting, but the ability to do so seems very limited, if it works at all, beyond the basic instruct-mode formatting.
I guess that, apart from wanting the system string added automatically, this is mostly an issue for chat mode, so I'll use sillytavern if I want chat mode; it seems barely implemented in koboldcpp (no chat-instruct mode, so it's pretty much unusable with new models). It would be nice to eventually only need this one UI, though.
What do you mean by "no chat-instruct mode"? Could you elaborate on how the format should be done?
Chat mode with instruct formatting.
Chat mode, which character cards load into, shows the character image on AI replies, but it expects simple plain-text formatting of "Character Name: dialog" with no special tokens in between. That doesn't work well for instruct-tuned models, and most new models are instruct-tuned.
Instruct mode is where replies have model-specific formatting between replies, like ### Response, <|eot_id|>, etc.
Chat-instruct would add this formatting to chat mode. Or add character pictures to instruct mode, since it doesn't seem like chat mode currently does much more than that.
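For illustration, the requested "chat-instruct" behavior amounts to something like this (a hedged sketch, not an existing koboldcpp feature; the Metharme tags and character names are just examples):

```python
# Sketch of the requested chat-instruct mode: chat-style "Name: dialog"
# turns wrapped in instruct-model tags. Metharme tags used purely
# as an example of model-specific instruct formatting.
def chat_instruct_turn(name, text, is_user):
    tag = "\n<|user|>" if is_user else "\n<|model|>"
    return tag + name + ": " + text

prompt = (chat_instruct_turn("Alice", "Hi!", True)
          + chat_instruct_turn("Bot", "Hello, Alice.", False))
```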
Also, this applies to normal instruct mode too, but there should be more formatting options than the current "input" and "output". There should be a string for system/memory at least.
Again, the chatcompletionsadapter json example on the wiki has system_start and system_end, but I can't see it doing anything, which is why I started wondering if formatting was being added silently in the first place. Can I get clarification on this? Is this option working at all?
To clarify, I am not asking for a full implementation of chat mode with all the features you can find on sillytavern. I'm only asking about formatting the system prompt.
Although, I would like to see an option to limit trimming context and inserting A/N at newlines, so they aren't in the middle of a paragraph.
For example, here is the raw text for Metharme formatting, with Start Seq. = \n<|user|> and End Seq. = \n<|model|>:
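Those two sequences produce raw text like this (a minimal Python sketch; the sample turns are made up for illustration):

```python
# Minimal sketch of how koboldcpp's Start/End sequences wrap each turn
# in the raw story text. The sample turns below are invented.
START_SEQ = "\n<|user|>"   # Metharme user tag
END_SEQ = "\n<|model|>"    # Metharme model tag

def build_prompt(turns, new_input):
    """Assemble raw text from past (user, model) turns plus a new input."""
    text = ""
    for user, model in turns:
        text += START_SEQ + user + END_SEQ + model
    # The new input gets the start tag; the trailing end tag cues the model.
    return text + START_SEQ + new_input + END_SEQ

prompt = build_prompt([("Hello!", "Hi there.")], "How are you?")
```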
As long as you don't need to edit anything – just sending the next line via the input box will add those tags automatically just fine.
Without further editing, this is how it renders:
– For me that's fine! System prompt is not outlined specifically, but so what?
Portraits here can be set in the Aesthetic UI Customization Panel menu.
While working with Llama-3 formatting, I enable Placeholder Tags to see just {{[INPUT]}} and {{[OUTPUT]}} instead of <|eot_id|><|start_header_id|>user<|end_header_id|> at every line.
@tamixy are you aware there is a toggle for "aesthetic instruct mode"?
I see; I didn't notice the cog beside aesthetic mode. Transferring from chat mode to instruct mode puts everything into a single block that's treated as system (which doesn't get a portrait), so I didn't realize this was just part of the aesthetic UI.
But anyway, that's not the point, I still haven't got any answer about system prompt formatting and chatcompletionsadapter json file. Obviously I can enter it manually, that's not my question.
Let me clarify just in case. The system prompt is what configures the AI's behavior; it's where you put things like "You are an assistant AI that...". It should always be present at the top of the input and, depending on the model, should start with a system string of its own. So it's pretty evident that it belongs at the start of memory, unless there is a separate system input; it's not something you should have at the start of regular context, where it would get cut off on reaching the context limit. Also, the system string should be added dynamically or automatically, either with an option or with something like {{[SYSTEM]}}, so that you can switch models and have the appropriate string set up.
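To make that suggestion concrete, a {{[SYSTEM]}} placeholder could work roughly like this (purely hypothetical: the tag, the table, and the function below do not exist in koboldcpp):

```python
# Hypothetical sketch of the suggested {{[SYSTEM]}} placeholder: per-model
# system strings substituted at prompt-build time, so switching models
# only swaps a table entry. None of this exists in koboldcpp today.
SYSTEM_TAGS = {
    "llama3": "<|start_header_id|>system<|end_header_id|>\n\n",
    "metharme": "<|system|>",
}

def expand_system_tag(text, model):
    return text.replace("{{[SYSTEM]}}", SYSTEM_TAGS[model])

memory = "{{[SYSTEM]}}You are a helpful assistant."
```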
So, can I get an answer to these questions:
transferring from chat mode to instruct mode puts everything into a single block that's considered system
That's because you have to adjust your Start and End Seq to include your characters' names! Their names followed by : would then split the text into individual turns.
stuff like "You are an assistant AI that...". It should always be present
Not really "always", but generally yes, if you want to tune the model's behavior. Most models out there will behave just fine even without any system prompt!
It's not something you should have at the start of regular context which would get cut off when reaching the context limit.
Reaching the limit is another can of worms: https://github.com/LostRuins/koboldcpp/issues/445#issuecomment-1787171366 With recent models and plenty of memory you should not reach the limit (16k, 64k – that's a lot!) in regular use. So I don't see the problem in having the system prompt right in front of your eyes.
But if you expect hitting the context limit – then yes:
Is there currently any formatting for system prompt?
It's called Memory
It is always kept at the beginning of what is sent to the model every time.
It is hidden from main view; it is visible in console listing and saves to JSON of the story. You can edit memory anytime (which would cause cache reprocessing).
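In other words, context assembly behaves roughly like this (a character-based simplification; actual trimming works on tokens, and this is my reading rather than koboldcpp's source):

```python
# Rough sketch of context assembly: Memory always survives at the front;
# when over budget, the oldest story text is silently dropped.
# (Simplified to characters; real trimming operates on tokens.)
def assemble_context(memory, story, max_chars):
    budget = max_chars - len(memory)
    if len(story) > budget:
        story = story[len(story) - budget:]  # drop the oldest part
    return memory + story
```

So Memory acts as the never-trimmed prefix, which is exactly why it is the natural place for a system prompt.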
so that you can switch model and have the appropriate string set-up.
To switch the model you have to restart the server. Given that, I don't see a problem in either:
What is the chatcompletionsadapter json file supposed to do? Does it work?
Wiki reads:
What is --chatcompletionsadapter
You can pass an optional ChatCompletions Adapter JSON file to force custom instruct tags when launching the KoboldCpp server. This is useful when using the OpenAI compatible Chat Completions API with third party clients. The adapter file takes the following JSON format.
{
"system_start":"str",
"system_end":"str",
"user_start":"str",
"user_end":"str",
"assistant_start":"str",
"assistant_end":"str"
}
I never used it, but this is what I see:
For example, if "user_start"="User: " and "user_end"="\n" (and similar for the assistant and system), you would get a Chat-like formatting to make a chat-expecting model to work via OpenAI API exposed by koboldcpp.
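My reading of how those six strings would wrap an OpenAI-style message list (a sketch, not the actual koboldcpp implementation):

```python
# Sketch (not koboldcpp source): applying the six adapter strings to an
# OpenAI-style message list to produce plain chat formatting.
ADAPTER = {
    "system_start": "", "system_end": "\n",
    "user_start": "User: ", "user_end": "\n",
    "assistant_start": "Assistant: ", "assistant_end": "\n",
}

def apply_adapter(messages, adapter):
    out = ""
    for msg in messages:
        role = msg["role"]  # "system", "user", or "assistant"
        out += adapter[role + "_start"] + msg["content"] + adapter[role + "_end"]
    # End with the assistant start string to cue the model's reply.
    return out + adapter["assistant_start"]

rendered = apply_adapter(
    [{"role": "system", "content": "Be brief."},
     {"role": "user", "content": "Hi"}],
    ADAPTER,
)
```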
What you should have understood already:
The difference is, instead of 6 strings you have only 2, but that is enough: (SYSTEM is your system prompt, USER is your turn, MODEL is generated; newlines are for clarity)
"system_start" + SYSTEM + "system_end"
"user_start" + USER + "user_end"
"assistant_start" + MODEL + "assistant_end"
"user_start" + USER + "user_end"
"assistant_start"
↓
SYSTEM
"Start Seq." + USER + "End Seq." + MODEL
"Start Seq." + USER + "End Seq"
This is nearly the same thing, when "Start Seq" is equal to "assistant_end" + "user_start"
and "End Seq" is equal to "user_end" + "assistant_start"
The only problem is the first "system_end", but since you would have to put the system prompt into the memory anyway, you should fake the first turn too.
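The mapping above can be sanity-checked in a few lines (the tag strings are placeholders, not real model tokens):

```python
# Sanity check of the claimed equivalence:
#   "Start Seq" = assistant_end + user_start
#   "End Seq"   = user_end + assistant_start
# Tag strings below are illustrative placeholders.
a = {"user_start": "<U>", "user_end": "</U>",
     "assistant_start": "<A>", "assistant_end": "</A>"}
start_seq = a["assistant_end"] + a["user_start"]
end_seq = a["user_end"] + a["assistant_start"]

turns = [("hi", "hello"), ("how?", None)]  # None = reply the model will generate

six = ""  # the six-string (adapter) rendering
for user, model in turns:
    six += a["user_start"] + user + a["user_end"] + a["assistant_start"]
    if model is not None:
        six += model + a["assistant_end"]

two = ""  # the two-string (Start/End Seq) rendering
for user, model in turns:
    two += start_seq + user + end_seq + (model or "")

# They differ only by a stray leading assistant_end (plus the missing
# system wrap, which is the "first system_end" problem noted above).
assert two == a["assistant_end"] + six
```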
Maybe @LostRuins could add a fourth mode named "Assistant Mode" where you enter those 6 strings explicitly instead of juggling 2 splits… It could also have 3 portraits and buttons to add system prompts! More or less, this would supersede Chat mode, because chat would then just be a named assistant… Or how about enhancing chat mode with custom prefixes and suffixes instead?
So, just to put it concisely:
You can easily add the system prompt and other needed template formatting into the Settings – Start and End Seq. windows. Example for Llama-3: Start seq. <|eot_id|><|start_header_id|>system<|end_header_id|>You are a large adult cat with black fur. Always act as a real-life cat. Cats can't talk.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n
End seq. <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nMeow.
Start seq. <|eot_id|><|start_header_id|>system<|end_header_id|>
This is wrong. It would be put between all turns, not before the whole story.
Then how does one properly insert a system message with the correct tags into the prompt? The model card provided by Meta describes the prompt format as
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
I can't find any other way to inject a system message with proper tags into the prompt – <|start_header_id|>system<|end_header_id|>{{ system_prompt }}<|eot_id|> – except adding it to the Start seq. Yes, it will be repeated each turn, but at least the model strictly follows these directives.
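For reference, Meta's template above assembles like this (a sketch; the turn contents are made up), with the system block appearing exactly once at the top rather than being repeated in every Start seq.:

```python
# Sketch of Meta's Llama-3 template: the system block appears once,
# at the top, before the alternating user/assistant turns.
def llama3_prompt(system_prompt, turns):
    out = ("<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
           + system_prompt + "<|eot_id|>")
    for user, assistant in turns:
        out += ("<|start_header_id|>user<|end_header_id|>\n\n"
                + user + "<|eot_id|>"
                + "<|start_header_id|>assistant<|end_header_id|>\n\n")
        if assistant is not None:  # None = reply the model will generate
            out += assistant + "<|eot_id|>"
    return out

p = llama3_prompt("Be brief.", [("Hi", None)])
```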
Then how properly insert system message with correct tags into prompt?
This is what we all were discussing in this thread from the beginning. Refer to my older messages here.
I really like the UI on this, even better than sillytavern, especially being able to switch between chat, instruct, and story and easily editing the whole thing.
However, I'm having difficulty figuring out formatting. For example, setting up llama3 properly, having it add the system string at the start (before perma-memory) and everything else.
For one, I want to know whether the console outputs the exact prompt being sent. I'm not sure if there is additional formatting being done automatically that can't be seen in the console. I think the console should definitely output the exact, final prompt being sent, if it doesn't already. An option in the UI to display the final, complete prompt would be great too.
I have turned on debug mode, but on top of it being very hard to use (there should be an option to pick and choose which parts of debug to turn on), the debug input display with tokens seems to be broken: it is missing large chunks of the prompt, often starting partway through perma-memory and with other sections missing.
Then I tried to set up the "chatcompletionsadapter" option as shown on the wiki page, but I can't see it doing anything at all. I can't tell if it's supposed to apply to chat mode, instruct mode, or both; whether it's applying but not displayed in the console; or whether it's not applied for some reason. Very unclear.