Issues with the current implementation of roles and names

The "user" and "assistant" roles were likely created because ChatGPT was only ever designed with that dynamic in mind. There are a few issues with doing this:

Having roles be ordinary words causes issues when those words appear in everyday language. This leads the model to misinterpret what is part of the template and what is part of the chat. Alpaca is a perfect example of this because it uses markdown for its template (its "### Instruction:" and "### Response:" headers).
It limits the scope of what the model can be. Even if you tell it that it's something else entirely, being inherently labeled an assistant makes it carry that mindset into other roles. A lot of datasets have role-play data in them, but it can still take a lot of prompting to fully shake off the assistant mindset.
It assumes that there are only two different speakers. Let's say that you had several different agents working together. How would the LLM properly distinguish which message came from which agent? Likewise, for a multi-character role-play, how would it properly attribute a message to a certain character?

From the perspective of the LLM, there is no "human" or "AI"; it can generate messages for both without any issues. This is why I don't think we should have dedicated roles such as "user" and "assistant". Instead, those roles should be merged into a single one, chat, with names used to distinguish who wrote each message, so that the format remains as flexible as possible.

The roles should be converted into special tokens like the rest of the format: <|system|> for the system role, <|tool|> for the tool role, and <|chat|> for the messages. For names, I propose name=[], with the name enclosed in the brackets. Unlike the original implementation, this would allow names that contain spaces.

Here's how it would look:

<|im_start|><|system|>
{system message}<|im_end|>
<|im_start|><|chat|> name=[Hirose Koichi]
{message contents}<|im_end|>
<|im_start|><|chat|> name=[Dolphin]
{message contents}<|im_end|>
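A minimal Python sketch of how a frontend might render messages in this proposed layout. The `Message` dataclass and `render_proposed` function are hypothetical names for illustration, not part of any existing ChatML library; the code only builds the prompt string and assumes the tokenizer would treat `<|system|>`, `<|tool|>`, and `<|chat|>` as special tokens.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    kind: str                   # "system", "tool", or "chat"
    content: str
    name: Optional[str] = None  # speaker name; used by "chat" messages


def render_proposed(messages: List[Message]) -> str:
    """Render messages as <|im_start|><|kind|> name=[Name] blocks."""
    blocks = []
    for m in messages:
        header = f"<|im_start|><|{m.kind}|>"
        if m.name is not None:
            header += f" name=[{m.name}]"  # brackets let the name contain spaces
        blocks.append(f"{header}\n{m.content}<|im_end|>")
    return "\n".join(blocks)


prompt = render_proposed([
    Message("system", "You are playing several characters."),
    Message("chat", "Hello!", name="Hirose Koichi"),
    Message("chat", "Hi there.", name="Dolphin"),
])
print(prompt)
```

To ask for Dolphin's next reply, a frontend would presumably append an open header (`<|im_start|><|chat|> name=[Dolphin]` followed by a newline) and let the model complete it.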

And if we wanted a multi-agent or multi-character setup, it would look like this:

<|im_start|><|system|>
{system message}<|im_end|>
<|im_start|><|chat|> name=[Project Manager]
{message contents}<|im_end|>
<|im_start|><|chat|> name=[Python Coder]
{message contents}<|im_end|>
<|im_start|><|chat|> name=[Code Tester]
{message contents}<|im_end|>
<|im_start|><|chat|> name=[Code Debugger]
{message contents}<|im_end|>
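To make the attribution point concrete, here is a rough sketch (a hypothetical `parse_transcript` helper using only the standard-library `re` module) that recovers the kind, speaker name, and contents of every message in a transcript written in this layout. It assumes names never contain `]` or a newline.

```python
import re
from typing import List, Optional, Tuple

# One message in the proposed layout. The name clause is optional because
# system and tool messages don't carry one. Assumes names contain no "]".
MESSAGE_RE = re.compile(
    r"<\|im_start\|><\|(?P<kind>\w+)\|>(?: name=\[(?P<name>[^\]\n]*)\])?\n"
    r"(?P<content>.*?)<\|im_end\|>",
    re.DOTALL,
)


def parse_transcript(text: str) -> List[Tuple[str, Optional[str], str]]:
    """Return (kind, name, content) for every message, in order."""
    return [(m["kind"], m["name"], m["content"]) for m in MESSAGE_RE.finditer(text)]


transcript = (
    "<|im_start|><|system|>\nCoordinate the team.<|im_end|>\n"
    "<|im_start|><|chat|> name=[Project Manager]\nPlease add a parser.<|im_end|>\n"
    "<|im_start|><|chat|> name=[Python Coder]\nDone, see below.<|im_end|>\n"
    "<|im_start|><|chat|> name=[Code Tester]\nAll tests pass.<|im_end|>\n"
)
for kind, name, content in parse_transcript(transcript):
    print(kind, name, content, sep=" | ")
```

Because every chat message carries its own name, the number of speakers is not fixed at two: each agent is simply a distinct name.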

This is a much more elegant solution that allows it to handle as many extra roles as needed without forcing a default on it. I understand wanting to keep it as close to 1:1 with the original implementation as possible, but we should look to improve it wherever we can if we want this to become a widely accepted standard. I think change is fine, but the sooner we make those changes, the better.

I agree with avoiding "assistant" as a role name. Dropping the human/AI role separation altogether is also an idea worth considering.

One important benefit of clear AI/human distinction would be to prevent the AI speaking as the human, which has always been a problem, and might get worse if the AI sees all roles as equal. That's why I'd probably keep two distinct roles of human user and AI bot (I like calling it "char", it's just one token and should be self-explanatory enough for humans and AIs).

Here's your multi-agent/char example:

<|system|>
{system message}<|im_end|>
<|im_start_char|>Project Manager
{message contents}<|im_end|>
<|im_start_char|>Python Coder
{message contents}<|im_end|>
<|im_start_char|>Code Tester
{message contents}<|im_end|>
<|im_start_char|>Code Debugger
{message contents}<|im_end|>
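As a rough illustration, here is a hypothetical `render_char_style` helper that produces the layout above from (name, contents) pairs. It follows the example exactly, including the bare `<|system|>` line; `<|im_start_char|>` is the token being proposed here, not an existing one.

```python
from typing import List, Tuple


def render_char_style(system: str, turns: List[Tuple[str, str]]) -> str:
    """Bare <|system|> block, then one <|im_start_char|>Name block per turn."""
    blocks = [f"<|system|>\n{system}<|im_end|>"]
    for name, content in turns:
        # The name follows the start token directly, as in the example.
        blocks.append(f"<|im_start_char|>{name}\n{content}<|im_end|>")
    return "\n".join(blocks)


print(render_char_style("Work as a software team.", [
    ("Project Manager", "Spec is in the ticket."),
    ("Python Coder", "Implemented."),
]))
```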

@StefanDanielSchwarz

> One important benefit of clear AI/human distinction would be to prevent the AI speaking as the human, which has always been a problem, and might get worse if the AI sees all roles as equal. That's why I'd probably keep two distinct roles of human user and AI bot (I like calling it "char", it's just one token and should be self-explanatory enough for humans and AIs).

I personally like the modular approach that ChatML has, so I wouldn't want separate start tokens for the system prompt, messages, and tooling; that just seems needlessly complex to save a few tokens. The way it currently works is by using <|im_start|> to signify the start of a sequence, then using the roles system, user, and assistant to convey the expected message contents. It's not that the current "roles" are special in any way; they're just a way of attributing each message to a specific speaker, and names would do that job better.

I think it would be a good idea to move away from "roles" and instead tell the model what kind of content to expect. The <|system|> token would signify a system prompt and would tell the AI that the contents must be followed strictly. The <|tool|> token would signify a message that isn't supposed to be directly replied to or acknowledged, but rather used to invoke outside tooling. The <|chat|> token would signify an interaction between two or more speakers. This would make names mandatory, but I honestly think it's a better system, and it would reduce the problem of the model speaking for you, since it would have more examples of distinct speakers than just a vague "user" and "assistant."
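One way to picture the "names become mandatory" rule is as a header check. This is a sketch under the assumption that only the three content tokens described here exist and that system and tool messages may omit a name; `is_valid_header` is a hypothetical helper, not part of any spec.

```python
from typing import Optional

# Content-type tokens from the proposal. Only <|chat|> messages need a name,
# since they are the ones exchanged between two or more speakers.
CONTENT_KINDS = {"system", "tool", "chat"}


def is_valid_header(kind: str, name: Optional[str]) -> bool:
    """Check a message header against the proposed rules."""
    if kind not in CONTENT_KINDS:
        return False  # e.g. the old "user"/"assistant" roles are rejected
    if kind == "chat":
        return bool(name and name.strip())  # a non-blank speaker name is required
    return True


print(is_valid_header("chat", "Hirose Koichi"))  # named chat message
print(is_valid_header("chat", None))             # unnamed chat message
```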

What I really like from your suggestions is turning the <|im_start|>assistant (that's three tokens) into a single token. We could do that and also simplify a complex construct like <|im_start|><|chat|> name=[Hirose Koichi] (that's 5 tokens in addition to the name!) into an elegant <|im_start_char|> Hirose Koichi or <|im_start_char|>Hirose Koichi (that's just 1 token besides the name!).

I do agree that we can look to save tokens in the name portion, but I do think it has to be enclosed in something. I've been personally testing how to incorporate names into the current format, and I found the following:

<|im_start|>Hirose Koichi

This works well enough if it's just a one-word name, but it gives garbage output if the name contains spaces.

<|im_start|>(Hirose Koichi)

I haven't found any output degradation from simply enclosing it in parentheses, even when the name contains spaces, so we can probably simplify it like this to save tokens:

<|im_start|><|system|>
{system message}<|im_end|>
<|im_start|><|chat|>[Project Manager]
{message contents}<|im_end|>
<|im_start|><|chat|>[Python Coder]
{message contents}<|im_end|>
<|im_start|><|chat|>[Code Tester]
{message contents}<|im_end|>
<|im_start|><|chat|>[Code Debugger]
{message contents}<|im_end|>
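A minimal sketch of the compact bracketed layout. The `chat_header` and `render_compact` names are hypothetical, and the guard against `]` or newlines in names is an added assumption: the format itself doesn't say how such names would be escaped.

```python
from typing import List, Tuple


def chat_header(name: str) -> str:
    """Build the compact <|im_start|><|chat|>[Name] header."""
    # Assumed guard: a "]" or newline inside the name would make the
    # closing bracket ambiguous, so such names are rejected up front.
    if not name or "]" in name or "\n" in name:
        raise ValueError(f"name cannot be enclosed unambiguously: {name!r}")
    return f"<|im_start|><|chat|>[{name}]"


def render_compact(system: str, turns: List[Tuple[str, str]]) -> str:
    """System block followed by one bracketed chat block per (name, content)."""
    out = [f"<|im_start|><|system|>\n{system}<|im_end|>"]
    for name, content in turns:
        out.append(f"{chat_header(name)}\n{content}<|im_end|>")
    return "\n".join(out)


print(render_compact("Work as a software team.", [("Code Debugger", "Found it.")]))
```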

> One important benefit of clear AI/human distinction would be to prevent the AI speaking as the human, which has always been a problem, and might get worse if the AI sees all roles as equal.

I think this is honestly just a skill issue. It definitely still is an issue, but people make god-awful character cards and then wonder why they perform badly. A while ago, I started working on my own format for character cards that emphasizes organization and structure while remaining human-readable and easy to edit, and I haven't had this issue when testing them. In addition, I don't add things like "don't speak for the user" or "their actions are their own" or any other variation, yet it doesn't speak for me. I haven't done long-term chat testing, but I've found that this behavior is most prevalent in the first few messages and no longer appears after that. Giving a very short reply to a long message triggers it the most.

My format is pretty simple: markdown headers separate the sections, with the main heading being the character name and subheadings for the rest of the sections; bullet points list the information, with indented bullet points grouping related details together; and all information is written from a third-person point of view. I'm working on a character card that will help you create character cards in this format, but it's still going through revisions. I'll upload it when I'm done, along with some completed character cards as examples. Definitely not on GitHub, but probably on the SillyTavern subreddit.
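For anyone who wants to generate cards in this structure, here is a small, hypothetical `render_card` helper: the character name becomes the top-level heading, each section becomes a subheading, and nested lists become indented bullet points. The character and traits are made-up sample data, and writing the content in third person is still up to the card author; the code only handles the layout.

```python
from typing import Dict, List, Union

# An item is either a bullet's text or a nested list of items (an indented group).
Item = Union[str, list]


def _bullets(items: List[Item], depth: int) -> List[str]:
    lines = []
    for item in items:
        if isinstance(item, list):
            # A nested list groups related details under the preceding bullet.
            lines.extend(_bullets(item, depth + 1))
        else:
            lines.append("  " * depth + f"- {item}")
    return lines


def render_card(name: str, sections: Dict[str, List[Item]]) -> str:
    """Character name as the main heading, one subheading per section."""
    lines = [f"# {name}"]
    for title, items in sections.items():
        lines += ["", f"## {title}"]
        lines.extend(_bullets(items, 0))
    return "\n".join(lines)


card = render_card("Mira", {
    "Appearance": ["Short", ["Black hair", "Green eyes"]],
    "Personality": ["Curious"],
})
print(card)
```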

I get it, but this is more of a protocol thing.

There's the client, the system, the system owner, and the model, each of which can author messages.

graph TD;
    subgraph System
        Tool-->|role=tool| Model
    end

    Client-->|role=user| Model
    SystemOwner-->|role=system| Model
    Model-->|role=assistant| Client

user means a client wrote it. assistant means the model wrote it. system means the owner of the system wrote it. tool means the system wrote it.
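The same mapping expressed as code, as a sketch: the `Author` enum and `chatml_message` helper are hypothetical names, but the role strings and the `<|im_start|>role`, newline, contents, `<|im_end|>` framing are standard ChatML.

```python
from enum import Enum


class Author(Enum):
    CLIENT = "client"
    SYSTEM_OWNER = "system_owner"
    SYSTEM = "system"  # the system itself, i.e. a tool running inside it
    MODEL = "model"


# Who wrote a message determines its ChatML role, per the diagram above.
ROLE_FOR_AUTHOR = {
    Author.CLIENT: "user",
    Author.MODEL: "assistant",
    Author.SYSTEM_OWNER: "system",
    Author.SYSTEM: "tool",
}


def chatml_message(author: Author, content: str) -> str:
    """Frame one message with the role implied by its author."""
    return f"<|im_start|>{ROLE_FOR_AUTHOR[author]}\n{content}<|im_end|>"


print(chatml_message(Author.CLIENT, "What's the weather?"))
```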

It will really mess things up to change this scheme, and I really don't want to unless there is some big new idea (on the level of function calling) that absolutely requires changing the roles to support it.

Ah, I see. I didn't think about it from a service point of view. Is it at least possible to make the roles special tokens, so that the assistant behavior doesn't bleed into other use cases? e.g. <|system|>, <|tool|>, <|user|>, <|assistant|>
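A minimal sketch of that suggestion: the role set and the author-based meaning of each role stay exactly the same, and only the role is spelled as a dedicated token. `<|user|>` and `<|assistant|>` are proposed tokens, and `render_with_role_tokens` is a hypothetical helper. For each one to actually be a single token, it would also have to be registered as a special token in the model's tokenizer; this code only builds the text.

```python
from typing import List, Tuple

# The four existing roles, spelled as proposed dedicated special tokens.
ROLE_TOKENS = {
    "system": "<|system|>",
    "tool": "<|tool|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}


def render_with_role_tokens(messages: List[Tuple[str, str]]) -> str:
    # Same roles as plain ChatML; only their spelling changes, so "assistant"
    # no longer appears as an ordinary word in the prompt.
    return "\n".join(
        f"<|im_start|>{ROLE_TOKENS[role]}\n{content}<|im_end|>"
        for role, content in messages
    )


print(render_with_role_tokens([("system", "Be concise."), ("user", "Hi")]))
```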

cognitivecomputations / OpenChatML

Issues with the current implementation of roles and names #7