Proposal: Add ability to declare game characters

Introduction

A new command <<character NAME>> will declare that the character NAME is expected to be present in the yarn dialogue.

Rationale

This is comparable to the way we declare variables, and has similar purposes:

- lets the engine catch potential bugs when a writer misspells a name;
- lets the writer declare how a character's name corresponds to the character ID (or multiple IDs) used in the script;
- serves as a convenient way to document who the different characters are.

For example:

```
// The school's Headmaster, and the Chief Warlock of the Wizengamot.
// A bespectacled old wizard with silver beard and piercing gaze.
// Kind, courageous, with a sense of humor. Is generally considered
// to be the most powerful wizard in the world.
// Full name: Albus Percival Wulfric Brian Dumbledore.
<<character "Albus Dumbledore" Dumbledore Headmaster>>

// The son of Harry Potter.
<<character "Albus Potter" Albus>>
```

Later in the script:

```
Dumbledor: It is not our abilities that determine who we are, Harry, it is our choices.
```

would throw a compile-time error:

```
NameError: Unknown character name "Dumbledor"; did you mean "Dumbledore"?
```
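The "did you mean" hint in the error above could be produced with ordinary fuzzy string matching. The following is a minimal Python sketch, not Yarn Spinner code: the function name `check_character_id` and the use of Python's `difflib` are assumptions made purely for illustration.

```python
import difflib


def check_character_id(char_id, declared_ids):
    """Return char_id if it was declared; otherwise raise NameError with a hint.

    Hypothetical helper: a real compiler would report a diagnostic instead.
    """
    if char_id in declared_ids:
        return char_id
    # get_close_matches returns the most similar candidates, best match first.
    suggestions = difflib.get_close_matches(char_id, declared_ids, n=1)
    message = f'Unknown character name "{char_id}"'
    if suggestions:
        message += f'; did you mean "{suggestions[0]}"?'
    raise NameError(message)


declared = ["Dumbledore", "Headmaster", "Albus"]
try:
    check_character_id("Dumbledor", declared)
    error_text = None
except NameError as err:
    error_text = str(err)
```

With the declarations from the example, the misspelled ID is rejected and the closest declared ID is suggested, while a declared ID such as `Albus` passes through unchanged.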

The current workaround is to check character names at runtime, when they arrive at the DialogView (or LineProvider). Note that with this workaround the characters still need to be defined, only in a place far removed from the yarn script, which is inconvenient. It also works only at runtime, not at compile time.

Proposed solution

The proposed solution consists of the following elements:

Add a new boolean flag strict_character_names to YarnProject, which is false by default (for backward compatibility).

Add a new Yarn command with the following syntax diagram:

|----[ '<<' ]----[ 'character' ]----[ QUOTED_STRING? ]----[ IDENTIFIER+ ]----[ '>>' ]----|

The function of this command is to:
- if present, treat QUOTED_STRING as the "true" character name, otherwise the first ID in the IDENTIFIER list is the name;
- ~~verify that a character with the same name has not been declared before~~ -- a yarn file may want to declare a different ID for a character, if that makes sense within the context of that file;
- store the character's name into an internal list;
- associate the given identifiers with the character's name.

Whenever the compiler parses a dialogue line or an option and detects a character ID at the start of that line, it checks that
- either strict_character_names is false, or
- the character ID is present in the list of declared characters:
  - if yes, the corresponding character name is returned to the user;
  - if not, an exception is thrown.
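The declaration and lookup steps above can be sketched as follows. This is a rough Python model rather than the actual compiler: the class `CharacterRegistry`, its method names, and the regular expression standing in for the QUOTED_STRING and IDENTIFIER tokens are all assumptions. The proposal also does not say what is returned for an undeclared ID when strict_character_names is false; the sketch assumes the ID is passed through unchanged.

```python
import re

# Rough stand-in for: '<<' 'character' QUOTED_STRING? IDENTIFIER+ '>>'
# (the token shapes are assumed, not taken from the real Yarn grammar).
COMMAND_RE = re.compile(
    r'^<<\s*character\s+(?:"(?P<name>[^"]*)"\s+)?'
    r'(?P<ids>[^\W\d]\w*(?:\s+[^\W\d]\w*)*)\s*>>$'
)


class CharacterRegistry:
    def __init__(self, strict_character_names=False):
        self.strict_character_names = strict_character_names
        self.names = []       # declared character names, in declaration order
        self.id_to_name = {}  # character ID -> character name

    def declare(self, command):
        match = COMMAND_RE.match(command.strip())
        if match is None:
            raise SyntaxError(f"Malformed character declaration: {command}")
        ids = match.group("ids").split()
        name = match.group("name")
        if name is None:
            name = ids[0]  # no quoted name: the first ID doubles as the name
        if name not in self.names:
            self.names.append(name)  # re-declaring a name only adds more IDs
        for char_id in ids:
            self.id_to_name[char_id] = name

    def resolve(self, char_id):
        if char_id in self.id_to_name:
            return self.id_to_name[char_id]
        if not self.strict_character_names:
            return char_id  # assumption: undeclared IDs pass through unchanged
        raise NameError(f'Unknown character name "{char_id}"')


registry = CharacterRegistry(strict_character_names=True)
registry.declare('<<character "Albus Dumbledore" Dumbledore Headmaster>>')
registry.declare('<<character "Albus Potter" Albus>>')
registry.declare('<<character Yoda>>')
```

Here `registry.resolve("Headmaster")` yields "Albus Dumbledore" and `registry.resolve("Albus")` yields "Albus Potter", while an undeclared ID raises only because the strict flag is on.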

It is also possible to add a check, once the project finishes parsing all the scripts, that all declared characters were actually seen in the dialogue at least once.
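Such an end-of-compilation check could be as simple as a set difference. The sketch below is illustrative only: `find_unused_characters` is a hypothetical helper, and the ID-to-name mapping and the list of speaker IDs are assumed to have been collected while parsing.

```python
def find_unused_characters(id_to_name, speaker_ids):
    """Return declared character names that never speak, in declaration order."""
    spoken = {id_to_name[s] for s in speaker_ids if s in id_to_name}
    unused = []
    for name in id_to_name.values():
        if name not in spoken and name not in unused:
            unused.append(name)
    return unused


id_to_name = {
    "Dumbledore": "Albus Dumbledore",
    "Headmaster": "Albus Dumbledore",
    "Albus": "Albus Potter",
}
# Speaker IDs seen at the start of dialogue lines across all scripts.
unused = find_unused_characters(id_to_name, ["Headmaster", "Harry"])
```

A character counts as seen if any of its IDs appears, so "Albus Dumbledore" is covered by `Headmaster` and only "Albus Potter" is reported.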

Backwards Compatibility

The change is backward-compatible as controlled by the strict_character_names project flag.

Alternatives considered

Alternative names for the command are possible: <<person>>, <<name>>, <<entity>>.

Acknowledgments

The proposal is inspired by Shakespeare, who includes the list "Dramatis personae" at the beginning of every play.
Thanks to @McJones for helpful discussion and suggestions!

Update: modified the proposal so that it can express the correspondence between character names and character IDs. This is because an in-game name may not be a valid ID (it may contain spaces, dots, dashes, commas, apostrophes, quotes, etc). Thus, it's useful to be able to express the relationship between names as used in the game and IDs as used in Yarn scripts.

I like this idea in general, but I do have a few thoughts/concerns.

First, how will this work with localisation? Will strict_character_names only be tested once at compile time for the development language, or do we also do a run through the other locales? And if it does work across all languages, that opens up a bunch of interesting concerns around how that will be handled.

Secondly, as to the structure, I think something more like '<<' 'character' QUOTED_STRING? ~[/><{}:]+ '>>' (not tested this syntax) makes the most sense, so you can basically have anything that isn't a special symbol in the character name. This, I believe, will match what is currently allowed in character names, instead of the more limited IDENTIFIER token type. Actually, as I type this out, I realise I would have to check the rules around QUOTED_STRING, as it may have handling for interpolation, which we don't want here.

Thirdly, I am wondering if a new command is the best way to do this. This feels like something that maybe should be project-wide metadata instead of baked into the yarn files themselves, but as I say that I realise we only have the yarn files currently... I wonder if this calls for a more agnostic form of Yarn project metadata than just yarn files... hmm, something to think about.

Finally, I am somewhat lost as to the benefit provided by the proper and convenience forms of character names. If people are already using character names correctly in the yarn, what is gained by storing extra data that isn't relevant to the line? And if they aren't using the names correctly, they don't really need to know that they spelled the name of Luke Skywalker incorrectly; they need to know (as you have shown) that they spelled Luke as Luk. If the intent is for it to be used as a display name, I am somewhat against that side of the proposal, because the way a name needs to be presented in game is going to vary on a game-by-game basis, so again it feels like storing something that might have no purpose to the game itself.

Regardless, I definitely like the core idea, and I especially like how it's completely backwards compatible. There would definitely be some quirks to work out, however, such as the exact syntax, the structure, how it's stored and when it's evaluated.

Thoughts?

Thank you for your consideration, @McJones.

> First how will this work with localization?

This proposal was mainly for the Yarn scripts and writing the dialogue in the base language. But you're right that similar concerns might be present for the translated scripts as well. It is perhaps less of a concern there, if the game uses only the base-language names in order to match lines of dialogue to the NPCs who are talking.

What this proposal is trying to address is a potential scenario where the game has a DialogueView for "Harry", and another one for "Ron" -- suddenly a line comes along for "Hary", and it just disappears into the void because none of the views think it's theirs.
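The failure mode described here can be modelled in a few lines. This Python sketch is not Yarn Spinner's view API: `deliver` and the dictionary of per-character views are hypothetical stand-ins, used only to show how a misspelled speaker is silently dropped unless something checks for it.

```python
def deliver(line, views, strict=False):
    """Route a 'Speaker: text' line to that speaker's view.

    `views` maps a character name to a list acting as a stand-in for a view.
    Returns True if some view accepted the line.
    """
    speaker, _, text = line.partition(":")
    speaker = speaker.strip()
    view = views.get(speaker)
    if view is None:
        if strict:
            raise NameError(f'Unknown character name "{speaker}"')
        return False  # no view claims the line, so it disappears
    view.append(text.strip())
    return True


views = {"Harry": [], "Ron": []}
delivered_typo = deliver("Hary: Expelliarmus!", views)
delivered_ok = deliver("Harry: Expelliarmus!", views)
```

The misspelled line is lost without any error in the non-strict case, which is exactly the situation a compile-time check is meant to prevent.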

For the localized versions, this is not as much of a concern, as long as the game still uses base names to deliver the lines. Thus, even if the translated script says "Гарі" instead of "Гаррі", it would just be a typo, and so the damage is minimal.

Still, the character names are part of the content, and we might think about how that content ought to be presented for translation. What if the <<character>> command had a #line: tag on it, and then the name got extracted into the translation tool?

> Secondly As to the structure I think something more like '<<' 'character' QUOTED_STRING? ~[/><{}:]+ '>>' makes the most sense

Not sure I follow you here... So, my thinking was that a character will have one full name, say "Harry Potter", expressed as a QUOTED_STRING. The quoted string can have any non-dynamic content inside, and any special characters can be escaped if needed. Then, there are also associated character IDs, which must be IDENTIFIERs, otherwise they wouldn't work as a character marker at the start of the line. So, you can say

```
Harry: Expelliarmus!
```

but not

```
Harry Potter: Expelliarmus!
```

Allowing any part of the string before : to be considered a character name would be a significant breaking change, because then an innocuous string like And the God said: Let there be light! would suddenly be marked as belonging to a character named "And the God said".
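The distinction drawn in this reply, where only a single identifier directly before the colon counts as a speaker, can be illustrated with a regular expression. This is a Python sketch of the rule as described here, not the actual Yarn Spinner grammar; `SPEAKER_RE` and `split_speaker` are hypothetical names, and the identifier pattern is an assumption.

```python
import re

# One identifier (letters, digits, underscore; not starting with a digit)
# immediately followed by a colon. Assumed shape of the IDENTIFIER token.
SPEAKER_RE = re.compile(r'^\s*(?P<id>[^\W\d]\w*):\s*(?P<text>.*)$')


def split_speaker(line):
    """Return (character_id, text), or (None, line) if there is no speaker."""
    match = SPEAKER_RE.match(line)
    if match is None:
        return None, line
    return match.group("id"), match.group("text")


with_id = split_speaker("Harry: Expelliarmus!")
with_full_name = split_speaker("Harry Potter: Expelliarmus!")
narration = split_speaker("And the God said: Let there be light!")
```

Under this rule the first line has the speaker `Harry`, while the other two contain spaces before the colon and are therefore treated as plain text with no speaker.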

> Thirdly I am wondering if a new command is the best way to do this. This feels like something that maybe should be project wide metadata instead

Sure, it IS project-wide metadata, but it could still be inside the yarn files, right? This is quite similar to the <<declare>> commands, which also create project-wide variables.

A sensible approach for script writers would then be to create a separate file "characters.yarn" where all the characters are defined, and include it at the start of the project, similarly to how one might create a "variables.yarn" where all the variables are declared.

There could be a rule that if a file contains no nodes, then it is considered "project-wide metadata".

> Finally I am somewhat lost as to the benefit provided with the proper and convenience forms of character names.

I agree that's a bit subtle, but my thinking here was that:

- It might not be obvious how the character ID used at the start of a line translates to the character name. In the example that I've shown, Albus is used for "Albus Potter", but not for "Albus Dumbledore". Ideally, at some point there would be IDE support where you can just click on a character ID and it takes you to that character's declaration.
- A single character might have multiple names, which might even change within a single dialogue. My favorite dialogue-related joke is this:
```
Barista: What would you like to order, sir?
Peter: A cup of Cappuccino, please. Medium size.
Barista: Of course. And may I have your name?
Peter: Peter.
Peter: Ok, your order will be right up, sir.
Unknown: ...wth?
```
This does not come up particularly often, but I have definitely seen scenarios where:
- the character was amnesiac and suddenly remembers their name mid-dialogue;
- the character was duplicitous, and now reveals their true name;
- the character agrees with their opponent to take on a fake name for conspiratorial reasons.
  - that fake name may as well be the name of another character in the game.
- a character meets another character with the same name, and the dialogue might need to temporarily use non-standard IDs to distinguish between them (in the Harry Potter books there is a moment where 7 people polyjuice themselves into Harry Potter and they all talk to each other; the dialogue would have to name them "Harry1" through "Harry7").

So, for reasons like these I think it would make sense to have a single globally unique name for each character, but that name might map to a variety of character IDs at different stages of the dialogue. And sometimes those IDs may even need to be local overrides.

Ah, OK, I think I understand where you are coming from now. I see this as two unrelated elements. First, you want a way to ensure that the writers make no mistakes around characters, to prevent the Vade: No, I am your father issue where it should have been Vader: No, I am your father. Second, you want to be able to associate multiple names with a single character, making an explicit connection between "Darth Vader", "Vader", and "Anakin Skywalker".

This is where I think I disagree. While I am in favour of the first part, being able to create some way to go "hey, you misspelled this name", also using this mechanism to give display and related name info is just going to lead to issues. In particular, it's far more useful to require the writers to stick with the same name no matter what, and then at runtime have the dialogue views go "ah OK, so this says the character is Vader, alright, I will show the name Darth Vader".
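The runtime alternative suggested here amounts to a per-game lookup from the ID written in the script to whatever the game wants to display. A minimal Python sketch follows; the table `DISPLAY_NAMES` and the function `present` are hypothetical and merely stand in for whatever a game's dialogue view would do.

```python
# Hypothetical per-game table: ID used in the script -> name shown on screen.
DISPLAY_NAMES = {"Vader": "Darth Vader"}


def present(line, display_names):
    """Swap the speaker ID at the start of a line for its display name."""
    speaker, sep, text = line.partition(":")
    if not sep:
        return line  # no speaker prefix, show the line as-is
    speaker = speaker.strip()
    shown = display_names.get(speaker, speaker)
    return f"{shown}: {text.strip()}"


shown_line = present("Vader: No, I am your father", DISPLAY_NAMES)
```

The script keeps using one consistent ID, and the choice of display name stays entirely on the game side.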

The circumstances where this causes issues are very rare, and in situations where it does cause an issue (such as in the barista joke) you would still need to manually connect the lines to the game characters in some way, so that you show the right game object/sprite/etc. saying the line. Or you would not use the character names as anything but plain text as part of the line, which is the only real reason the barista joke works.

My reasoning is that, overall, the game does need to know which character said a line, regardless of what they are called. Even if the game needs to hide their name, it will still need to know who is saying something.

As an aside, I think this has some other potential uses, such as determining line and word counts on a per-character basis; just flagging it so I don't forget.
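Per-character line and word counts of this kind are straightforward once each line's speaker is known. The following Python sketch is only an illustration: `speaker_stats` is a hypothetical helper, and it naively treats everything before the first colon as the speaker, which a real implementation would replace with the compiler's own character detection.

```python
from collections import Counter


def speaker_stats(lines):
    """Count dialogue lines and words per speaker for 'Speaker: text' lines."""
    line_counts, word_counts = Counter(), Counter()
    for line in lines:
        speaker, sep, text = line.partition(":")
        if not sep:
            continue  # no speaker prefix, so the line is not attributed
        speaker = speaker.strip()
        line_counts[speaker] += 1
        word_counts[speaker] += len(text.split())
    return line_counts, word_counts


line_counts, word_counts = speaker_stats([
    "Vader: No, I am your father",
    "Luke: No!",
    "Vader: Search your feelings",
    "The wind howls.",
])
```

With a declared character list, counts for several IDs of the same character could additionally be merged under one name.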

Let me try to summarize my thinking so far, so that it's easier to see where we are in agreement, and where we are not.

A Yarn project needs to know what characters should exist in the game. This may serve a variety of purposes, such as preventing accidental name misspellings, supporting "find definition" / "find references" for a character in a VS plugin, getting stats about an individual character, and possibly more.

A character here means just the name. We do not contemplate (for now) associating the character with any additional information, such as a link to a bio page, a portrait image, special perks, etc. So, a character could be "Harry Potter" or "Darth Vader", and we can declare the existence of these characters using the <<character>> command.

Hence, I propose to create the <<character>> command to declare characters in the game.

Next, we realize that a name such as "Harry Potter" cannot be used in a dialogue as-is, because the dialogue requires an ID of a character. Thus, that name should be mangled into an ID, and the script writer could choose to go with Harry, or Potter, or HarryPotter, or HP, or HarryP, or HarP, etc.

My suggestion then is that the <<character>> command would make this association between the full name and the ID(s) explicit: <<character "Harry Potter" Harry HP>>.
- Is the full name required? -- We could make that part optional, in cases where the name is already an ID (e.g. <<character Yoda>>).
- What is the function of the full name? -- This is the standardized ID of the character as reported to the game. For example, a line Harry: Hi! could be sent to the game annotated as [character name="Harry Potter"]Harry: [/character]Hi!.
- Is full name what is shown within the game? -- Not really, it's just an identity. You could have "char073" as a full name if that's more convenient from the game's standpoint. The point is that what's convenient to the game can be different from what's convenient to the script writer, and the <<character>> command provides a bridge between the two.

Then, there is a question of whether we want to allow multiple IDs mapped to the same character. It seems to me like this would be something useful, and give the script writers more flexibility in how they write their dialogue.

There've been some examples before, but perhaps here's another one: a translator may declare that Гаррі is another ID for "Harry Potter", so that the line Гаррі: Привіт! could get delivered to the game as [character name="Harry Potter"]Гаррі: [/character]Привіт!. Note that this doesn't say anything about how the name of the character ought to be presented in game. That can be accomplished with a simple user-defined command, if needed.
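The annotation described above, where the ID written in the script is wrapped in markup carrying the character's full name, could look roughly like this. This is a Python sketch under stated assumptions: `annotate_line` is a hypothetical helper, and the `[character name="..."]` markup is copied from the examples in this comment rather than from any existing Yarn Spinner feature.

```python
def annotate_line(line, id_to_name):
    """Wrap a known speaker prefix in [character] markup with the full name."""
    prefix, sep, rest = line.partition(":")
    char_id = prefix.strip()
    if not sep or char_id not in id_to_name:
        return line  # no declared speaker: leave the line untouched
    name = id_to_name[char_id]
    return f'[character name="{name}"]{char_id}: [/character]{rest.lstrip()}'


id_to_name = {"Harry": "Harry Potter", "HP": "Harry Potter", "Гаррі": "Harry Potter"}
english = annotate_line("Harry: Hi!", id_to_name)
ukrainian = annotate_line("Гаррі: Привіт!", id_to_name)
```

Both the base-language line and the translated line end up tagged with the same identity, "Harry Potter", while the text shown to the player keeps the ID the writer or translator chose.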

There's also the question of scope. The declaration ought to be global (the whole point of the command is to have some kind of global view of the character cast), so it would make sense to have the <<character>> commands declared at some kind of global level. We can say that any command that is encountered outside of any Node has a global effect. The recommended practice would then be to have a dedicated characters.yarn file, which would list all the characters upfront.

However, we could also say that if the <<character>> command is encountered inside a Node, then it has a local effect, lasting until the end of that node. This would provide script writers with a way to create local overrides of character names if that makes sense within that particular context.

Again, these overrides serve only as a convenience, so that the dialogue can be written more naturally. They connect local character IDs with their true in-game identities. For example:
```
<<character "Harry Potter" Crabbe>>
Draco: What's the matter with you, Crabbe?
Crabbe: Stomach...ache.
```
The DialogueRunner would then annotate the character's name as "Harry Potter", but it's up to the game itself how to show that in the DialogueView.
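The global-plus-local scoping described above can be modelled as two layered lookup tables, with the node-local layer consulted first and discarded at the end of the node. This Python sketch is an assumption about how such scoping might behave, not a description of Yarn Spinner: the class `ScopedCharacters` and its methods are hypothetical, and the global declaration for "Vincent Crabbe" is added only to make the override visible.

```python
from collections import ChainMap


class ScopedCharacters:
    def __init__(self):
        self.global_ids = {}  # declarations made outside of any node
        self.local_ids = {}   # overrides declared inside the current node
        # ChainMap searches local_ids first, then falls back to global_ids.
        self.lookup = ChainMap(self.local_ids, self.global_ids)

    def declare(self, name, *ids, inside_node=False):
        target = self.local_ids if inside_node else self.global_ids
        for char_id in ids:
            target[char_id] = name

    def end_node(self):
        self.local_ids.clear()  # local overrides last until the node ends

    def resolve(self, char_id):
        return self.lookup[char_id]


cast = ScopedCharacters()
cast.declare("Harry Potter", "Harry")
cast.declare("Vincent Crabbe", "Crabbe")
cast.declare("Harry Potter", "Crabbe", inside_node=True)  # local override
inside = cast.resolve("Crabbe")
cast.end_node()
outside = cast.resolve("Crabbe")
```

Inside the node, `Crabbe` resolves to "Harry Potter"; once the node ends, the same ID falls back to its global meaning.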

Also, it would be nice to leave room for changing character names the way variables are changed. This might be useful when the character is unknown at the moment.

```
<<character "???" Hermione>>
Hermione: Holy cricket. You are Harry Potter!
<<set Hermione "Hermione Granger">>
Hermione: I'm Hermione Granger. And... You are?
```

As we approach YS 3.0, we have been giving a lot of thought to this proposal. Overall, we feel that this doesn't need to be added to the core of Yarn Spinner, as it can be handled by whatever receives your yarn dialogue and lines, and through existing custom commands. Additionally, a lot of this can be reproduced right now, with no extra code, by using variables as names:

```
<<declare $player = "player">>
{$player}: hello, I am the player.
<<set $player = "Tim">>
{$player}: hello, I am still the player, but with a new name!
```

As such, we don't feel that this needs to be done in the core of Yarn Spinner; it is better handled at runtime, either via variables or via custom commands. Thanks for the proposal.
