Proposal: JSON-based .yarnproject files

Introduction

This proposal introduces a new format for representing Yarn Projects, which control how a collection of Yarn scripts is compiled and potentially configured for use in a game engine.

Current Format

Yarn Projects currently store data in two related files:

The .yarnproject contains a Yarn Spinner script that contains variable declarations.
The .yarnproject.meta file contains Unity references to the Yarn scripts that should be compiled alongside the .yarnproject script, localisation information (including asset locations), and other configuration.

Data stored in .yarnproject.meta files is not available outside of Unity.

Motivation

Yarn Projects store their data in Yarn Spinner script format. This means that to get information out of them, they must be compiled and analysed. It's not impossible to store project-wide data like localisation information in this format (using, for example, node headers, file headers, or specially structured commands, lines or comments), but storing structured data isn't what the Yarn language was designed for.
Currently, there's lots of information that the Yarn Spinner compiler needs to correctly compile a project that is only available to Unity projects. Crucially, this includes the list of .yarn scripts that need to be compiled. This means that tools, like the Visual Studio Code extension, ysc, and the Language Server, have to make assumptions about what to compile, which may not be correct.

This data is not inaccessible, as the .meta files are YAML that may be parsed, but certain information, like file references, is stored as GUIDs, which require a lot of work to dereference outside Unity, where the AssetDatabase is not available to be queried.

Additionally, this means that it is even more difficult to author this information in a way that works for both the Unity Editor and tools outside of Unity.

Moving this data out of the domain of Unity and into a more readily readable and interpretable format will make it easier for non-Unity tools to work with it.

Inspiration

.csproj files contain structured data in the form of XML that describes the build process for a C# project, including package dependencies, compilation steps, and other configuration files. By default, all C# files in the same directory and subdirectories of the .csproj file's location are included in the compilation. This behaviour can be augmented or disabled. The structure of the XML is highly hierarchical and nested.

tsconfig.json files contain structured data in the form of JSON, and effectively serve as a shorthand for passing flags to the tsc compiler binary. By default, all TypeScript files in the current directory or its subdirectories are included. The structure of the JSON is very flat.

The proposed new format takes the following inspiration from these two formats:

Project files act as an 'anchor' file for a project, indicating that source code files are stored in the same (or descendant) directories as the project file.
Project files contain data in a common structured format, making them easier to extend.
- Project files are declarative, not imperative. (The current version is imperative, and could be extended to become declarative).

JSON vs Other Formats

JSON is proposed here, but is not essential. The reasoning for choosing JSON over other formats (XML, YAML, or a custom format) is:

JSON is easier to parse than XML or YAML.
JSON is easier to read and write by hand than XML.
JSON parsers are built in to the standard library of most languages and platforms, including .NET 5 and above.
- Where JSON parsers are not available, third-party libraries are readily available.
JSON supports schema validation, like XML. (YAML has only limited validation support.)

Proposed New Format

.yarnproject files take the form of a JSON file.

{
    "projectFileVersion": 2,
    "sourceFiles": ["**/*.yarn"],
    "baseLanguage": "en",
    "localisation": {
        "en": {
            "assets": "../LocalisedAssets/English"
        },
        "de": {
            "strings": "../German.csv",
            "assets": "../LocalisedAssets/German"
        }
    },
    "definitions": "Functions.ysls.json"
}

Properties:

projectFileVersion: The version number for this file.
- Required to be the number 2.
sourceFiles: An array of strings, containing glob patterns.
- Defaults to ["**/*.yarn"] (that is, "all .yarn files in this directory and all subdirectories")
- The globbing algorithm is the globstar algorithm, in which "*" means "all files in this directory" and "**/*" means "all files in this directory and its subdirectories".
- Patterns are considered to be relative to the directory that the .yarnproject file is in (that is, "*.yarn" will match all .yarn files in the current directory; "../*.yarn" will match all .yarn files in the parent directory).
- Files that match any of the glob patterns will be included in the compilation.
- If multiple glob patterns are provided, a file matched by more than one pattern will only be included in the compilation a single time.
baseLanguage: Optional: A string containing a BCP-47 language tag indicating the base language of the project. Defaults to the user's current neutral locale (for example, "en").
localisation: Optional: A dictionary mapping BCP-47 language tags to objects of the following layout:
- strings: A hint indicating where localised text for the locale may be found. This is unused when the localisation is for baseLanguage.
- assets: A hint indicating where localised assets for the locale may be found.
definitions: Optional: A string containing a path to a .ysls.json file that describes available commands and functions. This is used in non-Unity environments, like ysc, to make the compiler aware of available functions and commands.
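
As a rough illustration of the sourceFiles rules above, the following Python sketch resolves the glob patterns relative to the project file's directory and includes each matched file only once. The function name resolve_source_files is hypothetical, and Python's glob module is used here as a stand-in for the globstar algorithm; the real compiler may differ in details such as how hidden files are treated.

```python
import glob
import os

# Default taken from the property description above.
DEFAULT_SOURCE_FILES = ["**/*.yarn"]


def resolve_source_files(project_path, project):
    """Return the sorted, de-duplicated files matched by sourceFiles.

    `project` is the already-parsed contents of the .yarnproject file.
    (Hypothetical helper, not part of any Yarn Spinner API.)
    """
    if project.get("projectFileVersion") != 2:
        raise ValueError("projectFileVersion is required to be the number 2")

    patterns = project.get("sourceFiles", DEFAULT_SOURCE_FILES)
    # Patterns are relative to the directory containing the project file.
    project_dir = os.path.dirname(os.path.abspath(project_path))

    matched = set()
    for pattern in patterns:
        full_pattern = os.path.join(project_dir, pattern)
        for hit in glob.glob(full_pattern, recursive=True):
            if os.path.isfile(hit):
                # realpath collapses ".." segments, so a file matched by
                # several patterns is only recorded a single time.
                matched.add(os.path.realpath(hit))
    return sorted(matched)
```

A set is used so that overlapping patterns such as "*.yarn" and "**/*.yarn" do not cause a file to be compiled twice.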

Hints in localisation.[LANGUAGE].strings and localisation.[LANGUAGE].assets are strings that are formatted in one of the following ways:

Relative paths from the Yarn Project file's directory (indicated by the prefix . or ..);
Absolute paths from the project's root folder (e.g. the Assets folder in Unity, or the Content folder in Unreal, indicated by the prefix ${ProjectRoot});
Absolute paths to a location on disk (indicated by the prefix /);
A content-addressable GUID (indicated by the hint containing only hexadecimal characters, spaces, or hyphens.)
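
The four hint formats above could be told apart with a check along these lines. This is only a sketch: the function name classify_hint and the returned labels are made up for illustration, and the "unknown" fallback is an assumption, since the proposal does not say what happens to a hint that matches none of the formats.

```python
import re

# A GUID hint contains only hexadecimal characters, spaces, or hyphens.
_GUID_CHARS = re.compile(r"[0-9A-Fa-f \-]+")


def classify_hint(hint):
    """Classify a localisation hint string (hypothetical helper)."""
    if hint.startswith("${ProjectRoot}"):
        return "project-root"   # absolute from the project's root folder
    if hint.startswith("."):
        return "relative"       # covers both the "." and ".." prefixes
    if hint.startswith("/"):
        return "absolute"       # absolute location on disk
    if _GUID_CHARS.fullmatch(hint):
        return "guid"           # content-addressable GUID
    return "unknown"            # assumption: not covered by the proposal
```

The prefix checks run before the GUID check so that a path is never mistaken for a GUID.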

When using Yarn Internal Localisation, localisation.[LANGUAGE].strings is used to locate the .csv file containing the localised text for lines, and localisation.[LANGUAGE].assets is used to locate the folder containing localised assets for each line.

When using Unity Localisation, localisation.[LANGUAGE].strings is used to locate an optional .csv file containing the localised text for lines to pre-populate the string table, and localisation.[LANGUAGE].assets is used to locate the folder containing localised assets for each line to pre-populate import settings for loading an asset table.

The decision to use Yarn Internal or Unity Localisation is stored in the .meta file. The justification for this is that this is exactly what import settings in Unity are for: Unity-specific configuration of how an asset should be imported.

In previous versions of the .yarnproject format, variables could be defined in the text of the file. This is not supported in the new format; all variables are defined in .yarn source files.

Comments are permitted in the file, as per the JSONC flavour of JSON (which adds support for C-style single-line and multi-line comments, // and /* */).
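
One consequence of allowing comments is that a plain JSON parser cannot read the file directly. The sketch below, with the hypothetical name strip_jsonc_comments, removes // and /* */ comments before handing the text to Python's json module. It tracks string literals explicitly, because the default glob "**/*.yarn" contains the characters /* and would be damaged by a naive search-and-replace.

```python
import json


def strip_jsonc_comments(text):
    """Remove // and /* */ comments that appear outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, leaving the newline in place.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_jsonc(text):
    """Parse JSONC text into Python objects."""
    return json.loads(strip_jsonc_comments(text))
```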

Upgrade process

The following steps are followed to upgrade a file from version 1 (Yarn script) .yarnproject files to version 2 (JSON) files.

When a v1 .yarnproject is identified (by matching the pattern /^(.*?\n)*title:.*\n(.*?\n)*---/m), stop regular import and offer to convert the v1 file to a v2 file.
- If the user declines to convert the file, the project stops normal import and produces a Yarn Program with no content. (If possible, the Inspector offers a button to manually upgrade the file at a time of the user's choosing.)
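
For illustration, the detection pattern above translates to Python roughly as follows, with the /m flag becoming re.MULTILINE. The name is_v1_project is hypothetical; an importer written in another language would use its own regular expression library.

```python
import re

# The pattern from the proposal: a "title:" header line followed, on some
# later line, by the "---" that opens a node body.
V1_PATTERN = re.compile(r"^(.*?\n)*title:.*\n(.*?\n)*---", re.MULTILINE)


def is_v1_project(text):
    """Return True if the text looks like a v1 (Yarn script) project file."""
    return V1_PATTERN.search(text) is not None
```

A v2 JSON file has no title: header line, so it falls through to the regular import path.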

To convert a v1 file to a v2 file:

The user is warned that the existing .yarnproject file will be rewritten, and is encouraged to make a backup. The user is asked if they wish to proceed; if they decline, conversion stops.
For all declarations in all nodes in the file, extract them and move them to a newly created Yarn file, FILENAME_Variables.yarn, in the same directory as the .yarnproject file.
Set sourceFiles to ['**/*.yarn'].
If any Yarn files referenced by the project are outside the current directory, or are inside a folder, offer to move them (and their .meta files, if any) to the current directory, if doing so would not overwrite existing files.
- If the user accepts this offer, do so.
- If the user declines this offer, add specific references to those files to sourceFiles.
The .yarnproject file is rewritten with its new contents.
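
The non-interactive core of these steps might look something like the sketch below. It leaves out the user prompts and the file moves, and several details are assumptions made for illustration: the function name convert_v1_to_v2, the node title "Variables" used in the generated file, and the rule that a declaration is any line consisting solely of a <<declare ...>> command.

```python
import json
import re

# Assumption: a declaration occupies a whole line of the v1 file.
DECLARE_LINE = re.compile(r"^\s*<<declare\s.*>>\s*$")


def convert_v1_to_v2(v1_text, extra_source_files=()):
    """Return (variables_yarn_text, project_json_text) for a v1 project.

    `extra_source_files` holds explicit references to scripts the user
    declined to move into the project directory.
    """
    declarations = [
        line.strip()
        for line in v1_text.splitlines()
        if DECLARE_LINE.match(line)
    ]
    # Contents for FILENAME_Variables.yarn. The node title is hypothetical;
    # the proposal only names the file.
    variables_yarn = (
        "title: Variables\n---\n" + "\n".join(declarations) + "\n===\n"
    )
    project = {
        "projectFileVersion": 2,
        "sourceFiles": ["**/*.yarn", *extra_source_files],
    }
    return variables_yarn, json.dumps(project, indent=4)
```

The caller would write the first result to FILENAME_Variables.yarn and the second over the existing .yarnproject file.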

Backwards Compatibility

Describe the impact that your solution will have on code written in the most recent shipping version of the language. If your proposed changes mean that existing code would need to be changed in order to work, describe in detail what changes would be required, and describe an algorithm (pseudocode is fine) for detecting where these changes are necessary, and how an automated upgrader would either make changes or flag that a human must make changes.

Alternatives considered

The first alternative would be to make no changes, and leave most Yarn Project data remaining in Unity. This restricts most functionality to the Unity game engine, or would necessitate building tools that can interpret and create the .meta files that Unity creates.

Another alternative would be to store project data in the current .yarn format, either in some imperative or declarative way. This would have the same level of functionality as JSON, but would require building additional custom parsing and/or interpretation logic in Yarn Spinner.

A third alternative would be to do away with Yarn Projects altogether, and to require scripts to be grouped manually by the user. This is what Yarn Spinner did prior to Yarn Spinner 2.0, and it was very cumbersome.

Acknowledgments

@McJones contributed design review on this document.

Overall I approve of this new format, but the real pain point in adopting it would be the transition: how to handle default variables, that is, variables declared "in the yarn project". The current Unity-only solution of tying needed variable declarations to a yarn project is really seamless; many users don't even realize it's just a yarn file with declare statements.

Since the new JSON format relies on having another yarn file to handle needed variable declarations, it might lead to confusion with this generated yarn file or unnecessarily messy project setups mangling variable.yarn files. Ideally this wouldn't be an issue if most projects adopted better practices when declaring yarn variables, but projects would need extra care and attention when upgrading their projects even with a robust upgrader. I'm not sure how much of an issue this will be however, though users would need to know when and how to declare variables in their projects which the current documentation lacks.

One question I want to ask is who would be editing this new yarn project, and how. The current Unity yarn project could only really be edited with Unity's inspector, which is a really nice interface for non-technical users to work with (sure, you can edit the .meta file directly, but who does that?). I'm assuming the inspector workflow would still be there, but for other engines like Unreal, would that require similar tooling, or would the user be expected to edit the JSON file? JSON is human-readable and fairly easy to edit for technical folks, but it's really not that nice to work with compared to other formats like YAML.

A non-technical user would either have to learn and wrangle "scary looking" JSON, or engine integrators would have to spend extra work making suitable tools to manage yarn projects for them. Even though it's not a major concern, it would be a missed opportunity not to at least try to make the new yarn project experience pleasant enough for a non-technical user who needs to use it.

Here are my proposed additions:

Have a field specifically for the "variables.yarn" source yarn file. This would be treated as other source yarn files, but would be easier to track down in case something goes wrong or it needs to be written to.
Have yarn projects be a spec that could support YAML or TOML, but the YarnSpinner interpreter would only accept it in JSON format. This can allow developers who want to write yarn projects in a friendlier format the ability to easily convert it to the JSON format, assuming they also add the pipeline to do so themselves.
Embed the yarn project JSON information into a yarn file. This will require additional work to extract it, but significantly less compared to extending the yarn file header to accommodate the information needed. It might look like this:

// Yarn Project Config
// ---
// {
//     "projectFileVersion": 2,
//     "sourceFiles": ["**/*.yarn"],
//     "baseLanguage": "en",
//     "localisation": {
//         "en": {
//             "assets": "../LocalisedAssets/English"
//         },
//         "de": {
//             "strings": "../German.csv",
//             "assets": "../LocalisedAssets/German"
//         }
//     },
//     "definitions": "Functions.ysls.json"
// }
// ---

title: Program
---
// Ship.yarn, node Ship, line 14
<<declare $should_see_ship = false>>

// Ship.yarn, node Ship, line 14
<<declare $sally_warning = false>>
===

Since yarn headers can't really handle things like large arrays or key-value pairs easily, those could remain in the JSON realm and everything else could be moved over.

// Yarn Project Config
// ---
// {
//     "sourceFiles": ["**/*.yarn"],
//     "localisation": {
//         "en": {
//             "assets": "../LocalisedAssets/English"
//         },
//         "de": {
//             "strings": "../German.csv",
//             "assets": "../LocalisedAssets/German"
//         }
//     }
// }
// ---

projectFileVersion: 2
baseLanguage: en
definitions: Functions.ysls.json
title: Program
---
// Ship.yarn, node Ship, line 14
<<declare $should_see_ship = false>>

// Ship.yarn, node Ship, line 14
<<declare $sally_warning = false>>
===

The embedded JSON could be linked externally as well. I'm not sure how useful that would be, but it could work.

projectFileVersion: 2
baseLanguage: en
definitions: Functions.ysls.json
config: MyYarnProject.json
title: Program
---
// Ship.yarn, node Ship, line 14
<<declare $should_see_ship = false>>

// Ship.yarn, node Ship, line 14
<<declare $sally_warning = false>>
===

Agree with @KXI-System that YAML is a friendlier format from a user's standpoint (even though it may be slightly harder to parse -- but then, YAML libraries are widely available anyway).

I would also suggest that the project file itself be called yarn.yaml instead of .yarnproject:

an initial . means the file is hidden on Linux. Is this file intended to be hidden?
the .yaml extension signals to an IDE how the file should be highlighted, which is always good to have.

Have a field specifically for the "variables.yarn" source yarn file.

I'm not sure it's a good idea, since it seems to create a limitation that there will be only one file where variables can be declared? A project may want to declare variables across multiple files, for example one file per area, or per chapter, or per level, or per expansion pack.

Embed the yarn project JSON information into a yarn file.

This would create a conflict with the existing meaning of a comment at the top of the file. This comment can contain, say, a project description; or a copyright notice / license.

I'm not sure it's a good idea, since it seems to create a limitation that there will be only one file where variables can be declared? A project may want to declare variables across multiple files, for example one file per area, or per chapter, or per level, or per expansion pack.

I was going under the assumption that upgrading from the v1 to v2 would move the embedded yarn file into its own standalone file, but the feature to declare yarn variables in the inspector would still be there. That field would just point to that file, making it easier to write inspector defined variables to the right yarn file, and not hardcoding a specific yarn file name to the yarn project for it. Unless I'm misunderstanding something you would still be able to declare variables in yarn files normally, this would just be for inspector declared variables (and whatever equivalent for other engines if they so choose to use it).

This would create a conflict with the existing meaning of a comment at the top of the file. This comment can contain, say, a project description; or a copyright notice / license.

The embedded JSON could be moved to the bottom of the file instead, but what I was going for was something adjacent to Front Matter for markdown files. In that example anything between the // --- comments would be extracted and parsed as JSON. The JSON itself doesn't need to be in actual comments, any sort of syntax that makes it easy to extract and to clearly denote that this isn't a normal yarn file would be enough. What I have written just has the added benefit of having some backwards compatibility with the current v1 Unity spec, in that the comments would just be ignored.

Thought I'd clear up some issues we realised aren't correctly explained in the original post and then give some more thoughts.

Configuration format

As to the discussion around JSON vs YAML vs TOML vs etc vs etc vs etc, at this point I am gonna say let's just stick with JSON for the discussion of the project file and its features and needs. Once we are happy with the core idea and have ironed out any concerns, we can return to format discussions. Just arguing about formats will rapidly become counterproductive to discussing the issue at hand. Our goal here isn't to pick the perfect structured document; it's to move away from a Unity meta file to something other tools can use. I do want to say that we won't be supporting more than one format, no matter what it ends up being; having more than one is just extra work for no real benefit.

Project File Format

We realise now it wasn't clear that the name of the file format was to be .yarnproject not the file. So you'd make your yarn project and name it something like MyGreatGame.yarnproject and the idea is you could have as many of these as you need based on your needs. You might have one for the main story and then another full of background ambient dialogue.

Project-level Variable Declarations.

With the current system, the idea behind variable declarations in the project was to resolve ambiguity in the yarn files themselves, not to make a centralised location for variable declarations. We were thinking that, moving ahead, in the editor you'd still see all variables, both the explicitly and implicitly declared ones, and any ambiguity would now become errors that you or the team would fix with declarations in the most specific-to-your-game-needs yarn file instead of in the editor window. Happy to be wrong on this; if people are really into declaring variables in the project file interface, that is fine, and we can keep that. But it still wouldn't need to be a specific field in the project, just a Unity-managed adjacent yarn file that is linked into the compilation. And because it's just another yarn file, it can be explicitly referenced in the sourceFiles field to make it clear that there is something slightly different about it.

Closing Thoughts

Finally, almost as an aside, I don't think you'd ever really be making these yourself; you'd make them in a tool-assisted manner, as people should never be hand-writing configuration files. On the Unity side, ideally users wouldn't even realise anything has changed, and tools like ysc would gain the ability to make and verify these files for you.

Something that just occurs to me, should we have some sort of compiler options setting in here? So sorta like the following:

{
    "projectFileVersion": 2,
    "sourceFiles": ["**/*.yarn"],
    "baseLanguage": "en",
    "compilerOptions": {
        "requireDeclarations": true
    }
}

The only flag I can think of is requireDeclarations which would be to force the compiler to throw errors if there are any implicit variables in the project.

Opening the door to compiler options in this project sounds great!

An issue I've run into using ysc (and I'm quite new to this and have never used Yarn Spinner in Unity at all) is that the -Lines.csv uses full system paths. That's probably not a huge deal for an engine to work around, but it's not ideal for a system path like that to appear in the ysc output.

Moving to this "root configuration" format would be wonderful, since then outbound file paths could be made relative to that root, rather than absolute, which would obfuscate the creator's file paths. That sounds minor, but as far as I can tell, it means that a project can't actually use ysc with a scrubber running afterwards, because every compile would cause a full Git change.

The obvious structure to me for any project with serious yarn work would look like:

yarn_src/
yarn_out/
.yarnconfig

I say that to differentiate it from a model where the .yarnconfig is intermixed in the same directory as the source files. I'm sure any configuration COULD support that, but it does sound awfully messy to me.

requireDeclarations compiler option is too ambiguous. Because one could potentially require declarations for:

variables,
characters,
functions,
commands,
markup.

The exact specifications of what compiler options could be desired can be worked out later -- simply having a place where one can put any compiler options would be desirable.

This was implemented in Yarn Spinner v2.3.1. Accordingly, I'm closing this proposal. Thanks for everyone's contributions to the discussion!
