YarnSpinnerTool / YarnSpinner

Yarn Spinner is a tool for building interactive dialogue in games!
https://yarnspinner.dev
MIT License

Proposal: JSON-based .yarnproject files #355

Closed: desplesda closed this issue 12 months ago

desplesda commented 1 year ago

Introduction

This proposal introduces a new format for representing Yarn Projects, which control how a collection of Yarn scripts is compiled and, potentially, configured for use in a game engine.

Current Format

Yarn Projects currently store data in two related files:

Data stored in .yarnproject.meta files is not available outside of Unity.

Motivation

  1. Yarn Projects store their data in Yarn Spinner script format. This means that to get information out of them, they must be compiled and analysed. It's not impossible to store project-wide data like localisation information in this format (using, for example, node headers, file headers, or specially structured commands, lines or comments), but storing structured data isn't what the Yarn language was designed for.

  2. Currently, much of the information that the Yarn Spinner compiler needs to compile a project correctly is only available to Unity projects. Crucially, this includes the list of .yarn scripts that need to be compiled. This means that tools like the Visual Studio Code extension, ysc, and the Language Server have to make assumptions about what to compile, and those assumptions may not be correct.

    This data is not inaccessible, as the .meta files are YAML that can be parsed, but certain information, like file references, is stored as GUIDs, which take a lot of work to dereference outside Unity, where the AssetDatabase cannot be queried.

    Additionally, this means that it is even more difficult to author this information in a way that works for both the Unity Editor and tools outside of Unity.

    Moving this data out of Unity's domain and into a more readily readable and interpretable format will make it easier for non-Unity tools to work with it.

Inspiration

.csproj files contain structured data in the form of XML that describes the build process for a C# project, including package dependencies, compilation steps, and other configuration files. By default, all C# files in the same directory and subdirectories of the .csproj file's location are included in the compilation. This behaviour can be augmented or disabled. The structure of the XML is highly hierarchical and nested.

tsconfig.json files contain structured data in the form of JSON and effectively serve as a shorthand for passing flags to the tsc compiler binary. By default, all TypeScript files in the same directory as the tsconfig.json file and its subdirectories are included. The structure of the JSON is very flat.

The proposed new format takes the following inspiration from these two formats:

JSON vs Other Formats

JSON is proposed here, but it is not essential. The reasons for choosing JSON over other formats (XML, YAML, a custom format) are:

Proposed New Format

.yarnproject files take the form of a JSON file.

{
    "projectFileVersion": 2,
    "sourceFiles": ["**/*.yarn"],
    "baseLanguage": "en",
    "localisation": {
        "en": {
            "assets": "../LocalisedAssets/English"
        },
        "de": {
            "strings": "../German.csv",
            "assets": "../LocalisedAssets/German"
        }
    },
    "definitions": "Functions.ysls.json"
}
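As a rough illustration of how a non-Unity tool might consume this file, the Python sketch below loads a project and expands `sourceFiles` into concrete paths. It assumes, by analogy with .csproj and tsconfig.json, that the glob patterns are resolved relative to the directory containing the .yarnproject file, and that a missing `sourceFiles` falls back to `**/*.yarn`; both are assumptions, `load_project` and `resolve_source_files` are hypothetical names, and JSONC comments are not handled here.

```python
import json
from pathlib import Path


def load_project(project_path: Path) -> dict:
    """Read a v2 .yarnproject file as plain JSON (comments not supported here)."""
    with open(project_path, encoding="utf-8") as f:
        data = json.load(f)
    if data.get("projectFileVersion") != 2:
        raise ValueError(f"{project_path}: expected projectFileVersion 2")
    return data


def resolve_source_files(project_path: Path, project: dict) -> list[Path]:
    """Expand each sourceFiles glob relative to the project file's directory."""
    root = project_path.parent
    found: set[Path] = set()
    # Assumption: default to every .yarn file below the project, like tsconfig.json.
    for pattern in project.get("sourceFiles", ["**/*.yarn"]):
        # In pathlib, "**" also matches the root directory itself.
        found.update(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)
```

Using a set means a script matched by two overlapping patterns is still compiled only once.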

Properties:

Hints in localisation.[LANGUAGE].strings and localisation.[LANGUAGE].assets are strings that are formatted in one of the following ways:

When using Yarn Internal Localisation, localisation.[LANGUAGE].strings is used to locate the .csv file containing the localised text for lines, and localisation.[LANGUAGE].assets is used to locate the folder containing localised assets for each line.

When using Unity Localisation, localisation.[LANGUAGE].strings is used to locate an optional .csv file containing the localised text for lines to pre-populate the string table, and localisation.[LANGUAGE].assets is used to locate the folder containing localised assets for each line to pre-populate import settings for loading an asset table.
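To make the two hints concrete for the Yarn Internal Localisation case, here is a hedged sketch of how a tool might resolve one language's entry. It assumes paths are relative to the project file's directory, and that a missing `strings` entry (as for `en`, the base language in the example) means the text comes from the compiled scripts themselves; rejecting a missing `strings` entry for any other language is also an assumption. `LocaleSources` and `resolve_locale` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class LocaleSources:
    language: str
    strings_csv: Optional[Path]  # None: use the text from the .yarn scripts
    assets_dir: Optional[Path]   # None: no localised assets for this language


def resolve_locale(project_dir: Path, project: dict, language: str) -> LocaleSources:
    entry = project.get("localisation", {}).get(language)
    if entry is None:
        raise KeyError(f"no localisation entry for {language!r}")
    strings = entry.get("strings")
    assets = entry.get("assets")
    if strings is None and language != project.get("baseLanguage"):
        # Assumption: only the base language may omit its strings file.
        raise ValueError(f"{language!r} is not the base language but has no strings file")
    return LocaleSources(
        language=language,
        strings_csv=(project_dir / strings) if strings else None,
        assets_dir=(project_dir / assets) if assets else None,
    )
```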

The decision to use Yarn Internal or Unity Localisation is stored in the .meta file. The justification for this is that this is exactly what import settings in Unity are for: Unity-specific configuration of how an asset should be imported.

In previous versions of the .yarnproject format, variables could be defined in the text of the file. This is not supported in the new format; all variables are defined in .yarn source files.

Comments are permitted in the file, as per the JSONC flavour of JSON (which adds support for C-style single-line and multi-line comments, // and /* */).
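Strict JSON parsers such as Python's `json` module reject comments, so a tool without a JSONC parser would need to strip them first. The sketch below shows one way to do that: a small scanner that removes `//` and `/* */` comments while leaving comment-like text inside string literals (such as `"**/*.yarn"` or a URL) untouched. It is an illustration only, not a complete JSONC implementation, and the function names are hypothetical.

```python
import json


def strip_jsonc_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside JSON string literals."""
    out: list[str] = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped char, e.g. \" or \\
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # the newline itself is kept
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            out.append(" ")  # replace the comment so adjacent tokens stay apart
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def loads_jsonc(text: str):
    """Parse JSON text that may contain C-style comments."""
    return json.loads(strip_jsonc_comments(text))
```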

Upgrade process

The following steps upgrade a version 1 (Yarn script) .yarnproject file to a version 2 (JSON) file.

To convert a v1 file to a v2 file:

Describe the impact that your solution will have on code written in the most recent shipping version of the language. If your proposed changes mean that existing code would need to be changed in order to work, describe in detail what changes would be required, and describe an algorithm (pseudocode is fine) for detecting where these changes are necessary, and how an automated upgrader would either make changes or flag that a human must make changes.
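The concrete upgrade steps are not listed here, so the following is only a hypothetical sketch of what an automated upgrader could do, based on two points made elsewhere in this proposal: a v1 file is Yarn script (so it will not parse as JSON), and any variable declarations it contains have to move into an ordinary .yarn file, because v2 files cannot define variables. The output file name (`<Project>-Variables.yarn`) and the default `sourceFiles` value are assumptions, and settings stored in the .yarnproject.meta file (such as localisation) are not migrated by this sketch.

```python
import json
from pathlib import Path
from typing import Optional


def detect_version(text: str) -> int:
    """Return 2 for a JSON project file, 1 for a legacy Yarn-script one."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return 1  # not JSON, so assume it is a v1 Yarn script
    if isinstance(data, dict) and data.get("projectFileVersion") == 2:
        return 2
    raise ValueError("JSON project file with an unknown projectFileVersion")


def upgrade_v1_project(project_path: Path) -> Optional[Path]:
    """Rewrite a v1 .yarnproject in place; return the new .yarn file, if any."""
    text = project_path.read_text(encoding="utf-8")
    if detect_version(text) == 2:
        return None  # already upgraded
    variables_file = None
    if text.strip():
        # Hypothetical naming scheme: keep the old Yarn content verbatim in a
        # sibling .yarn file so its <<declare>> statements still get compiled.
        variables_file = project_path.with_name(project_path.stem + "-Variables.yarn")
        if variables_file.exists():
            raise FileExistsError(f"{variables_file} already exists; a human must merge it")
        variables_file.write_text(text, encoding="utf-8")
    new_project = {
        "projectFileVersion": 2,
        "sourceFiles": ["**/*.yarn"],  # assumption: compile every nearby script
    }
    project_path.write_text(json.dumps(new_project, indent=4) + "\n", encoding="utf-8")
    return variables_file
```

Raising instead of overwriting when the target file already exists is one way to "flag that a human must make changes" rather than silently losing data.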

Alternatives considered

The first alternative would be to make no changes, and leave most Yarn Project data in Unity. This would restrict most functionality to the Unity game engine, or necessitate building tools that can interpret and create the .meta files that Unity creates.

Another alternative would be to store project data in the current .yarn format, in either an imperative or a declarative way. This would offer the same level of functionality as JSON, but would require building additional custom parsing and/or interpretation logic into Yarn Spinner.

A third alternative would be to do away with Yarn Projects altogether, and to require scripts to be grouped manually by the user. This is what Yarn Spinner did prior to Yarn Spinner 2.0, and it was very cumbersome.

Acknowledgments

@McJones contributed design review on this document.

KXI-System commented 1 year ago

Overall I approve of this new format, but the real pain point in adopting it would be the transition in how default variables (variables declared "in the yarn project") are handled. The current Unity-only solution of tying needed variable declarations to a yarn project is really seamless; many users don't even realize it's just a yarn file with declare statements.

Since the new JSON format relies on a separate yarn file to handle needed variable declarations, it might lead to confusion about this generated yarn file, or to unnecessarily messy project setups with mangled variable.yarn files. Ideally this wouldn't be an issue if most projects adopted better practices when declaring yarn variables, but projects would need extra care and attention when upgrading, even with a robust upgrader. I'm not sure how much of an issue this will be, though users would need to know when and how to declare variables in their projects, which the current documentation doesn't cover.

One question I want to ask is who would be editing this new yarn project, and how. The current Unity yarn project could only really be edited with Unity's inspector, which is a really nice interface for non-technical users to work with (sure, you can edit the .meta file directly, but who does that?). I'm assuming the inspector workflow would still be there, but for other engines like Unreal, would that require similar tooling, or would the user be expected to edit the JSON file? JSON is human-readable and fairly easy to edit for technical folks, but it's really not as nice to work with as other formats like YAML.

A non-technical user would either have to learn and wrangle "scary looking" JSON, or engine integrators would have to spend extra work making suitable tools to manage yarn projects for them. Even though it's not a major concern, it would be a missed opportunity not to at least try to make the new yarn project user experience pleasant enough for a non-technical user, if they want or need to use it.


Here are my proposed additions:

  1. Have a field specifically for the "variables.yarn" source yarn file. This would be treated like other source yarn files, but would be easier to track down in case something goes wrong or it needs to be written to.

  2. Have yarn projects be a spec that could support YAML or TOML, but have the YarnSpinner interpreter only accept it in JSON format. This would let developers who want to write yarn projects in a friendlier format easily convert them to JSON, assuming they also add the pipeline to do so themselves.

  3. Embed the yarn project JSON information into a yarn file. This will require additional work to extract it, but significantly less than extending the yarn file header to accommodate the information needed. It might look like this:

// Yarn Project Config
// ---
// {
//     "projectFileVersion": 2,
//     "sourceFiles": ["**/*.yarn"],
//     "baseLanguage": "en",
//     "localisation": {
//         "en": {
//             "assets": "../LocalisedAssets/English"
//         },
//         "de": {
//             "strings": "../German.csv",
//             "assets": "../LocalisedAssets/German"
//         }
//     },
//     "definitions": "Functions.ysls.json"
// }
// ---

title: Program
---
// Ship.yarn, node Ship, line 14
<<declare $should_see_ship = false>>

// Ship.yarn, node Ship, line 14
<<declare $sally_warning = false>>
===

Since yarn headers can't really handle things like large arrays or key-value pairs easily, those could remain in the JSON, and everything else could be moved into headers.

// Yarn Project Config
// ---
// {
//     "sourceFiles": ["**/*.yarn"],
//     "localisation": {
//         "en": {
//             "assets": "../LocalisedAssets/English"
//         },
//         "de": {
//             "strings": "../German.csv",
//             "assets": "../LocalisedAssets/German"
//         }
//     }
// }
// ---

projectFileVersion: 2
baseLanguage: en
definitions: Functions.ysls.json
title: Program
---
// Ship.yarn, node Ship, line 14
<<declare $should_see_ship = false>>

// Ship.yarn, node Ship, line 14
<<declare $sally_warning = false>>
===
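To illustrate how the split in this variant would be read back, here is a sketch that collects the `key: value` header lines of the first node (everything before its `---` line), skipping blank lines and `//` comment lines, which also skips the embedded JSON. This only illustrates KXI-System's suggestion and is not how the Yarn Spinner compiler actually parses headers; `read_first_node_headers` is a hypothetical name.

```python
def read_first_node_headers(yarn_text: str) -> dict[str, str]:
    """Collect 'key: value' headers that appear before the first '---' line."""
    headers: dict[str, str] = {}
    for raw in yarn_text.splitlines():
        line = raw.strip()
        if line == "---":
            return headers  # end of the header block
        if not line or line.startswith("//"):
            continue  # blank lines and comments (including embedded JSON)
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"malformed header line: {raw!r}")
        headers[key.strip()] = value.strip()
    raise ValueError("no '---' line found after the headers")
```

Note that every header value comes back as a string (`"2"`, not `2`), so a tool would still have to convert types such as `projectFileVersion` itself.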

The embedded JSON could also be linked externally; I'm not sure how useful that would be, but it could work.

projectFileVersion: 2
baseLanguage: en
definitions: Functions.ysls.json
config: MyYarnProject.json
title: Program
---
// Ship.yarn, node Ship, line 14
<<declare $should_see_ship = false>>

// Ship.yarn, node Ship, line 14
<<declare $sally_warning = false>>
===
st-pasha commented 1 year ago

Agree with @KXI-System that YAML is a friendlier format from the user's standpoint (even though it may be slightly harder to parse; then again, YAML libraries are widely available anyway).

I would also suggest that the project file itself be called yarn.yaml instead of .yarnproject:

Have a field specifically for the "variables.yarn" source yarn file.

I'm not sure it's a good idea, since it seems to create a limitation that there will be only one file where variables can be declared? A project may want to declare variables across multiple files, for example one file per area, or per chapter, or per level, or per expansion pack.

Embed the yarn project JSON information into a yarn file.

This would create a conflict with the existing meaning of a comment at the top of the file. This comment can contain, say, a project description; or a copyright notice / license.

KXI-System commented 1 year ago

I'm not sure it's a good idea, since it seems to create a limitation that there will be only one file where variables can be declared? A project may want to declare variables across multiple files, for example one file per area, or per chapter, or per level, or per expansion pack.

I was going under the assumption that upgrading from v1 to v2 would move the embedded yarn file into its own standalone file, but that the feature to declare yarn variables in the inspector would still be there. That field would just point to that file, making it easier to write inspector-defined variables to the right yarn file without hardcoding a specific yarn file name into the yarn project. Unless I'm misunderstanding something, you would still be able to declare variables in yarn files normally; this would just be for inspector-declared variables (and whatever equivalent other engines have, if they choose to use it).

This would create a conflict with the existing meaning of a comment at the top of the file. This comment can contain, say, a project description; or a copyright notice / license.

The embedded JSON could be moved to the bottom of the file instead, but what I was going for was something akin to front matter in Markdown files. In that example, anything between the // --- comments would be extracted and parsed as JSON. The JSON itself doesn't need to be in actual comments; any syntax that makes it easy to extract and clearly denotes that this isn't a normal yarn file would be enough. What I have written just has the added benefit of some backwards compatibility with the current v1 Unity spec, in that the comments would simply be ignored.
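A minimal sketch of that extraction step, assuming exactly the convention in the examples above: the JSON sits between the first two lines consisting of `// ---`, and each line in between starts with `//`, which is removed before parsing. This was a suggestion in the discussion rather than the adopted format, and `extract_embedded_config` is a hypothetical name. A file with no marker lines is treated as an ordinary Yarn file and yields `None`.

```python
import json
from typing import Optional

MARKER = "// ---"


def extract_embedded_config(yarn_text: str) -> Optional[dict]:
    """Parse the JSON found between the first pair of '// ---' marker lines."""
    lines = yarn_text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == MARKER)
    except StopIteration:
        return None  # ordinary Yarn file, no embedded project config
    try:
        end = next(i for i in range(start + 1, len(lines)) if lines[i].strip() == MARKER)
    except StopIteration:
        raise ValueError("embedded config has no closing '// ---' line") from None
    body = []
    for line in lines[start + 1:end]:
        stripped = line.strip()
        if not stripped.startswith("//"):
            raise ValueError(f"expected a comment line inside the config block: {line!r}")
        body.append(stripped[2:])  # drop the '//' prefix, keep the JSON text
    return json.loads("\n".join(body))
```

Because the payload is ordinary JSON once the prefixes are removed, a missing comma in the embedded block surfaces as a normal `json.JSONDecodeError`.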

McJones commented 1 year ago

Thought I'd clear up some issues we realised aren't correctly explained in the original post and then give some more thoughts.

Configuration format

As to the discussion around JSON vs YAML vs TOML vs etc vs etc vs etc, at this point I am gonna say let's just stick with JSON for the discussion of the project file and its features and needs. Once we are happy with the core idea and have ironed out any concerns, we can return to format discussions. Arguing about formats will rapidly become counterproductive to discussing the issue at hand; our goal here isn't to pick the perfect structured document, it's to move away from a Unity meta file to something other tools can use. I do want to say that we won't be supporting more than one format, no matter what it ends up being; having more than one is just extra work for no real benefit.

Project File Format

We realise now it wasn't clear that .yarnproject was meant to be the file extension, not the file name. So you'd make your yarn project and name it something like MyGreatGame.yarnproject, and you could have as many of these as you need. You might have one for the main story and another full of background ambient dialogue.

Project-level Variable Declarations

With the current system, the idea behind variable declarations in the project was to resolve ambiguity in the yarn files themselves, not to make a centralised location for variable declarations. We were thinking that, moving ahead, in the editor you'd still see all variables, both the explicitly and implicitly declared ones, and any ambiguity would now become errors that you or the team would fix with declarations in the yarn file most specific to your game's needs, instead of in the editor window. Happy to be wrong on this: if people are really into declaring variables in the project file interface, that is fine, and we can keep that. But it still wouldn't need to be a specific field in the project, just a Unity-managed adjacent yarn file that is linked into the compilation. And because it's just another yarn file, it can be explicitly referenced in the sourceFiles field to make it clear that there is something slightly different about it.

Closing Thoughts

Finally, almost as an aside, I don't think you'd ever really be making these yourself; you'd make them in a tool-assisted manner, as people should never be hand-writing configuration files. On the Unity side, ideally users wouldn't even realise anything has changed, and tools like ysc would gain the ability to make and verify these files for you.
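Since the expectation is that tools such as ysc would create and verify these files, here is a hedged sketch of what a minimal structural check might look like. It encodes only what the example file in this proposal shows (a version number, a list of glob strings, a language code, and per-language `strings`/`assets` entries); treating `sourceFiles` and `baseLanguage` as required is an assumption, and the function name and messages are hypothetical.

```python
def validate_project(project: object) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks right."""
    if not isinstance(project, dict):
        return ["project file must be a JSON object"]
    problems: list[str] = []
    if project.get("projectFileVersion") != 2:
        problems.append("projectFileVersion must be 2")
    sources = project.get("sourceFiles")
    if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
        problems.append("sourceFiles must be a list of glob strings")
    base = project.get("baseLanguage")
    if not isinstance(base, str) or not base:
        problems.append("baseLanguage must be a non-empty string")
    localisation = project.get("localisation", {})
    if not isinstance(localisation, dict):
        problems.append("localisation must be an object keyed by language")
    else:
        for lang, entry in localisation.items():
            if not isinstance(entry, dict):
                problems.append(f"localisation.{lang} must be an object")
                continue
            for key in entry:
                if key not in ("strings", "assets"):
                    problems.append(f"localisation.{lang}: unknown key {key!r}")
                elif not isinstance(entry[key], str):
                    problems.append(f"localisation.{lang}.{key} must be a path string")
    return problems
```

Returning every problem at once, rather than stopping at the first, suits a "verify" command that reports all issues in one pass.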

McJones commented 1 year ago

Something that just occurred to me: should we have some sort of compiler options setting in here? So sorta like the following:

{
    "projectFileVersion": 2,
    "sourceFiles": ["**/*.yarn"],
    "baseLanguage": "en",
    "compilerOptions": {
        "requireDeclarations": true
    }
}

The only flag I can think of is requireDeclarations which would be to force the compiler to throw errors if there are any implicit variables in the project.
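As a rough sketch of what `requireDeclarations` would check, the function below lists variables that are used somewhere in a set of scripts but never declared with `<<declare ...>>` in any of them. It works on raw text with regular expressions, so it is only an approximation (it skips whole-line `//` comments but nothing smarter); a real compiler would do this on its parse tree. The function name and regexes are assumptions for illustration.

```python
import re

DECLARE_RE = re.compile(r"<<\s*declare\s+\$(\w+)")
VARIABLE_RE = re.compile(r"\$(\w+)")


def find_implicit_variables(scripts: dict[str, str]) -> dict[str, list[str]]:
    """Map each undeclared variable name to the sorted list of scripts using it."""
    declared: set[str] = set()
    used: dict[str, set[str]] = {}
    for name, text in scripts.items():
        for line in text.splitlines():
            if line.strip().startswith("//"):
                continue  # ignore whole-line comments
            declared.update(DECLARE_RE.findall(line))
            for var in VARIABLE_RE.findall(line):
                used.setdefault(var, set()).add(name)
    # Declarations count project-wide, so check only after scanning every script.
    return {var: sorted(files) for var, files in sorted(used.items()) if var not in declared}
```

With the flag enabled, a non-empty result would become a compile error instead of a silently inferred variable.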

sanbox-irl commented 1 year ago

Opening the door to compiler options in this project sounds great!

An issue I've run into using ysc (and I'm quite new to this and have never used Yarn Spinner in Unity at all) is that the -Lines.csv uses full system paths. That's probably not a huge deal for an engine to work around, but it's not ideal for a system path like that to appear in the ysc output at all.

Moving to this "root configuration" format would be wonderful, since outbound file paths could then be made relative to that root rather than absolute, which would keep the creator's file paths out of the output. That sounds minor, but as far as I can tell, it means that a project can't actually use ysc with a scrubber running afterwards, because every compile would cause a full Git change.
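What is being asked for here amounts to expressing each output path relative to the directory that holds the project file, with forward slashes, so the result is identical across machines and operating systems. A hedged sketch, assuming the project root is the directory of the .yarnproject file; `portable_source_path` is a hypothetical helper, not an existing ysc option.

```python
import os
from pathlib import Path


def portable_source_path(project_file: Path, source_file: Path) -> str:
    """Return source_file relative to the project's directory, using '/' separators."""
    root = project_file.resolve().parent
    rel = os.path.relpath(source_file.resolve(), start=root)
    return Path(rel).as_posix()  # backslashes become '/' on Windows
```

A script outside the project root comes back with `../` segments rather than an absolute path, so the output still does not depend on where the repository is checked out.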

The obvious structure to me for any project with serious yarn work would look like:

yarn_src/
yarn_out/
.yarnconfig

I say that to differentiate it from a model where the .yarnconfig is intermixed in the same directory as the source files. I'm sure any configuration COULD support that, but it does sound awfully messy to me.

st-pasha commented 1 year ago

The requireDeclarations compiler option is too ambiguous, because one could potentially require declarations for:

sanbox-irl commented 1 year ago

The exact specification of which compiler options are desirable can be worked out later; simply having a place where one can put compiler options would be valuable.

desplesda commented 12 months ago

This was implemented in Yarn Spinner v2.3.1. Accordingly, I'm closing this proposal. Thanks for everyone's contributions to the discussion!