SpyglassMC / Spyglass

Development tools for vanilla Minecraft: Java Edition data pack developers.
https://spyglassmc.com
MIT License
305 stars 32 forks source link

mcdoc refactor #955

Closed SPGoding closed 4 months ago

SPGoding commented 2 years ago

We need to invent a TypeScript-flavoured Rust on top of a Rust-flavoured nbtdoc.

Update: an early draft specification for the new mcdoc syntax can be found here.

Current status: The algorithms for major procedures are being documented.

Future notes:

Outdated information # Necessary changes These changes are necessary to give nbtdoc the ability to fully describe the NBT (and potentially JSON and CODEC) structure used by Minecraft. ## Attributes This is an idea of Yurihaia. See [Rust Attributes](https://doc.rust-lang.org/reference/attributes.html) for reference. This system will allow us to add arbitrary new metadata in the future when needed. The syntax allowed for attribute values should be flexible enough to allow arbitrary logic to be expressed. Attributes will be allowed in front of compound field definitions, types, and statements. ```nbtdoc compound CommandBlock { #[parser=minecraft:command] Command: string, // Equivalent to Command: #[parser=minecraft:command] string, // Deprecate `id(namespace:registry)` #[id=minecraft:block_entity_type] id: string } compound Foo { UUID: ( #[until="20w10a"] string | #[since="20w10a"] int[] @ 4 ) } #[since="22w11a"] Frog describes minecraft:entity[minecraft:frog]; BlockItem describes minecraft:item[ minecraft:stone, #[since="20w49a"] minecraft:sculk_sensor ]; #[expand=true] MyCustomItem describes minecraft:item[minecraft:carrot_on_a_stick]; ``` ### Builtin Attributes | Attribute | Input | Description | | - | - | - | | `expand` | `false` \| `true` | Only meaningful when attributing a describes statement. If set to `true`, expands the definition for the type pointed to by the index registry. | | `id` | MINECRAFT_IDENT | Only meaningful when attributing a string type. Amends the type information with the resource location of a parser that should be used to parse the value of this string. | | `parser` | MINECRAFT_IDENT | Only meaningful when attributing a string type. Amends the type information with the resource location of a parser that should be used to parse the value of this string. | | `since` | STRING | Can be used by the serialize script and the nbt type checker. If the current version is earlier than the input version, the type will be changed to `()` if attributing a type, or the statement will become noop if attributing a statement. | | `until` | STRING | Can be used by the serialize script and the nbt type checker. If the current version is later than or equal to the input version, the type will be changed to `()` if attributing a type, or the statement will become noop if attributing a statement. | ## Index keys, `key` keyword for FIELD_PATH_KEY, and `any` type ```nbtdoc compound DebugProperty { // https://minecraft.fandom.com/wiki/Debug_Stick#Item_data [id(minecraft:block)]: nbtdoc:block_state_names[key] } compound Any { [string]: any } ``` # Quality-of-life changes ## Deprecate `mod` statements I think they are rather confusing and serve little purpose. It should be pretty easy to build a module tree just from the file structure – in fact, that's what the current nbtdoc package is doing already. ## Trailing commas ```nbtdoc compound Foo { bar: int, } ``` ## Type aliases This may not be implemented due to its difficulty. ```nbtdoc type Alias = (Foo | Bar); ``` ## Generics This may not be implemented due to its difficulty. ```nbtdoc compound BrainMemory { value: V, ttl: long } ``` ## Reuse types from existing compound This may not be implemented due to its difficulty. ```nbtdoc type CommandString = ::minecraft::block::commandblock::CommandBlock.Command ``` # Serialized JSONs As I do not know anything about Rust, the old [Yurihaia/mc-nbtdoc](https://github.com/Yurihaia/mc-nbtdoc) repository that depends on `nbtdoc-rs` will likely no longer be maintained. Instead, a new `SpyglassMC/mc-nbtdoc` repository will be created which will be powered by a TypeScript-backed stack. The `main` branch on the new repository will be the sole source of truth thanks to the introduction of `since` and `until` attributes, and the serialized JSON format for each version will be regenerated on every commit to the `main` branch, similar to the approach taken by `misode/mcmeta`. This would make it significantly easier to correct information for old versions, while making it easy for existing tools utilizing the serialized JSONs to migrate – ideally they would only need to change the URL from `Yurihaia/mc-nbtdoc` to `SpyglassMC/mc-nbtdoc`.
NeunEinser commented 2 years ago

Alternate syntax

I like to see sharp, so lemme throw C# style attributes in here. [AttributeName(property1="foo", property2=91)] or [AttributeName("foo", 91)] when using ordered parameters.

See also https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/attributes/#attribute-parameters

Attribute parameters

Many attributes have parameters, which can be positional, unnamed, or named. Any positional parameters must be specified in a certain order and cannot be omitted. Named parameters are optional and can be specified in any order. Positional parameters are specified first. For example, these three attributes are equivalent:

[DllImport("user32.dll")]
[DllImport("user32.dll", SetLastError=false, ExactSpelling=false)]
[DllImport("user32.dll", ExactSpelling=false, SetLastError=false)]

The first parameter, the DLL name, is positional and always comes first; the others are named. In this case, both named parameters default to false, so they can be omitted. Positional parameters correspond to the parameters of the attribute constructor. Named or optional parameters correspond to either properties or fields of the attribute. Refer to the individual attribute's documentation for information on default parameter values.

I personally don't like @attributeName("foo", property1=54) The rust style looks good imo.

NeunEinser commented 2 years ago

Also I feel like there is a case for deprecating the type id(namespace:identifier) and the registry type and instead using an attribute for that as well:

compound InventoryItem {
    /// The number of items in the stack
    Count: byte,
    /// The id of the item
    #[id=minecraft:item]
    id: string,
    /// The NBT of the item
    #[registryIdentifier=id]
    tag: ItemBase
}

I think it's weird to reference registries like that which to me seem like they aren't even properly defined in nbtdoc(?). I only found ItemBase describes minecraft:item in Yurihaia/mc-nbtdoc, but nothing like Compass describes minecraft:item[minecraft:compass]

SPGoding commented 2 years ago

The mc-nbtdoc syntax is well defined at https://github.com/Yurihaia/nbtdoc-rs/blob/master/docs/format.md, although it's not really in a user-friendly format. We will probably need to provide a documentation on our website.

And for deprecating id, Yurihaia had that idea as well and your proposed attribute looks pretty clean, so we will probably go with it.

But I don't think the proposed registry attribute can fit all use cases:

compound BlockState {
    /// The id of the block
    Name: id(minecraft:block),
    /// The properties of the block
    Properties: custom:blockstates[Name] // What should the base compound here be?
}

I think the current registry syntax is fine, and could even be expanded to have a static way to index something, e.g.

compound Foo {
    Properties: custom:blockstates @ minecraft:stone
}
SPGoding commented 2 years ago

And although you can see sharp through [AttributeName()], it has ambiguity with list type ([]) when being parsed and requires some backtracking, while the #[attribute_name=] and @attributeName() syntax are unambiguous due to starting with unique, special character:

compound Foo {
    Bar: [id=minecraft:block] string // When parsing `[` the parser could parse it as either a list or an attribute.
}
NeunEinser commented 2 years ago

But I don't think the proposed registry attribute can fit all use cases:

compound BlockState {
  /// The id of the block
  Name: id(minecraft:block),
  /// The properties of the block
  Properties: custom:blockstates[Name] // What should the base compound here be?
}

custom:blockstates? Wtf? Where do these registries come from? Where are they defined?

The base type is essentially a map from string to string, with keys and values coming from the registry. I don't know if nbtdoc has a proper map type, but in this case it would be the logical base.

Since I can't find anything about a map type in nbtdoc I am just gonna invent my own.

compound BlockState {
    /// The id of the block
    Name: id(minecraft:block),
    /// The properties of the block
    #[registry=custom:blockstate, key=id] // dunno if this is the correct way to define parameters of rust attributes. Registry is the attribute name, key is a parameter for the attribute.
    Properties: map @ string
}

I think the current registry syntax is fine, and could even be expanded to have a static way to index something, e.g.

compound Foo {
  Properties: custom:blockstates @ minecraft:stone
}

You could allow for either defining a key or an entry directly. Still think this is cleaner.

compound Foo {
    #[registry=custom:blockstate, entry=minecraft:stone]
    Properties: map @ string
}

And yes, I saw the documentation for nbtdoc.

NeunEinser commented 2 years ago

And although you can see sharp through [AttributeName()], it has ambiguity with list type ([]) when being parsed and requires some backtracking, while the #[attribute_name=] and @attributeName() syntax are unambiguous due to starting with unique, special character:

compound Foo {
  Bar: [id=minecraft:block] string // When parsing `[` the parser could parse it as either a list or an attribute.
}

I don't see why you are allowing attributes at that place. Just don't do that and you're fine 😛

SPGoding commented 2 years ago

A list of registries is defined here.

A map type syntax is proposed in this proposal as there unfortunately isn't one:

compound StringToState {
   [string]: (
      #[serialize_to=true] // An attribute I made up on spot to indicate that although Minecraft accepts all types as state input, it only serializes into string.
      string | 
      #[serialize_to=false]
      (byte | short | int | long | float | double)
   )
}

And attributes need to be allowed before types to allow them to be used inside unions, like the example above.

NeunEinser commented 2 years ago
compound StringToState {
   [string]: (
      #[serialize_to=true] // An attribute I made up on spot to indicate that although Minecraft accepts all types as state input, it only serializes into string.
      string | 
      #[serialize_to=false]
      (byte | short | int | long | float | double)
   )
}

And attributes need to be allowed before types to allow them to be used inside unions, like the example above.

No matter the attribute style, I think this is cleaner.

compound StringToState {
   #[serialize_to=string]
   [string]: (string | byte | short | int | long | float | double)
}

And yeah, I don't like your proposal of allowing attributes on types, and have never seen that in any language.

I think mine is easier to mentally parse what's going on (and probably also programmatically easier to parse).

NeunEinser commented 2 years ago

Another thing I didn't like about nbtdoc as it stands currently: it uses a different path scheme compared to other resources in minecraft.

How about ./nbtdoc/namespace/path/to/nbtdoc/file.nbtdoc And then using namespace:path/to/nbtdoc/file or without using compund foo {bar: namespace:path/to/nbtdoc/file.MyCompound}?

SPGoding commented 2 years ago

Another thing I didn't like about nbtdoc as it stands currently: it uses a different path scheme compared to other resources in minecraft.

How about ./nbtdoc/namespace/path/to/nbtdoc/file.nbtdoc And then using namespace:path/to/nbtdoc/file or without using compund foo {bar: namespace:path/to/nbtdoc/file.MyCompound}?

That is something I've been thinking about as well. I think the original ident path format was inspired by Rust, but as the doc is supposed to be mostly used by not-so-hardcore mcfunction "programmers", having a more consistent path format as other pack resources would be great.

One advantage of the special path format, however, is that it's clear it is a separate system from Minecraft itself. It may still bring confusion to users if they need to import (use, using, etc.) files in mcdoc but not in other resources when they are all using the resource location format.

SPGoding commented 2 years ago
compound StringToState {
   #[serialize_to=string]
   [string]: (string | byte | short | int | long | float | double)
}

And yeah, I don't like your proposal of allowing attributes on types, and have never seen that in any language.

I think mine is easier to mentally parse what's going on (and probably also programmatically easier to parse).

Would the definition for UUIDs become something like this then?

compound Spg {
    // Type is `string` before 20w10a, `int[] @ 4` at and after 20w10a, and can be fixed by the `mcdoc:nbt/uuid/string_to_byte_array` function
    // There will be an auto generated file listing all functions mentioned by the mcdoc project, and implementations of the mcdoc environment can provide actual implementations for those functions.
    #[versioned(string, "20w10a", mcdoc:nbt/uuid/string_to_byte_array, int[] @ 4)]
    Owner: string | int[] @ 4
}

I think it's definitely cleaner than a union and easier to understand for humans, but string and int[] @ 4 are repeated in the attribute and in the actual type.

misode commented 2 years ago

I think it's definitely cleaner than a union and easier to understand for humans, but string and int[] @ 4 are repeated in the attribute and in the actual type.

I disagree, the union type was perfectly fine to understand imo. Allowing attributes on types like string follows the same rule as allowing attributes in front of compounds.

NeunEinser commented 2 years ago

I think it's definitely cleaner than a union and easier to understand for humans, but string and int[] @ 4 are repeated in the attribute and in the actual type.

I disagree, the union type was perfectly fine to understand imo. Allowing attributes on types like string follows the same rule as allowing attributes in front of compounds.

Compounds are named, types are not. You can also add attributes to C#/Java classes, but not types.

And yes, you can unserstand the union type, but it's harder to read.

SPGoding commented 2 years ago

Unlike C#/Java, everything in nbtdoc is a type, as the whole purpose of this language is to define data schemas. Attribute is a way to amend arbitrary data to the type system, so it makes sense to me to allow them in front of any types, whether named or anonymous.

I would also say readability is rather subjective, and your proposed syntax here requires the attribute parser to understand a type is expected after serialize_to=, while I would prefer the attribute parser to not be concerned about the context but to just blindly parse an arbitrary tree-like structure for simplicity. The proposed syntax also has the string type repeated twice, violating the "don't repeat yourself" principle.

compound StringToState {
   #[serialize_to=string]
   [string]: (string | byte | short | int | long | float | double)
}
NeunEinser commented 2 years ago

[Y]our proposed syntax here requires the attribute parser to understand a type is expected after serialize_to=, while I would prefer the attribute parser to not be concerned about the context but to just blindly parse an arbitrary tree-like structure for simplicity.

It has to be aware of the type that follows anyway, tho?

Builtin Attributes

Attribute Input
expand false | true
id MINECRAFT_IDENT
parser MINECRAFT_IDENT
since STRING
until STRING

All of these have types that the parser has to check, why not have a type input?


The proposed syntax also has the string type repeated twice, violating the "don't repeat yourself" principle.

I disagree. Repeating would be that you somehow have to specify the serialization output twice. You are not repeating yourself here, you are saying two distinct things ("This type can be set as string" and "This type will always be a string when serialized")

This one is even more redundant as a parameterless serialize_to would suffice, and you don't need to specify serialize_to=false. Just pointing that out, ofc that is not necessary for your proposal, my point is just that I find it funny that you see redundancy in my proposal where I see none, while your own proposal has some :D

compound StringToState {
   [string]: (
      #[serialize_to=true] // An attribute I made up on spot to indicate that although Minecraft accepts all types as state input, it only serializes into string.
      string | 
      #[serialize_to=false]
      (byte | short | int | long | float | double)
   )
}

I would also say readability is rather subjective

Objective differences which in my objectively good subjective opinion make my proposal more readable:

Things I see in favor of your proposal:

NeunEinser commented 2 years ago

The proposed syntax also has the string type repeated twice, violating the "don't repeat yourself" principle.

I suppose this one would be a better example if you really want to implement versioning into this (tho I don't get the point of specifying versioning in the doc, as with everything else you just download different tags/branches/commits depending on version).

compound versioningIsEvil {
    evilField: (#[until=1.17] int | #[since=1.17] long)
}
compound versioningIsEvil {
    #[until=1.17]
    evilField: int

    #[since=1.17]
    evilField: long
}

In this case I would say neither is really good or ideal. I still heavily dislike that it's hard to look at the first and see what type it actually is, and for the second one it's hard to see that those two are actually about the same field, rather than two separate fields. And yeah, you'd also need to repeat doc and potentially other things about the type as well, which certainly isn't ideal.

Something like this could be okay-ish, as you can at least see the current version's type easily, but I would assume it makes parsing way harder, if attributes are allowed to mess with type definitions like that.

compound versioningIsEvil {
    #[type_override=int, until=1.17]
    evilField: long
}

So if you use my proposal I would strongly advice against until/since attributes and rely on branches of the source repository instead (and tbh I would advice for that anyway, but that's a different discussion to have)

misode commented 2 years ago

I prefer not using git for different mc versions and instead use something like what is proposed:

SPGoding commented 2 years ago

Justification for versioning in doc

As the current sole maintainer of mc-nbtdoc I can subjectively tell you that it is subjectively painful to maintain different versions of the doc on different branches/tags, particularly when there is a mistake made for an old version -- the maintainer would need to cherry-pick the fix commit and regenerate the serialized JSON format for all affected versions, which although can theoretically be automated, is subjectively messy and subjectively inelegant.

Versioning in doc is also what Misode did for the JSON schemas part in Spyglass, and what Misode does must be objectively supreme, therefore versioning in the doc must be objectively supreme by universal instantiation.

And then I got ninja'd

Attribute syntax

I think we are now bike-shedding on specific details of some attributes. To delay those arguments I've made the proposed specification flexible enough to allow most of the proposed attribute usage mentioned under this issue. Thanks for providing those examples!

NeunEinser commented 2 years ago

@misode @SPGoding Thanks for the explanation, I have been sufficiently convinced 😁