BenWoodworth / knbt

Kotlin NBT library for kotlinx.serialization
GNU Lesser General Public License v3.0
72 stars 2 forks source link

Re-design root class name NbtCompound-nesting functionality #29

Open BenWoodworth opened 1 year ago

BenWoodworth commented 1 year ago

Problem

Currently (v0.11) classes are serialized within a single-entry NbtCompound, with the serial name as the key. But, only if serialized at the root.

This behavior is convenient, but has a couple issues:

Old redesign

This issue's initial redesign has been implemented, but is being replaced by the new design proposed in this comment

Original solution - Remove the special conditional logic for adding serial names at the root, so data serializes the same regardless of where in the serialization process/data it is - This makes deserializing based on the structure more straightforward - Add an annotation that, when applied to a serial descriptor, instructs the serializer unconditionally to nest the data in a single-entry `NbtCompound` - This covers the use case of top-level NBT binary/files needing to be a named tag (single-element NbtCompound) - From the docs: > An NBT file consists of a single GZIPped Named Tag of type TAG_Compound. - So `@NbtNamed` might be a good annotation name. `@NbtRoot` and `@NbtFile` have been used in earlier versions of knbt, but they seem too specific ### Tasks - ☑ `StructureKind.CLASS`/`OBJECT` - Covers the previous default `class` nesting using the serial name - ☐ `StructureKind.MAP` - ☐ `StructureKind.LIST`

New Design

This design aims to add a concept of NBT names that all serializable types have, with every type either:

The initial implementation will mainly focus on implementing static naming, since most use cases don't need logic around the serialized NBT names, and it's also not clear yet how an API for dynamic names should behave. See the dynamic names section below for details.

For the scope of this initial design, the NBT name only applies to the root tag name. Though in the future, especially with dynamic names, it's possible the NBT name could apply more broadly. (e.g. a value within an NBT compound knowing its own element name while deserializing)

Use Cases

These are use cases that are being designed for in this new approach

Default use, without using any named NBT API

Statically setting the NBT root name for a type

Inspecting NBT (including root name) by decoding to an in-memory representation

val decodedNbt: NbtNamed<NbtTag> = nbt.decodeFromBufferedSource(source)

Inspecting how data is encoded to NBT (including root name) through an in-memory representation

val encodedNbt: NbtNamed<NbtTag> = nbt.encodeToNamedNbtTag(data)

Named-root NBT variants

Only some variants of NBT have root names encoded. With this new design, serializing values should be the same between named and unnamed root variants (instead of named variants being modeled as nested in an NBT compound).

Unnamed root variants:

Static NBT names for all serializable types

Dynamically serializing NBT names, and future encoding/decoding API

Add a basic NbtNamed class for an in-memory representation of the (root) NBT value and its name.

Care was taken when deciding how this would work, making sure there's room for full-blown dynamically serialized names to be added later in a forward-compatible way.

WinX64 commented 11 months ago

Another issue with the way things are done at the moment is related to change I mentioned on https://github.com/BenWoodworth/knbt/issues/13#issuecomment-1747769762.

This recent change in 23w40a will allow not only Compound Tags, but also String Tags to be encoded as root tags. If we were to generalize this, any type of tag could potentially be encoded as a root tag.

BenWoodworth commented 11 months ago

I don't think that should actually be an issue with the way I have it implemented now for this issue. You'd need a custom serializer since you can't e.g. annotate a string with @NbtNamed, but adding the annotation to the serializer descriptor like this should work:

object RootStringSerializer : KSerializer<String> {
    override val descriptor: SerialDescriptor =
        object : SerialDescriptor by PrimitiveSerialDescriptor("RootStringSerializer", PrimitiveKind.STRING) {
            @ExperimentalSerializationApi
            override val annotations: List<Annotation> = listOf(NbtNamed("root string name"))
        }

    override fun serialize(encoder: Encoder, value: String): Unit =
        encoder.encodeString(value)

    override fun deserialize(decoder: Decoder): String =
        decoder.decodeString()
}
BenWoodworth commented 11 months ago

What is interesting now though. I was debating between @NbtNamed and @NbtRoot and landed on @NbtNamed. But with #36 being a thing now, I'm probably going to go with @NbtRoot instead since that breaks my assumption about a named tag (which is a compound with one named entry according to the spec) and the root being the same thing.

WinX64 commented 11 months ago

Yes, the issue I'm trying to point out is that very assumption (named tags are compounds with one entry), which in my view at least has always been more of a convention, and not a hard limitation of the NBT format itself, even before the change in 23w40a. It's making very cumbersome to perform some operations that should've been trivial (writing a simple Stringtag should not require all that work mentioned on https://github.com/BenWoodworth/knbt/issues/29#issuecomment-1751852009).

I will open a new issue detailing everything (or possibly edit #36) as to not get too off-topic. The point I'm trying to make isn't really related to root class names, but as to how the root tag itself is structured/treated during serialization/deserialization.

BenWoodworth commented 11 months ago

As far as discussing root tags and how they're structured/treated (and the mental model for it in general), this issue's probably the best place to discuss that. (Instead of a new issue, since the structure/treatment is exactly what this issue is about).

And your issue for nameless root tags could be a discussion of how the new variant fits into that mental model. I have a few ideas after sitting on it yesterday, so I'll drop a comment over there too.

BenWoodworth commented 11 months ago

And I also 100% agree that it's cumbersome. The RootStringSerializer example just serves to demonstrate it's possible with the one NbtNamed line in the current encoding. And if needed, it can all be neatly wrapped into a general class from knbt, e.g. NbtNamedValue<String>("name", "value")

WinX64 commented 11 months ago

Awesome, I will gather my ideas and make a post here when I have some time then. Some of the points are still more aligned with #36, so I will post those there.

BenWoodworth commented 4 months ago

Partially in reply to your comment on #36:

As for the named tag, it was more of a suggestion. If you already considered it at some point, and still decided doing it another way, I don't see any point on insisting on it.

I've thought about this more than I'd like to admit (😅), wanting to find a solution for:

And I had a couple misconceptions when I originally considered how names should be modeled (and in all my messages before this):

A new approach that I think will work

(Which replaces the approach outlined in this issue's description)

New Design

(moved to the issue description)

KP2048 commented 4 months ago

A way to serialize a class without a root tag and just have everything be a flat compound tag would be nice. Using this in actual Minecraft modding I have run into many scenarios where I don't want a root tag because it gets embedded in another bit of nbt.

pandier commented 4 months ago

A way to serialize a class without a root tag and just have everything be a flat compound tag would be nice. Using this in actual Minecraft modding I have run into many scenarios where I don't want a root tag because it gets embedded in another bit of nbt.

I agree. Having like a encodeToByteArrayFlat or encodeToByteArrayRaw that would just encode the payload without the root tag would be useful.

KP2048 commented 4 months ago

Right now I do this

NBT.encodeToNbtTag(serializer, input).nbtCompound[serializer.descriptor.serialName]!!

It's kotlin, because I write my Minecraft mods in Kotlin, but having an encodeToNbtTagFlat would be awesome

KP2048 commented 4 months ago

I also have some stuff for converting between the knbt representation and the Mojmap representation if anyone needs that:

(expand code) ```kotlin @OptIn(ExperimentalContracts::class) inline fun minecraftNbtCompound(builderAction: NbtCompoundBuilder.() -> Unit): CompoundTag { contract { callsInPlace(builderAction, InvocationKind.EXACTLY_ONCE) } return buildNbtCompound(builderAction).toMinecraft } @OptIn(ExperimentalTypeInference::class, ExperimentalContracts::class) inline fun minecraftNbtList( @BuilderInference builderAction: NbtListBuilder.() -> Unit, ): ListTag { contract { callsInPlace(builderAction, InvocationKind.EXACTLY_ONCE) } return buildNbtList(builderAction).toMinecraft } val NbtTag?.toMinecraft: Tag get() = when (this) { null -> EndTag.INSTANCE is NbtByte -> ByteTag.valueOf(value) is NbtByteArray -> ByteArrayTag(this) is NbtCompound -> toMinecraft is NbtDouble -> DoubleTag.valueOf(value) is NbtFloat -> FloatTag.valueOf(value) is NbtInt -> IntTag.valueOf(value) is NbtIntArray -> IntArrayTag(this) is NbtList<*> -> toMinecraft is NbtLong -> LongTag.valueOf(value) is NbtLongArray -> LongArrayTag(this) is NbtShort -> ShortTag.valueOf(value) is NbtString -> StringTag.valueOf(value) } val NbtCompound.toMinecraft: CompoundTag get() = CompoundTag().apply { mapValues { it.value.toMinecraft }.forEach { (key, value) -> put(key, value) } } val NbtList<*>.toMinecraft: ListTag get() = ListTag().apply { this@toMinecraft.map { it.toMinecraft }.forEach { add(it) } } val Tag.fromMinecraft: NbtTag? get() = when (id.toInt()) { 0 -> null 1 -> NbtByte((this as NumericTag).asByte) 2 -> NbtShort((this as NumericTag).asShort) 3 -> NbtInt((this as NumericTag).asInt) 4 -> NbtLong((this as NumericTag).asLong) 5 -> NbtFloat((this as NumericTag).asFloat) 6 -> NbtDouble((this as NumericTag).asDouble) 7 -> NbtByteArray((this as ByteArrayTag).asByteArray) 8 -> NbtString(this.asString) 9 -> (this as ListTag).fromMinecraft 10 -> (this as CompoundTag).fromMinecraft 11 -> NbtIntArray((this as IntArrayTag).asIntArray) 12 -> NbtLongArray((this as LongArrayTag).asLongArray) else -> throw IllegalStateException("Unknown tag type: $this") } val CompoundTag.fromMinecraft: NbtCompound get() = buildNbtCompound { this@fromMinecraft.allKeys.associateWith { this@fromMinecraft[it]?.fromMinecraft }.forEach { (key, value) -> if (value != null) put(key, value) } } val ListTag.fromMinecraft: NbtList<*>? get() = when (elementType.toInt()) { 0 -> null 1 -> buildNbtList { forEach { add(it.fromMinecraft as NbtByte) } } 2 -> buildNbtList { forEach { add(it.fromMinecraft as NbtShort) } } 3 -> buildNbtList { forEach { add(it.fromMinecraft as NbtInt) } } 4 -> buildNbtList { forEach { add(it.fromMinecraft as NbtLong) } } 5 -> buildNbtList { forEach { add(it.fromMinecraft as NbtFloat) } } 6 -> buildNbtList { forEach { add(it.fromMinecraft as NbtDouble) } } 7 -> buildNbtList { forEach { add(it.fromMinecraft as NbtByteArray) } } 8 -> buildNbtList { forEach { add(it.fromMinecraft as NbtString) } } 9 -> buildNbtList> { forEach { add(it.fromMinecraft as NbtList<*>) } } 10 -> buildNbtList { forEach { add(it.fromMinecraft as NbtCompound) } } 11 -> buildNbtList { forEach { add(it.fromMinecraft as NbtIntArray) } } 12 -> buildNbtList { forEach { add(it.fromMinecraft as NbtLongArray) } } else -> throw IllegalStateException("Unknown tag type: $this") } ```

This is also when I realized that knbt doesn't have an equivalent of Minecraft's EndTag, so I had to use null. Unfortunately you can't have a null value in list and compound tags in knbt, so some data loss occurs

KP2048 commented 4 months ago

I also have some wrappers for using kotlinx serializers for Minecraft Codecs and Codecs as serializers

(expand code) ```kotlin open class CodecSerializer(private val codec: Codec) : KSerializer { override val descriptor: SerialDescriptor get() = NbtTag.serializer().descriptor override fun deserialize(decoder: Decoder): T { return codec.parse(NbtOps.INSTANCE, decoder.asNbtDecoder().decodeNbtTag().toMinecraft).orThrow } override fun serialize(encoder: Encoder, value: T) { encoder.asNbtEncoder().encodeNbtTag(codec.encodeStart(NbtOps.INSTANCE, value).orThrow.fromMinecraft!!) } } inline fun Codec.serializer(): KSerializer = CodecSerializer(this) ``` ```kotlin @OptIn(ExperimentalSerializationApi::class) open class KotlinCodec(private val serializer: KSerializer, private val hasRootTag: Boolean = false) : Codec { override fun encode(input: X, ops: DynamicOps, prefix: T): DataResult { val element = if (serializer.descriptor.kind == StructureKind.CLASS && !hasRootTag) NBT.encodeToNbtTag(serializer, input).nbtCompound[serializer.descriptor.serialName]!! else NBT.encodeToNbtTag(serializer, input) return DataResult.success(NbtOps.INSTANCE.convertTo(ops, element.toMinecraft)) } override fun decode(ops: DynamicOps, input: T): DataResult> { val element = NBT.decodeFromNbtTag(serializer, ops.convertTo(NbtOps.INSTANCE, input).fromMinecraft!!.let { return@let if (serializer.descriptor.kind == StructureKind.CLASS && !hasRootTag) buildNbtCompound { put(serializer.descriptor.serialName, it) } else it }) return DataResult.success(Pair.of(element, ops.empty())) } } @OptIn(ExperimentalSerializationApi::class) open class KotlinMapCodec(private val serializer: KSerializer, private val hasRootTag: Boolean = false) : MapCodec() { override fun encode(input: X, ops: DynamicOps, prefix: RecordBuilder): RecordBuilder { val element = if (serializer.descriptor.kind == StructureKind.CLASS && !hasRootTag) NBT.encodeToNbtTag(serializer, input).nbtCompound[serializer.descriptor.serialName]!! else NBT.encodeToNbtTag(serializer, input) require(element is NbtCompound) { "Class ${serializer.descriptor.serialName} does not serialize to a CompoundTag!" } return ops.mapBuilder().apply { element.nbtCompound.forEach { (key, value) -> add(key, NbtOps.INSTANCE.convertTo(ops, value.toMinecraft)) } } } override fun keys(ops: DynamicOps): Stream = serializer.descriptor.elementNames.map { ops.createString(it) }.stream() override fun decode(ops: DynamicOps, input: MapLike): DataResult { val element = NBT.decodeFromNbtTag(serializer, NbtCompound(input.entries().toList().associate { ops.convertTo(NbtOps.INSTANCE, it.first).fromMinecraft!!.nbtString.value to ops.convertTo( NbtOps.INSTANCE, it.second ).fromMinecraft!! }).let { return@let if (serializer.descriptor.kind == StructureKind.CLASS && !hasRootTag) buildNbtCompound { put(serializer.descriptor.serialName, it) } else it }) return DataResult.success(element) } } inline fun codec(): Codec = KotlinCodec(serializer()) inline fun mapCodec(): MapCodec = KotlinMapCodec(serializer()) ```
pandier commented 4 months ago

Probably not the best place to put this stuff, you should create a gist :)

If I understand it correctly, the root tag is a feature of the binary format for type checking (you can't decode without it)? If so, then only the binary format encoders/decoders (e.g. encodeToStream) need to take the root tag into account (wrapping everything including primitives and NbtTag in a root tag). Methods like encodeToNbtTag or decodeFromString that don't use the binary format could just do without the root tag. You could also have the flat methods for binary format as mentioned earlier for encoding just the payload without the root tag. Decoding flat binary data would be unsupported.

Also are there any updates on the work? I'm open to doing some contributions.

WinX64 commented 4 months ago

Probably not the best place to put this stuff, you should create a gist :)

If I understand it correctly, the root tag is a feature of the binary format for type checking (you can't decode without it)? If so, then only the binary format encoders/decoders (e.g. encodeToStream) need to take the root tag into account (wrapping everything including primitives and NbtTag in a root tag). Methods like encodeToNbtTag or decodeFromString that don't use the binary format could just do without the root tag. You could also have the flat methods for binary format as mentioned earlier for encoding just the payload without the root tag. Decoding flat binary data would be unsupported.

Also are there any updates on the work? I'm open to doing some contributions.

So, here's some background of the entire mess:

The root tag is just a name for the outer-most NBT tag. Although the spec lightly implies that it should be a Compound, the way it is written allows it to be any type in essence. The spec itself is a mess, and was probably not thought of thoroughly back when it was defined, considering the recent changes.

The spec defines NBT tags consisting of a name and its payload, with nameless tags being considered the edge-cases (such as the values in a nbt lists). However, it's easier if you think of things in reverse, that is, as named tags being the edge-case. That way, you leave names as being specific data of Compounds, pretty much just like how Json is structured (json objects contains key value pairs). Of course, the root tag becomes the outlier in this case, since it is also still named.

Up until recently, that is how things worked, but on 23w31a the root tag name was completely dropped when encoding data on the network side, since it was completely pointless (the root tag name was always empty in every single case on the Vanilla implementations). Not only that, changes were made so any type of NBT tag can be serialized as-is. These recent changes made it so that the NBT format is structurally identical to the Json format (compounds are objects, strings are text, lists are arrays, and everything can be serialized as-is without any wrapping).

BenWoodworth commented 4 months ago

Thanks for the input and use cases! I haven't read through all the new replies just yet, but I've got a new design mostly written up, and I'm going try to finish fleshing that out and update my last comment with it later today once I'm done.

As far as what needs to be done, this issue as originally stated has already been implemented, but it still has some problems and pain points that should be addressed in the new design I'll post. After that, it shouldn't be too much to implement, since I'm expecting it to be more of a refactor effort with some small API changes/additions afterwards, being that it's a change to how NBT's modeled. So I'm not sure there's a good way to contribute, but I've got more free time now after just finishing some at-home construction the past few months, and I'll be posting updates here!

v0.12 will have a good number of API changes addressing pain points of the library, and this issue is the last big thing in the way before I get to pushing it out the release. Possibly finishing up this issue within a month if all goes smoothly :)

BenWoodworth commented 4 months ago

Updated! I'll read through the comments tomorrow when I get a chance :)

BenWoodworth commented 4 months ago

@KP2048 (comment):

A way to serialize a class without a root tag and just have everything be a flat compound tag would be nice. [...]

@pandier (comment):

I agree. Having like a encodeToByteArrayFlat or encodeToByteArrayRaw that would just encode the payload without the root tag would be useful.

@KP2048 (comment):

Right now I do this

NBT.encodeToNbtTag(serializer, input).nbtCompound[serializer.descriptor.serialName]!!

[...], but having an encodeToNbtTagFlat would be awesome

Sorry for the late reply! The new design (in the issue description) gets rid of the root compound entirely, so encodeToNbtTag/encodeToByteArray/etc. work exactly the way you want. Nameless, "raw", and no root compound to un-nest.

Then because there's no root compound naming at all anymore, there's a new (and optional) NbtNamed("name", value) for people who do still need the root name. For that, there will be encodeToNamedNbtTag, and also using NbtNamed with the existing encodeToByteArray/etc. functions.

Basically all the functions you're using now will just work, without needing to add any additional logic to deal with names. I'm going to start implementing this soon, so let me know how it sounds or if you have any questions/concerns about the new way serializing will work :)

WinX64 commented 4 months ago

Sorry for the late reply! The new design (in the issue description) gets rid of the root compound entirely, so encodeToNbtTag/encodeToByteArray/etc. work exactly the way you want. Nameless, "raw", and no root compound to un-nest.

Then because there's no root compound naming at all anymore, there's a new (and optional) NbtNamed("name", value) for people who do still need the root name. For that, there will be encodeToNamedNbtTag, and also using NbtNamed with the existing encodeToByteArray/etc. functions.

Basically all the functions you're using now will just work, without needing to add any additional logic to deal with names. I'm going to start implementing this soon, so let me know how it sounds or if you have any questions/concerns about the new way serializing will work :)

Awesome to hear that!

I do have a question regarding the new approach though. Maybe we can even use this discussion to iron things out.

Is your current idea:

  1. to have NbtNamed as a wrapper of sorts, with custom serialization logic to be handled internally by the encoder/decoder? In this case, the object could be passed directly to encodeToByteArray/etc, without the need of a specific encodeToNamedNbtTag method.
  2. to have the serialization logic be handled on encodeToNamedNbtTag? In this case, the input of the method could be (name: String, payload: T), without the need for the NbtNamed wrapper.
  3. something entirely different from what I mentioned above?

It sounds kind of ambiguous to me at the moment.

KP2048 commented 4 months ago

Nice! I look forward to bumping my Gradle dependency

pandier commented 4 months ago

Nice job! Exactly what I wanted and need

BenWoodworth commented 4 months ago

@WinX64 (comment):

I do have a question regarding the new approach though. Maybe we can even use this discussion to iron things out. [...]

Good question! My current idea is mostly #1, with NbtNamed being a way to change the name of the payload when encoding.

Here's what I have in mind for encodeToNamedNbtTag:

fun <T> encodeToNamedNbtTag(serializer: KSerializer<T>, value: T): NbtNamed<NbtTag>

In this design, every serializable type is treated as having a NBT name (which root-named NBT variants use for the root value). This function is exactly the same as encodeToNbtTag, except it also captures value's NBT name. So this function gives a more complete in-memory representation of how value is serialized, instead of missing the name.

And in general, I think of NbtNamed(name, value) as effectively replacing its value's NBT name. Unless a serializable type is (statically) annotated with @NbtName, its name will be an empty string by default. And NbtNamed offers a way to (dynamically) change a value's NBT name. And with the NbtNamed<NbtTag> return type, it replaces NbtTag's empty name with value's.

Let me know if that clears things up a little!