PistonDevelopers / hematite_nbt

A full-featured Rust crate for working with Minecraft's Named Binary Tag (NBT) file format, including Serde support.
MIT License
99 stars 37 forks source link

Serde byte/int/long array serialization (Issue #27 / #32) #51

Closed Schuwi closed 4 years ago

Schuwi commented 4 years ago

This PR aims to implement serde serialization support to serialize i8, i32 and i64 sequence types to NBT ByteArray, IntegerArray and LongArray respectively. This implementation checks the type of the first sequence element to determine which NBT List/Array type to use. This means of course that serializing to a NBT List of NBT Byte/Integer/Long types is not possible anymore, but as discussed in #27 this doesn't seem to be used in practice. Also if required this could probably be implemented as some kind of option later.

I ran some rudimentary tests by serializing a Minecraft chunk with https://github.com/feather-rs/feather including Lists and LongArrays and checking the result in an NBT viewer. Everything serialized as expected.

This would resolve #27 and #32.

Schuwi commented 4 years ago

Oh, yeah kinda forgot to run cargo test, heh...

atheriel commented 4 years ago

I’m very happy to see work on this. It looks like the failures are due to handling of bested lists; do you think it’s possible to handle that with your current approach? Or does the serializer not have enough information at that point?

Schuwi commented 4 years ago

Okay, the tests (including array serialization tests) should now complete successfully, though it just dawned on me that this whole approach is flawed. Empty arrays can't be identified on serialization because no elements are written to identify the element type with. So every empty sequence is serialized as an empty list right now...

Schuwi commented 4 years ago

Alright. Here is how it looks:

In conclusion:

The current implementation of serde isn't designed to allow for explicitely typed sequences and this is unlikely to change in the future (see the wontfix tag on https://github.com/serde-rs/serde/issues/607).

My proposal:

Using a dirty (not-so-)little hack.

As NBT doesn't support tuples, our Serializer::serialize_tuple_struct functions are currently "unused" (i.e. they just throw an error upon invocation). This method has a very interesting signature: fn serialize_tuple_struct(self, name: &'static str, len: usize) -> Result<Self::SerializeTupleStruct, Self::Error> which is basically the same signature as Serializer::serialize_seq plus an extra &'static str (and usize instead of Option<usize> but our sequences are always sized so that distinction doesn't matter).

Now my idea is to (ab)use this method by implementing three serialize_with functions (i8_array, i32_array and i64_array) which are defined generically for their corresponding IntoIterator<ixx> doing something like:

let seq = serializer.serialize_tuple_struct("i8", iter.size_hint().1.unwrap());
for elem in iter {
    seq.serialize_field(&elem);
}
seq.end();

Thus the serialize_tuple_struct method can check the value of the name parameter and serialize an array according to the type.

Downsides to this approach:

Surprisingly few actually. The serialize_tuple_struct implementation can't be called "accidentally" (via the #[derive(Serialize)] macro) because i8, i32 and i64 are reserved keywords and thus can't be used as struct names. The only downside I can think of right now:

I'll try to implement this and see how it works out. It should actually be less work than the current approach but optimistic assessments have been shown to be incorrect in the past.

atheriel commented 4 years ago

I’m not sure how empty lists are serialized by Minecraft, it may not matter what tag is set in those cases. Do you know it generated data that could confirm this?

Schuwi commented 4 years ago

I’m not sure how empty lists are serialized by Minecraft, it may not matter what tag is set in those cases. Do you know it generated data that could confirm this?

Looking at some 1.13.2 vanilla chunk data right now I can confirm I have found an empty long array (in this case Chunk:Level/Structures/References/Monument). Though I have no idea whether the vanilla client would accept an empty List here instead.

Edit: Manually changing this exact empty array to an empty List and loading the chunk in the vanilla 1.13.2 client resulted in no unusual console output and inspecting the chunk data afterwards the value in question was back to an empty long array.

caelunshun commented 4 years ago

I’m not sure how empty lists are serialized by Minecraft, it may not matter what tag is set in those cases. Do you know it generated data that could confirm this?

In most cases using an empty list should be fine, since from my tests vanilla will ignore a tag if it has the wrong type. In @Schuwi's Monument case above, the client expected a long array but found a list, so it skipped properly deserializing that field without throwing any errors. (This can be observed by, for example, changing the type of the Palette tag for chunk data—vanilla will just skip deserializing that chunk section.)

It's possible that not all vanilla deserializers will ignore errors in this manner, so I would advise caution using the empty TAG_List approach. Though it will work in all cases I've tried, it could cause bugs in the future.

Also, most of the other NBT libraries I've looked at won't be able to interpret an empty list as an empty TAG_Long_Array or TAG_Int_Array. So going with this approach might break compatibility with some non-vanilla implementations.

The serialize_tuple_struct implementation can't be called "accidentally" (via the #[derive(Serialize)] macro) because i8, i32 and i64 are reserved keywords and thus can't be used as struct names.

Note that the r# syntax can allow a keyword to be used as an identifier. For example, this compiles and will have the struct named i32:

struct r#i32 {}

I doubt anyone will name their types like this, but to be sure you might want to use a more obscure special value, such as __hematite_nbt_i32_array__.

Schuwi commented 4 years ago

Alright, I implemented my proposed serialize_with approach (https://github.com/Schuwi/hematite_nbt/tree/explicit-array-serializing, working implementation) but I bumped into a new problem with that approach. serialize_with can't be used on nested collections. That's not really our fault, but rather a limitation of serde which has an open issue (https://github.com/serde-rs/serde/issues/723) but it doesn't look like any solution will be implemented soon. Other than that I find this approach still quite promising, so I'd be happy if anyone can come up with a workaround for this limitation. I haven't really found one yet...

atheriel commented 4 years ago

The serialize_with workaround seems like a clever solution to me, it's unfortunate there is a roadblock. When I originally implemented this stuff, I did encounter many of these limitations of serde myself. I think this is ultimately because NBT does not map perfectly to the serde data model, so we're just stuck trying to find reasonable workarounds.

Schuwi commented 4 years ago

Alright, I fixed up the code and implemented the nested lists using a workaround. I would propose to merge it for now (other branch, will make a new PR shortly). I think this approach is the way to go and the rest (proper nested lists support) is up to serde to implement. Also, as it is entirely optional, nobody is forced to use it and it breaks no existing code.

Schuwi commented 4 years ago

Closed in favor of #52