Offroaders123 / NBTify

A library to read and write NBT files on the web!
http://npm.im/nbtify
MIT License

Binary Empty ListTag Type Persistence #40

Closed Offroaders123 closed 10 months ago

Offroaders123 commented 12 months ago

While looking into Schematic, Litematica, and Structure Block files, I wanted to see how well NBTify currently supports them. When diffing NBTify's output against the original source data, the tests didn't pass for the Schematic demo. It turned out that some of the ListTag values in the file declare a specific item type, which NBTify doesn't carry over during serialization if the array is empty. So if a ListTag in the file is typed for StringTag values and the list is empty, NBTify just writes EndTag as the item type, meaning the output no longer matches the input byte-for-byte. This is not incorrect, and the result is still valid in terms of the spec and what the game can load, but it's not a perfect 1:1 deserialization-reserialization process, which is something I heavily strive for here with NBTify. If the exact same file isn't output, then NBTify is missing something, as I want it to be completely compatible with anything thrown at it.
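To make the difference concrete, here is a minimal sketch of how the payload of an empty List tag is laid out in the binary format: one byte for the item type ID, then a 4-byte length. The helper name `encodeEmptyListPayload` is hypothetical and not part of NBTify; the tag IDs (End = 0, String = 8) come from the NBT format itself.

```ts
// Tag type IDs from the NBT format.
const TAG_END = 0;
const TAG_STRING = 8;

// Hypothetical helper (not an NBTify API): encodes only the payload of an
// empty List tag, which is a 1-byte item type ID followed by a 4-byte length of zero.
function encodeEmptyListPayload(itemType: number, littleEndian = false): Uint8Array {
  const bytes = new Uint8Array(5);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, itemType);
  view.setInt32(1, 0, littleEndian);
  return bytes;
}

const untyped = encodeEmptyListPayload(TAG_END);
const typed = encodeEmptyListPayload(TAG_STRING);

console.log(untyped.join(" ")); // 0 0 0 0 0
console.log(typed.join(" "));   // 8 0 0 0 0
```

Both payloads describe an empty list, but they differ in the first byte, which is exactly the kind of one-byte mismatch that makes a round-trip diff fail.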

This hasn't been a problem until now, because all of the other NBT files I have seen and tested against use EndTag as the item type for empty lists. So I haven't uncovered this simply because I hadn't come across a file that's written this way. Now that I have an example of one, I think it's easier to figure out how I should implement this.

I think this can actually work nicely with a custom Symbol() property on the array, which allows this persistence to be optional, and maybe even handled automatically by NBTify.
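A rough sketch of what that could look like, assuming a symbol key stored directly on the array. The names `LIST_TYPE`, `tagList`, and `resolveItemType` are made up for illustration and are not NBTify's actual API; the idea is only that the reader records the item type, and the writer falls back to it when the list is empty.

```ts
// Hypothetical symbol key; an assumption for illustration only.
const LIST_TYPE = Symbol("nbt.list.type");

type TypedList<T> = T[] & { [LIST_TYPE]?: number };

// Reader side: remember the item type ID that was found in the binary data.
function tagList<T>(list: T[], itemType: number): TypedList<T> {
  const tagged = list as TypedList<T>;
  tagged[LIST_TYPE] = itemType;
  return tagged;
}

// Writer side: infer the type from the first item when there is one,
// otherwise use the remembered type, otherwise fall back to End (0).
function resolveItemType<T>(list: TypedList<T>, inferFromValue: (value: T) => number): number {
  const first = list[0];
  if (list.length > 0 && first !== undefined) return inferFromValue(first);
  return list[LIST_TYPE] ?? 0;
}

const empty = tagList<string>([], 8);
console.log(resolveItemType(empty, () => 8));                     // 8
console.log(resolveItemType([] as TypedList<string>, () => 8));   // 0
```

Because the property is optional, plain arrays keep working as before, and only lists that came from a typed source carry the extra information.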

The other option, which I only recently found to be plausible, is using Proxy objects to define custom NBTify primitive wrapper objects that limit/validate what kinds of properties can be assigned to a given wrapper primitive implementation. I didn't actually know how Proxy objects worked, were implemented, or used, so this is a big one I hope to look into, and it gives me more ideas for ways to validate property assignment on these primitives before passing them off to the write/serialization process.

My take for the longest time, since moving to using primitives as closely as possible, has been to leave run-time validation up to the time a value is used, rather than when it is built/assigned to. Since JS doesn't have type validation when assigning to plain objects, arrays, or things like that, I didn't think it would be worth constructing full-blown custom structure primitives (objects, arrays) just to add type validation support, since that can be handled safely down the line anyways. Now that Proxy objects are in the picture, it might actually be feasible to validate types of things as they are assigned!
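As a minimal sketch of validating at assignment time, a `set` trap can reject invalid items before they ever reach the array. `createValidatedList` and `isString` are hypothetical names for illustration, not something NBTify provides.

```ts
// Hypothetical wrapper: an array that only accepts items passing a type guard.
function createValidatedList<T>(isValid: (value: unknown) => value is T): T[] {
  return new Proxy<T[]>([], {
    set(target, property, value, receiver) {
      // Only validate numeric index assignments; "length" and other keys pass through.
      const isIndex = typeof property === "string" && /^\d+$/.test(property);
      if (isIndex && !isValid(value)) {
        throw new TypeError(`Invalid list item at index ${property}`);
      }
      return Reflect.set(target, property, value, receiver);
    }
  });
}

const isString = (value: unknown): value is string => typeof value === "string";
const names = createValidatedList(isString);

names.push("Steve"); // allowed

try {
  (names as unknown[]).push(42); // rejected as soon as it is assigned
} catch (error) {
  console.log((error as Error).message); // Invalid list item at index 1
}
```

The trade-off is that every write goes through the trap, so this moves the validation cost from serialization time to assignment time.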

These were some links I used to explore this. I don't know too much about what the best practices are for using Proxies with classes, say if you wanted to use them for these primitive wrappers, or for Web Components, something like that. My idea for using them with Web Components is that it would remove the need to write get() and set() accessors for everything, along with an internal private #field to store the value behind those public validators.

https://www.javascripttutorial.net/es6/javascript-proxy/
https://javascript.info/proxy
https://stackoverflow.com/questions/47779762/proxy-a-webcomponents-constructor-that-extends-htmlelement
https://grrr.tech/posts/2023/typescript-proxy-objects/

Listening to Devin Townsend Infinity for the first time, let's see how it goes! I'm already really liking it. Wow, I'm already excited to listen to this again :)

Offroaders123 commented 12 months ago

Ok, almost got it working, but I realized there's part of this I don't think I can fix without going heavier into things. The ListTag item types can't be persisted when saved as SNBT, so I can't do anything about round-tripping through that stage. My code for using the optional Symbol("nbt.list.type") property on arrays appears to be working when only using the NBT in JS land and saving back to the binary format, but the SNBT step doesn't account for this handling. This can likely be another test case to handle in #41, since this is specific to the binary format and can't be managed when also using SNBT. I think it's a fair trade-off to support it in JS-only land, and not deal with it if you convert the NBT to SNBT and back. Mojang's use of SNBT itself has no way of carrying that kind of info, so I don't think my implementation of SNBT should have to fuss over it either. Keeping track of item types for empty arrays is already a minor detail, so I don't think it's something that needs to be handled too tightly. One should already be validating their use of NBT keys and value types if they want to ensure everything is ship-shape anyways, whether by using TypeScript to manage that assurance at the type level or with your own class validator, so I think it's realistic for this information to be handled by the serializer itself, like I'm currently doing.

Offroaders123 commented 10 months ago

In relation to Bedrock's Block Entity actor storage format, it has come up that the game may indeed validate the item type for empty List tag values, which as this issue covers, NBTify doesn't currently support.

Offroaders123 commented 10 months ago

This was an actor file @JaylyDev sent me to help debug why the read-write process wasn't symmetrical.

input.bin is the raw file from a Bedrock world's LevelDB database, and output.bin is what that file looks like when re-written to disk again using NBTify.

Since this uses some custom file formatting that isn't just a single plain NBT tag, you have to read it in chunks. The reader function is from my Bedrock-LevelDB project, and JaylyDev inspired my writer function here to write the NBT back to a single actor file again.

input.bin.gz output.bin.gz

Reader / Writer Code

```ts
import { readFile } from "node:fs/promises";
import { read, write } from "nbtify";

import type { RootTagLike, NBTData, ReadOptions } from "nbtify";

const format = {
  name: "",
  endian: "little",
  compression: null,
  bedrockLevel: null,
  strict: true
} as const satisfies Required<ReadOptions>;

// Reads the consecutive NBT tags stored back-to-back in a single actor file.
async function readNBTList<T extends RootTagLike = RootTagLike>(data: Uint8Array): Promise<NBTData<T>[]> {
  const entries: NBTData<T>[] = [];

  while (true){
    if (data.byteLength === 0) break;
    try {
      // A strict read only succeeds when the tag consumes all remaining bytes, so this is the last entry.
      const entry: NBTData<T> = await read<T>(data,format);
      entries.push(entry);
      break;
    } catch (error){
      // This relies on the byte offset embedded in the strict-mode error message
      // to find where the current tag ends and the next one begins.
      const message: string = (error as Error).message ?? `${error}`;
      const length: number = parseInt(message.slice(46));
      const entry: NBTData<T> = await read<T>(data,{ ...format, strict: false });
      entries.push(entry);
      data = data.subarray(length);
    }
  }

  return entries;
}

// Writes each entry on its own, then concatenates the results into one actor file.
async function writeNBTList<T extends RootTagLike = RootTagLike>(entries: NBTData<T>[]): Promise<Uint8Array> {
  const results: Uint8Array[] = await Promise.all(entries.map(entry => write(entry,format)));
  const data = new Uint8Array(results.reduce((previous,current) => previous + current.byteLength,0));

  let byteOffset = 0;
  for (const result of results){
    data.set(result,byteOffset);
    byteOffset += result.byteLength;
  }

  return data;
}

const old = new URL("./old.nbt",import.meta.url);

// This is `input.bin`
const input = await readFile(old);
console.log(input.join(" "));

const nbtList = await readNBTList(input);
console.log(nbtList);

// This is `output.bin`
const output = Buffer.from(await writeNBTList(nbtList));
console.log(output.join(" "));

const reinput = await readNBTList(output);
console.log(reinput);
```

When loading the re-written actor data back to the world save, the Block Entity data appears to be parsed incorrectly, maybe because Bedrock validates the Items List tag type, even though the NBT spec says that it will accept empty List tags typed with an End tag as the value type.

Loading Invalid Block Entity Actor

This is a diff of the byte content between input.bin and output.bin. You can see that only the item type for each Chest's Items key has changed; everything else is byte-symmetrical.

Block Entity Actor Diff

Block Entity Actor Content
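A byte-level comparison like the one described above can be sketched with a small helper. `diffBytes` is a hypothetical function, and the sample bytes below are made up for illustration (a little-endian List tag named "Items"); they are not taken from the attached files.

```ts
interface ByteDifference {
  offset: number;
  before: number | undefined;
  after: number | undefined;
}

// Hypothetical helper: lists every offset where the two byte arrays differ.
function diffBytes(before: Uint8Array, after: Uint8Array): ByteDifference[] {
  const differences: ByteDifference[] = [];
  const length = Math.max(before.byteLength, after.byteLength);
  for (let offset = 0; offset < length; offset++) {
    if (before[offset] !== after[offset]) {
      differences.push({ offset, before: before[offset], after: after[offset] });
    }
  }
  return differences;
}

// Made-up sample: tag ID 9 (List), name length 5, "Items", item type, length 0.
const sampleInput  = new Uint8Array([9, 5, 0, 73, 116, 101, 109, 115, 10, 0, 0, 0, 0]);
const sampleOutput = new Uint8Array([9, 5, 0, 73, 116, 101, 109, 115,  0, 0, 0, 0, 0]);

console.log(diffBytes(sampleInput, sampleOutput));
// [ { offset: 8, before: 10, after: 0 } ]
```

In this sample the only differing byte is the list's item type, which mirrors the shape of the mismatch in the diff.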

Offroaders123 commented 10 months ago

I think this is just about fixed! Going to wait to close this a little bit, just to test it out for a little while beforehand. Everything passes my tests, so I think we're good on that front for now at least.

Offroaders123 commented 10 months ago

Here are a few screenshots to go along with the commits I linked above these two comments:

List Type Test Failed

Tag Type Symbol Addition

Array Symbol Property Example

This shows the difference between defining this new Symbol property as a regular key, compared to defining it as non-enumerable with Object.defineProperty().
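For reference, the behavioral difference between the two approaches can be sketched like this. The symbol and variable names are assumptions for illustration; the point is only that a non-enumerable symbol property stays readable directly but is skipped by operations that copy enumerable properties.

```ts
// Hypothetical symbol key; an assumption for illustration only.
const LIST_TYPE = Symbol("nbt.list.type");

type TypedList = unknown[] & { [LIST_TYPE]?: number };

// Defined as a regular key: the property is enumerable.
const regular: TypedList = [];
regular[LIST_TYPE] = 8;

// Defined with Object.defineProperty(): the property is non-enumerable.
const hidden: TypedList = [];
Object.defineProperty(hidden, LIST_TYPE, {
  value: 8,
  enumerable: false,
  writable: true,
  configurable: true
});

// Both are readable directly.
console.log(regular[LIST_TYPE], hidden[LIST_TYPE]); // 8 8

// Only the regular one reports as enumerable.
console.log(Object.prototype.propertyIsEnumerable.call(regular, LIST_TYPE)); // true
console.log(Object.prototype.propertyIsEnumerable.call(hidden, LIST_TYPE));  // false

// Object.assign() copies enumerable symbol keys, so only the regular one carries over.
console.log(Object.assign([], regular)[LIST_TYPE]); // 8
console.log(Object.assign([], hidden)[LIST_TYPE]);  // undefined
```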

List Type Test Passing