Offroaders123 / NBTify

A library to read and write NBT files on the web!
http://npm.im/nbtify
MIT License
42 stars 5 forks source link

Number objects vs. numbers #50

Open SuperLlama88888 opened 2 months ago

SuperLlama88888 commented 2 months ago

Platform: Web JS, imported from https://cdn.jsdelivr.net/npm/nbtify@2.0.0/+esm

When reading NBT data, integers and bytes are expressed as Number() objects instead of numbers. These can only be created with the new Number() constructor and have the following drawbacks:

Is this decision intentional? I know this is true for integers and bytes, but I haven't tested other number types.

Thank you for your time and such a useful library!

Offroaders123 commented 2 months ago

General answer

At least for the meantime, this is intentional yes.

Since JavaScript only has a single number primitive type, NBTify needed a way to distinguish what method a given numerical value is intended to be managed in terms of the NBT typings.

Since JavaScript's number type is equivalent to that of a double type in Java, programs would have to assume the type of what a given NBT key-value is, depending on the current state of that value in the file, which seems unsafe/lossy to me.

Deeper dive

Say we look at Java Edition's world save format for example, the level.dat file.

level.dat Key Examples

Number type persistence

If each of these values were to be read into JavaScript-land as plain number types, this range information would be lost, and would either have to be handled in the writing stage by the user manually (which makes the API harder to use since they would have to do this transformation manually for every file that has non-double number types), or things would be re-written again to the file, with completely different number scales.

The base issue is that JavaScript doesn't provide enough primitive number types that will losslessly persist this intended type information down to the NBT object tree, where everything is represented in JavaScript-land. Ideally, JavaScript could have something like byte, short, int, and float types to fix this for us, but right now that isn't something planned, or available to us. Thankfully, we do have the bigint type though, which is plenty enough to hold values for the long type that we need to represent.

I also don't really like the usage of needing to interact with these Number objects as objects themselves, but I would rather have to deal with that on it's own, than lose precision, when reading and overwriting an NBT file back again:

import { read, write, type NBTData } from "nbtify";
import { Buffer } from "node:buffer";

// Your NBT file from the file system, network, etc.
declare const input: Uint8Array;

// NBT object tree
const nbt: NBTData = await read(input);

// Re-compiled NBT file
const output: Uint8Array = await write(nbt);

// Should return with `0`, as the bytes for these files should ideally be identical.
const identical: -1 | 0 | 1 = Buffer.compare(input, output);

Alternate solution: Type inference by value range

One alternate take could be that inferences can be made by the current state of the value, and you could decide the primitives to use, on the fly when writing them back to the file. I really like the premise of this, and I se where it is coming from. However, this has an edgecase that makes things unsafe when a value in one of the larger value-sized types currently holds a small value, which makes the inference inaccurate depending on the current value.

For the level.dat example above, the Difficulty value (byte) could hold a value of 0 if the game is in Peaceful. With the typings inference model, inferring the type of 0 means it would safely resolve to the smallest type, being byte this time around. For this case, it happens to be the correct inference.

However for the DataVersion value (int), it may also initially hold a value of 0. So when inferring the type from that, it would also be inferred as a byte type. This changes how the value will be serialized, and the resulting file will no longer be synonymous with the input file, because the current state of DataVersion is now stored using less bytes than it was to begin with. The value is persisted, but how it is stored has changed.

This comes back to the part about possible user-interaction. If the types of the NBT file can be pre-documented to a schema of sorts, maybe this could be handled by that, and corrected when writing the file. However, this would mean that every NBT file would potentially need type documentation to write it back symmetrically. This seems too far-fetched to me. Anyone can write an NBT file, so not all files can possibly have documentation.

Objects are hard to use

Yeah I agree, I don't like this part of it either. I wish we wouldn't have to unwrap these primitives with things like myValue.valueOf() to get the plain number out of it again. If there is some other way of making this work, while ensuring the types can be preserved, I would love to know!

If anything, I have been wanting to see if there's a way to still use ES6 classes, and also allow for the syntax like Int8(), Int32(), compared to requiring the use of new Int8(), and new Int32().

import { Int8 } from "nbtify";

// Current
const byte0 = new Int8(25);

// Potential option, easier!
// More like `Number()` and `BigInt()` too!
const byte1 = Int8(80);

Sorry this was so long, I haven't really documented this concept outright in the repo anywhere yet, so I thought I would cover all of the bases while it was relevant in the discussion.

Thanks for mentioning this!

SuperLlama88888 commented 2 months ago

Thank you for such a detailed response! I think the current system deals with JavaScript's shortcomings in the best way possible across all uses.

If each of these values were to be read into JavaScript-land as plain number types, this range information would be lost

I didn't quite understand this bit - is precision stored in these Number objects? When I use it in the browser there isn't anything else in the Number object but the number itself - like this: image

Offroaders123 commented 2 months ago

The usage of the Number object itself isn't any different, no. It's how NBTify detects what kind of number it is, once it encounters it though, as each number class has it's own prototype inheritance (it is it's own extension of Number).

NBTify Primitive Type Detection

So if the number object value you pass in is an instance of NBTify's Int32 class for example, then we know that it should be serialized with NBT's TAG_Int type.

I did just notice though, now I think I see the confusion around the number types when you're using NBTify. From the CDN build of the package (or maybe it's just Firefox's developer tools), the names of the classes' names appear to be removed, either with the minification from the CDN build, or because of the dev tools.

SuperLlama88888 commented 2 months ago

Thank you for this! In Chrome, it works slightly better: image The a, i and o are the class names after being mangled by the minification process. It looks like it's something on jsDelivr's end then. It makes much more sense with different classes extending Number there, just a shame that this is lost in the debugging process.

Again, thank you so much for your dedication towards helping me with this, you've been really helpful. Cheers!

Offroaders123 commented 2 months ago

I think I realized a possible way to make this work a bit nicer, at least from the TypeScript side of things. I can possibly expose the typings of these primitives simply as a branded type, rather than an explicit Number, so it can instead just look more similar to that of a number. This could be deceiving though, so maybe it wouldn't be completely straightforward to users.

Once I get a working demo of this setup I can link it here for an example.

Offroaders123 commented 1 month ago

Ok, I've merged a working version of this into the main branch, going to work with it for a bit to see if it has any issues with existing type usages. It should work nicely though! We'll see.