Compound, List, and Array Tags give false-positives when using ==

ForeverZer0 / SharpNBT

A pure CLS-compliant C# implementation of the Named Binary Tag (NBT) format specification commonly used with Minecraft applications, allowing easy reading/writing streams and serialization to other formats.

MIT License

25 stars 9 forks source link

Compound, List, and Array Tags give false-positives when using == #31

Open Fisch37 opened 6 months ago

Fisch37 commented 6 months ago

When comparing compound tags the bar for equality is set extremely low. Two compound objects are equal if they have the same type (guaranteed) and their names are the same.

This is because the CompoundTag class does not override the Tag.Equals method leading to this bizarre behaviour. A very simple check is this:

using SharpNBT.SNBT;
Console.WriteLine(StringNbt.Parse("{value:385}") == StringNbt.Parse("{data:[1,2,3,4]}"));

This will pass even though the two compounds aren't remotely the same.

ForeverZer0 commented 6 months ago

Currently compound tags only compare the name of the tag to determine equality, and do not check contents. The given example results in two tags where neither has a name, therefore are considered "equal" using that very generic comparison. While it would be somewhat trivial to implement, comparing dictionaries (i.e. compound tags) for equality is not going to be efficient (deep nesting, binary data, etc), and rarely ever useful or meaningful beyond these simple examples.

While I am not outright opposed to implementing it, I am leaning strongly in favor of not doing so, as Linq and/or existing extension methods (Union, Intersect, etc) available to the IDictionary interface already exist to solve this problem, if a deep comparison is actually needed. Otherwise this can result in a foot-gun where someone assumes that calling == on two tags (or even abstract Tag instances) is going to be a relatively cheap operation, not realizing that this could severely degrade performance, as real-world use of compound tags often contain tens of even hundreds of MB of data. This is likely why the .NET runtime's dictionary implementation likewise does not include this interface, which the compound tag replicates for the sake of a familiar API surface.

I am open-minded to arguments to the contrary, but it should be noted that comparing table/dictionary types for equality is a "code smell" in pretty much any language, including C#. Typically you would want to be checking for known/expected keys, and handling them as-needed to validate the data.

I will consider a compromised "middle-ground" solution, where compound tags compare equality of their names, number of items, and perhaps check if all child key names are the same, but a full recursive comparison of child-types and their equality is going to require some powerful persuasion.

Fisch37 commented 6 months ago

I think that reasoning is sound, but if not implementing a full equality check, I would favour not having one at all. == having an "eager-equality" approach is unintuitive and can lead to errors more easily than it is helpful.

A prime example of that is how I discovered this issue: I was comparing a list of compounds against a mask (a small mask) and suddenly found that it just matched the first one it finds. I dug into the code to figure out why and thus this issue was born. An innocent programmer will assume that a .Equals method is going to be sensibly checking for equality and rather fail early simply because that is a precedent (consider that default == checks identity, which is just an extreme fail-fast implementation).

I needed equality checking, but it wasn't too hard to implement myself. Maybe having an additional method like .PreciselyEquals that does that more expensive check and overriding .Equals to identity matching would be a good approach?

As it goes for code I think the best implementation would be removing the IEquality interface from Tag and creating another abstract EqualityTag class to reintroduce the old mechanic. But that does mean a rather big code restructure so I'd be up to write a Draft Pull Request for that myself, if you want.

Fisch37 commented 6 months ago

(Note that even having a .PreciselyEquals method is not what I'm trying to argue towards. I think having it would be nice, but it isn't that hard to do yourself. It's more a case of "fixing" an unintuitive == implementation)

ForeverZer0 commented 6 months ago

It's more a case of "fixing" an unintuitive == implementation

I do agree with this. The default equality for compound tags should only be true when the objects are the same reference.

Fisch37 commented 6 months ago

Adjusted the title to better fit the current topic.

As is apparent from the title this issue affects ListTag and ArrayTag classes as well since those classes don't have a .Equals override either

ForeverZer0 commented 6 months ago

As it goes for code I think the best implementation would be removing the IEquality interface from Tag and creating another abstract EqualityTag class to reintroduce the old mechanic

I intended to address this part of your comment in my previous reply, but yes, I agree this sensible and more in line with language conventions. The only part of I dislike about this approach is that it loses the ability to check for equality of tags without first casting or pattern-matching a Tag into its concrete type. Realistically I am uncertain how often this is actually needed/used, but it could potentially be a breaking change to the existing API. For now I will most likely opt for fixing the behavior, obsoleting equality of list/array/compound types to set off compiler warnings, but postpone the actual removal of the interface for a future major version bump.

Fisch37 commented 6 months ago

That makes perfect sense, I like that

Fisch37 commented 5 months ago

I came back to this because my inbox failed to mark this as read. I noticed you said

it [implementing an EqualityTag ABC] loses the ability to check for equality of tags without first casting or pattern-matching a Tag into its concrete type

I think the only issue here is that you would need to cast to EqualityTag instead of Tag, no?

ForeverZer0 commented 5 months ago

I think the only issue here is that you would need to cast to EqualityTag instead of Tag, no?

@Fisch37 Assuming each tag shared a common interface that implemented IEquality, then technically yes. Either way it would still require casting a Tag instance to a more derived type in order to perform the operation. This is obviously not a difficult task, and pretty much required to do anything useful with a Tag anyways, but I feel is a bit less ergonomic if simply scanning through a collection in search of a particular value.

It also may bring up some new issues with what is considered "equal" and the rules for type coercion. For example, assume we have three EqualityTag instances with the same name. Concretely they are an IntTag, a LongTag and a FloatTag, with the values 1, 1L, and 1.0f respectfully. Should these be considered equal? I personally would say "no", and enforce they should be of the same tag type, but it could be argued that they are depending on another's perspective and/or expectations. Comparing an int, long, and float is a valid comparison in C# (with no explicit type conversion), and the following is valid:

long a = 1L;
int b = 1;
float c = 1.0f;
bool IsTrue = (a == b && b == c);

Realistically, finding a tag with a specific value is not commonly needed, and the typical use-case is finding a tag with a certain name then doing something with its value, so this "sacrifice" will be of little consequence.