godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License
1.07k stars 69 forks source link

Allow `Dictionary.hash()` to not care about order (or provide an alternative hashing method) #9452

Open monitz87 opened 3 months ago

monitz87 commented 3 months ago

Describe the project you are working on

Multiplayer action roguelite with rollback netcode (more like a hybrid between rollback and server auth/client side prediction with delta compression but I digress).

Describe the problem or limitation you are having in your project

An important part of rollback netcode is being able to detect state mismatches between the server and clients in order to know when the client needs to roll back (or whether the client's state is irreparably broken and they need a full resync). The best way to do this is by hashing the game state Dictionary on the server, sending the hash over to the client, and then comparing it to a hash of the client's state Dictionary.

The issue is that the native .hash() method in Dictionary cares about order, so two dictionaries with the exact same keys and values will produce different hashes if the keys were inserted in different orders.

Rollbacks on the client side cause certain state values to be rewritten, which makes their order on the Dictionary change, which makes hashes not match for Dictionaries that (sans order) are identical. This makes it necessary to sort the Dictionaries before hashing, which is VERY expensive in terms of performance when the game state starts to get big enough, and also unnecessary when all you care about are the values in the Dictionary and not their order.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

I understand that right now it's a feature of Dictionaries to preserve insertion order, so I guess it kinda makes sense for hashes to be different for differently ordered Dictionaries. At the same time (and I might be biased because of my needs) it feels intuitive that you'd want two Dictionaries with the same values to be easily comparable and produce the same hash, regardless of order. I can't think of many cases where people would actually want those Dictionaries to be considered different (but again, I might be biased).

In lieu of replacing the current Dictionary::hash() logic (which might be a bit too extreme), I propose adding an optional boolean parameter to make the hash not care about order. If that's still too much, maybe just adding a different hashing method Dictionary::commutative_hash() that doesn't care about order.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Replacing the original implementation is just a matter of storing each key/value pair hash separately and adding them together instead of using each pair's hash as the seed for the next one

Adding the boolean parameter or a separate method would be more tricky, since the recursive nature of the hashing function (and the fact that it calls the hashing function of Variant) would require touching more code (maybe in not very pretty ways)

If this enhancement will not be used often, can it be worked around with a few lines of script?

Not really, since the main issue here is performance, hashing via script defeats the purpose

Is there a reason why this should be core and not an add-on in the asset library?

Again, performance is key here

AThousandShips commented 3 months ago

Replacing the original implementation is just a matter of storing each key/value pair hash separately and adding them together instead of using each pair's hash as the seed for the next one

Do you have any examples of hash algorithms that support this? This doesn't match any of my experiences with hashing, what algorithm do you have in mind for this?

At minimum this sounds like an algorithm extremely prone to collisions

monitz87 commented 3 months ago

I admit hashing is not my area of expertise. Can you explain why this algorithm would be more collision prone? In any case, even if my proposed implementation isn't sound, that doesn't mean that there isn't one out there that is. I'd be surprised if a robust hashing algorithm for unsorted hash maps doesn't exist.

AThousandShips commented 3 months ago

This is really something that the proposal should contain, it's part of the requirements

I'm no expert either, but normally hashing algorithms are very chaotic exactly because they have each step depend on the previous one, hence one that removes order from the equation risks such effects, but that was just a guess, since there's no details on what algorithm to use I can't speak any further on it

monitz87 commented 3 months ago

Fair enough. I'll do some research then and refine the proposal.

dsnopek commented 3 months ago

I think the usual approach would be to sort the dictionaries before hashing, or use a sorted dictionary type. Unfortunately, I don't think we have a good way to sort dictionaries, and we don't have a sorted dictionary type. :-/

AThousandShips commented 3 months ago

See also:

tiloc commented 3 months ago

For comparison: the Java HashMap which is probably similar in purpose to the Godot Dictionary simply numerically adds the hashes of all individual entries (an entry being the combination of key and value). This makes order irrelevant. It treats its keys as a set, i.e. they cannot be duplicate, and no order may be assumed. Developers who want guarantees about the order need to use a SortedMap instead. Sorting typically requires the developer to provide a comparison function between the entries.

monitz87 commented 3 months ago

I noticed the PR for sort before opening this issue, and native sorting is definitely a step up, but the O(n) to O(n^2) complexity really adds up anyway.

Looking at the discussion around the sort PR, I do realize something:

Dictionaries now exist in a strange halfway state where order matters, but developers don't have enough control over said order. As I said, sort is a step up, but without being able to control insertion (such as allowing efficient insertion at a specific "index", or before/after another key, as suggested in the PR), performance really suffers when dealing with larger dictionaries. Even with control over insertion, I think there is enough space for two similar but separate structure types (like the Java example @tiloc mentioned).

AThousandShips commented 3 months ago

Adding a separate data type isn't going to happen I'd say, adding new Variant types expensive and complex

AThousandShips commented 3 months ago

Note also that this would only work for dictionaries with simple indices, so no Object or similar keys, and generally sorting of Variant isn't stable, for performance reasons

monitz87 commented 3 months ago

Note also that this would only work for dictionaries with simple indices, so no Object or similar keys, and generally sorting of Variant isn't stable, for performance reasons

What do you mean by "this" in this context?

AThousandShips commented 3 months ago

Having a sorted dictionary

But also hashing of those types isn't stable either, so if your dictionary has Object keys or values it won't match, for example if it has a Resource as a value

monitz87 commented 3 months ago

I mean, right now we kind of have a "sorted" dictionary. I'm vouching for a completely unsorted one (or at least one where order doesn't have bearing on the hash)

AThousandShips commented 3 months ago

Thats different, you can still iterate a dictionary, and it's not sorted, it's insertion order based

You're talking about hashing, a sorted array, or unsorted one, would be so beyond the hash, like when doing:

for key in dict:

So we need to be clear of what we're talking about, because sorted ≠ order independent hash

monitz87 commented 3 months ago

You're right. My bad, I misused the word.

I think at least the way I see it, it's a matter of expected use. This of course varies from developer to developer, so I guess the only way to really settle a matter like this would be some sort of poll.

For me personally (and I dare say I'm not too off the mark in terms of how others see this), it makes sense that dictionaries preserve insertion order, but it is expected that two dictionaries with the same data that were constructed in a different order would be considered semantically identical, and thus would yield the same hash.

I'd argue that insertion order preservation is more related to behavior (since most of the time it has bearing on performing logic while iterating on a dictionary, which can have vastly different results depending on order), while hashing is more related to data. While there can be cases where a developer will want to compare hashes to know if two dictionaries will behave identically, I think most use cases for hash only care about the data.

Of course, that's just one opinion, and I don't expect that to hold much weight when determining whether to make changes in an engine that is used by thousands. Also, changing the current hash implementation to ignore order would definitely be a breaking change, which is why I suggested adding an optional parameter or a separate hashing method.

timothyqiu commented 3 months ago

FYI: Python's dictionary keeps insertion order too, and it simply makes dictionaries unhashable (hash({}) throws). Users have to define their own way of how to hash a dictionary.