emmanueltouzery / prelude-ts

Functional programming, immutable collections and FP constructs for typescript and javascript
ISC License
377 stars 21 forks source link

HashMaps comparison always seems to return the same hash code #47

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hi,

Given the following

describe("HashMap", () => {
  it("Gives unique hash codes", () => {
    const map1 = HashMap.of(["48", 1], ["49", 2], ["50", 3], ["51", 4]);
    const map2 = HashMap.of(["49", 2], ["50", 3], ["51", 4]);
    const map3 = HashMap.of(["49345678", 2], ["50", 3], ["51", 4]);

    console.log(map1.hashCode(), map2.hashCode(), map3.hashCode());

    expect(map2.hashCode() !== map1.hashCode()).toBeTruthy();
    expect(map3.hashCode() !== map1.hashCode()).toBeTruthy();
    expect(map3.hashCode() !== map2.hashCode()).toBeTruthy();
  });
});

All hash maps are giving me the exact same hash code:


  ● HashMap › Gives unique hash codes

    expect(received).toBeTruthy()

    Received: false

      31 |     console.log(map1.hashCode(), map2.hashCode(), map3.hashCode());
      32 |
    > 33 |     expect(map2.hashCode() !== map1.hashCode()).toBeTruthy();
         |                                                 ^
      34 |     expect(map3.hashCode() !== map1.hashCode()).toBeTruthy();
      35 |     expect(map3.hashCode() !== map2.hashCode()).toBeTruthy();
      36 |   });

      at Object.<anonymous> (src/store/domain/types/DataClass.test.ts:33:49)

  console.log src/store/domain/types/DataClass.test.ts:31
    1696 1696 1696

Test Suites: 1 failed, 1 total

I'd expect some conflicts to be possible given the [comment] above the hashcode function, but since all 3 are returning 1696 even after arbitrary changes, this means we basically can't compare hash maps.

This is on the latest version of PreludeTS.

I'd love to help to fix this but the logic behind generating the hashcode is a bit above my head right now ):

emmanueltouzery commented 3 years ago

Hmm.. i mean hash key collisions are a performance issue, the only thing that would be a correctness issue would be if two equal values had different hashes.

And then, if I touch various hashmaps, I can get various hash values:

> prelude.HashMap.of(["49345678", 2], ["50", 3], ["51", 3]).hashCode()
1695
> prelude.HashMap.of(["4934578", 2], ["50", 3], ["51", 4]).hashCode()
1696
> prelude.HashMap.of(["49348", 2], ["50", 3], ["510", 4]).hashCode()
1694
> prelude.HashMap.of(["49348", 2], ["350", 3], ["510", 4]).hashCode()
52504

For hashmaps, the hash code is calculated by summing the hashcodes of keys and values in the map:

    hashCode(): number {
        return this.hamt.fold(
            (acc: number, value: V, key: K & WithEquality) =>
                getHashCode(key) + getHashCode(value), 0);
    }

One thing here is that we must make sure that the order of elements is not relevant in the computation of the hash code for hashmaps. So for instance, when we compute the hash code for vectors, we do:

    hashCode(): number {
        let hash = 1;
        for (let i=0;i<this.length();i++) {
            hash = 31 * hash + getHashCode(L.nth(i, this._list));
        }
        return hash;
    }

So, for every new element we start on the hash for the whole vector so far, times 31. But that means that the order of elements in the the vector is relevant, so we can't do that directly in the HashMap hash code computation.

So then we get to the hashcode computation for numbers and strings (since you have string keys, integer values). For numbers, we use some 'magic' (from my point of view certainly) formula from immutablejs (src/Comparison.ts, line 111). And for strings, we use an algorithm from stackoverflow (src/Comparison.ts, line 69, https://stackoverflow.com/a/7616484/516188 ).

So now I'll just check whether there could be a way of combining the hash values of the hashmap elements in a less naive way than simple summing, that would keep the attribute that order doesn't matter. I'll peek quickly at what immutablejs and some java libraries do.

emmanueltouzery commented 3 years ago

So, I see that vavr does the same, it just sums elements of the hashmap: https://github.com/vavr-io/vavr/blob/1c901067b1a5f9d1f4a4277301e5ba20dce0f2c5/src/main/java/io/vavr/collection/Collections.java#L240


    // hashes the elements regardless of their order
    static int hashUnordered(Iterable<?> iterable) {
        return hash(iterable, Integer::sum);
    }

    private static int hash(Iterable<?> iterable, IntBinaryOperator accumulator) {
        if (iterable == null) {
            return 0;
        } else {
            int hashCode = 1;
            for (Object o : iterable) {
                hashCode = accumulator.applyAsInt(hashCode, Objects.hashCode(o));
            }
            return hashCode;
        }
    }

In addition the hashmap for elements is obtained by hashing the array of [key, value], which is slightly different than what prelude does, which is hashing key, then hashing value, and then summing those two hashes.

For that, we could modify the hashing function for hashmaps, to not just sum hash(key) and hash(value), but instead using prelude's fieldsHashCode to compute fieldsHashCode([key, value]).

This would be probably better, because it does:

export function fieldsHashCode(...fields: any[]): number {
    // https://stackoverflow.com/a/113600/516188
    // https://stackoverflow.com/a/18066516/516188
    let result = 1;
    for (const value of fields) {
        result = 37*result + getHashCode(value);
    }
    return result;
}

meaning at least we'd have 37*hash(key)+hash(value) instead of directly hash(key)+hash(value).

I believe this would help with your case. So I'd probably stop with the research, try that change and if it helps with your use case and doesn't break any prelude-ts tests, then merge that to master.

emmanueltouzery commented 3 years ago

aaaaaaah there was a bug in the hashCode() for hashMap...

    hashCode(): number {
        return this.hamt.fold(
            (acc: number, value: V, key: K & WithEquality) =>
-                getHashCode(key) + getHashCode(value), 0);
+                acc + getHashCode(key) + getHashCode(value), 0);
    }

it was returning the hashcode of the last element traversed... I'll do also the fieldsHashCode change, but that was the main issue. Anyway, for sure I'll fix and re-release. But this should have only performance consequences, not correctness.

emmanueltouzery commented 3 years ago

Hi, I pushed prelude-ts 1.0.1 to npm with the fix. Right now it's not visible on the npm website yet, but it's probably just a matter of a cache refresh, the release should be there.

ghost commented 3 years ago

Ahh, that is such a silly mistake, I looked at this code trying to figure out what was going wrong and also completely missed it. I love it!

Thanks for the swift response/fix!!