Closed ghost closed 3 years ago
Hmm.. i mean hash key collisions are a performance issue, the only thing that would be a correctness issue would be if two equal values had different hashes.
And then, if I touch various hashmaps, I can get various hash values:
> prelude.HashMap.of(["49345678", 2], ["50", 3], ["51", 3]).hashCode()
1695
> prelude.HashMap.of(["4934578", 2], ["50", 3], ["51", 4]).hashCode()
1696
> prelude.HashMap.of(["49348", 2], ["50", 3], ["510", 4]).hashCode()
1694
> prelude.HashMap.of(["49348", 2], ["350", 3], ["510", 4]).hashCode()
52504
For hashmaps, the hash code is calculated by summing the hashcodes of keys and values in the map:
hashCode(): number {
return this.hamt.fold(
(acc: number, value: V, key: K & WithEquality) =>
getHashCode(key) + getHashCode(value), 0);
}
One thing here is that we must make sure that the order of elements is not relevant in the computation of the hash code for hashmaps. So for instance, when we compute the hash code for vectors, we do:
hashCode(): number {
let hash = 1;
for (let i=0;i<this.length();i++) {
hash = 31 * hash + getHashCode(L.nth(i, this._list));
}
return hash;
}
So, for every new element we start on the hash for the whole vector so far, times 31. But that means that the order of elements in the the vector is relevant, so we can't do that directly in the HashMap hash code computation.
So then we get to the hashcode computation for numbers and strings (since you have string keys, integer values). For numbers, we use some 'magic' (from my point of view certainly) formula from immutablejs (src/Comparison.ts, line 111). And for strings, we use an algorithm from stackoverflow (src/Comparison.ts, line 69, https://stackoverflow.com/a/7616484/516188 ).
So now I'll just check whether there could be a way of combining the hash values of the hashmap elements in a less naive way than simple summing, that would keep the attribute that order doesn't matter. I'll peek quickly at what immutablejs and some java libraries do.
So, I see that vavr does the same, it just sums elements of the hashmap: https://github.com/vavr-io/vavr/blob/1c901067b1a5f9d1f4a4277301e5ba20dce0f2c5/src/main/java/io/vavr/collection/Collections.java#L240
// hashes the elements regardless of their order
static int hashUnordered(Iterable<?> iterable) {
return hash(iterable, Integer::sum);
}
private static int hash(Iterable<?> iterable, IntBinaryOperator accumulator) {
if (iterable == null) {
return 0;
} else {
int hashCode = 1;
for (Object o : iterable) {
hashCode = accumulator.applyAsInt(hashCode, Objects.hashCode(o));
}
return hashCode;
}
}
In addition the hashmap for elements is obtained by hashing the array of [key, value]
, which is slightly different than what prelude does, which is hashing key, then hashing value, and then summing those two hashes.
For that, we could modify the hashing function for hashmaps, to not just sum hash(key) and hash(value), but instead using prelude's fieldsHashCode
to compute fieldsHashCode([key, value])
.
This would be probably better, because it does:
export function fieldsHashCode(...fields: any[]): number {
// https://stackoverflow.com/a/113600/516188
// https://stackoverflow.com/a/18066516/516188
let result = 1;
for (const value of fields) {
result = 37*result + getHashCode(value);
}
return result;
}
meaning at least we'd have 37*hash(key)+hash(value) instead of directly hash(key)+hash(value).
I believe this would help with your case. So I'd probably stop with the research, try that change and if it helps with your use case and doesn't break any prelude-ts tests, then merge that to master.
aaaaaaah there was a bug in the hashCode() for hashMap...
hashCode(): number {
return this.hamt.fold(
(acc: number, value: V, key: K & WithEquality) =>
- getHashCode(key) + getHashCode(value), 0);
+ acc + getHashCode(key) + getHashCode(value), 0);
}
it was returning the hashcode of the last element traversed... I'll do also the fieldsHashCode
change, but that was the main issue. Anyway, for sure I'll fix and re-release. But this should have only performance consequences, not correctness.
Hi, I pushed prelude-ts 1.0.1 to npm with the fix. Right now it's not visible on the npm website yet, but it's probably just a matter of a cache refresh, the release should be there.
Ahh, that is such a silly mistake, I looked at this code trying to figure out what was going wrong and also completely missed it. I love it!
Thanks for the swift response/fix!!
Hi,
Given the following
All hash maps are giving me the exact same hash code:
I'd expect some conflicts to be possible given the [comment] above the hashcode function, but since all 3 are returning
1696
even after arbitrary changes, this means we basically can't compare hash maps.This is on the latest version of PreludeTS.
I'd love to help to fix this but the logic behind generating the hashcode is a bit above my head right now ):