denoland / std

The Deno Standard Library
https://jsr.io/@std
MIT License
3k stars 602 forks source link

`assertEquals` is very slow for large `Set`/`Map` obj due to O(n^2) comparison algorithm #5903

Open lionel-rowe opened 1 week ago

lionel-rowe commented 1 week ago

Describe the bug

assertEquals is very slow when comparing large Set and Map objects due to using an O(n^2) comparison algorithm:

https://github.com/denoland/std/blob/7e2616031a2dcd9c405086192e53e39233370952/assert/equal.ts#L87-L99

It looks like there's the same issue in the expect version too.

Steps to Reproduce

import { assertEquals } from "jsr:@std/assert";

const arr1 = Array.from({ length: 1e4 }, (_, i) => i)
const arr2 = [...arr1]
const set1 = new Set(arr1)
const set2 = new Set(arr2)

console.time('compare arr')
assertEquals(arr1, arr2);
console.timeEnd('compare arr')

console.time('compare set')
assertEquals(set1, set2);
console.timeEnd('compare set')

Results:

compare arr: 14ms
compare set: 1178ms

Results get exponentially worse as the size increases.

Expected behavior

assertEquals of Set and array (or Map and obj) to take approximately equal time.

Environment

iuioiua commented 1 week ago

PRs that improve the performance of assertEquals() for this scenario are welcome.

lionel-rowe commented 1 week ago

Two possible directions to explore:

const aEntries = [...a.entries()].sort();
const bEntries = [...b.entries()].sort();

for (const [idx, [aKey, aValue]] of aEntries.entries()) {
  const [bKey, bValue] = bEntries[idx]!;
  if (!compare(aKey, bKey) || !compare(aValue, bValue)) {
    return false;
  }
}

return true;

This passes for all current tests but I think it'll fail for more complex examples as sort() just stringifies its args and doesn't modify order if the strings are equal (so [{ b: 2 }, { a: 1 }].sort() is [{ b: 2 }, { a: 1 }] because `"[object Object]" === "[object Object]").

const keySet = new Set<unknown>();
const valSet = new Set<unknown>();

for (const input of [a, b]) {
  for (const [key, value] of input.entries()) {
    keySet.add(key);
    valSet.add(value);
  }
}

return keySet.size === a.size && valSet.size === a.size;

This fails some of the current tests as sets are only unique by reference, not value.

Basically, the problem in both cases is not having a unique-by-value, reference-equal, serialized form.

Leokuma commented 1 week ago

I made this comparison with Lodash isEqual(). The difference is not as big I expected. Lodash code looks hacky.

import "https://raw.githubusercontent.com/lodash/lodash/4.17.10-npm/lodash.js";
import { assertEquals } from "jsr:@std/assert";

const arr1 = Array.from({ length: 10_000 }, (_, i) => i)
const arr2 = [...arr1]
const set1 = new Set(arr1)
const set2 = new Set(arr2)

let result: boolean

console.time('compare set std')
assertEquals(set1, set2);
console.timeEnd('compare set std')

console.time('compare set lodash')
result = _.isEqual(set1, set2)
console.timeEnd('compare set lodash')
console.log(result)
❯ deno run -A set.ts
compare set std: 616ms
compare set lodash: 317ms
true
lionel-rowe commented 1 week ago

A partial fix would be to have a fast path for cases where all keys of both are primitives, which would certainly have worked for the use case where I discovered this limitation, and maybe for most use cases where you'd be likely to want to compare large Sets/Maps using assertEquals? Typically where the keys aren't primitive, you don't want to compare keys by value (because equality is determined by reference for purposes of setting/getting/etc).

Leokuma commented 1 week ago

Talking only about Sets, if all the values in either Set of the comparison has only primitives (you don't even need to check both Sets, just one of them), then you can compare them instantly with:

set1.symmetricDifference(set2).size == 0

I think this optimization is worth it because a Set usually has only primitives or only objects. I believe it's not common that a Set is populated with both primitives and objects. But I could be wrong.

lionel-rowe commented 1 week ago

Talking only about Sets, if all the values in either Set of the comparison has only primitives (you don't even need to check both Sets, just one of them), then you can compare them instantly with:

set1.symmetricDifference(set2).size == 0

I think this optimization is worth it because a Set usually has only primitives or only objects. I believe it's not common that a Set is populated with both primitives and objects. But I could be wrong.

Yeah I misspoke, only one set/map needs checking for exclusively primitive keys (because if the other has some non-primitive keys, at least one of the primitive keys must be missing due to having already passed same-size check). Probably worth adding symmetricDifference as an extra-fast-path though.