Closed oflatt closed 2 weeks ago
Can we instead just use a build-time flag which users can use to opt in for determinism?
That's a good idea, but perhaps determinism should be opt-out?
Some thoughts:
IndexMap
may actually help performance since it's faster for iterating through! A quick run of cargo build --release && time target/release/egglog tests/eggcc-extraction.egg 2> /dev/null
vs cargo build --release -F nondeterministic && time target/release/egglog tests/eggcc-extraction.egg 2> /dev/null
shows that the deterministic version is slightly faster. If this result is consistent then we should just merge this PR without a separate compiler option!
TypeInfo::sorts
from HashMap
to IndexMap
, the time it takes to run eggcc-extraction.egg
goes down from 16.6s to 11.8s, which is huge!! This is because one of our bottleneck, as shown by cargo flamegraph
, is in eval_lit
, which linearly scans the TypeInfo::sorts
map. A better implementation should not do a linear scan.Reverted the feature, so this should be good to go! Excited to see better performance testing in the future.
@oflatt Update to main to get benchmarking!
Comparing oflatt-nondeterminism-fix
(1b550ab) with main
(60342af)
❌ 6
regressions
✅ 2
untouched benchmarks
:warning: Please fix the performance issues or acknowledge them on CodSpeed.
Benchmark | main |
oflatt-nondeterminism-fix |
Change | |
---|---|---|---|---|
❌ | eggcc-extraction |
4.1 s | 4.4 s | -7.45% |
❌ | herbie |
282.6 ms | 304.8 ms | -7.28% |
❌ | lambda |
145.1 ms | 159.2 ms | -8.89% |
❌ | math-microbenchmark |
3.6 s | 4.2 s | -14.5% |
❌ | python_array_optimize |
6.1 s | 6.9 s | -11.25% |
❌ | typeinfer |
408.7 ms | 433.5 ms | -5.72% |
I imagine some of the changes on shorter running benchmarks might not be significant - FWIW I just opened a PR to reduce our benchmarks to just the longer more meaningful ones (https://github.com/egraphs-good/egglog/pull/444). Otherwise, it seems like there is too much uncertainty introduced by the indeterminacy of the memory allocator.
The slowdown in the math-microbenchmark
is possibly more relevant.
We should try:
reference to ahash as fast hash map with ability to set seed https://docs.rs/ahash/latest/ahash/
reference to ahash as fast hash map with ability to set seed https://docs.rs/ahash/latest/ahash/
hashbrown
moved from ahash
to foldhash
in the past release, so that seems to be the new consensus. foldhash
has a fixed seed initializer too: foldhash::fast::FixedState::with_seed(0)
. For comparison, I switched symbol_table
(used by egglog) to use this.
Not sure why, but things are still non-deterministic with this change https://github.com/egraphs-good/eggcc/actions/runs/11525853271/job/32088922445?pr=637
Since I updated the hash function in PR#445 Update hashbrown, I'd recommend you rebase onto main. My goal there was to update to the latest versions of the hashing dependencies.
I didn't notice that egglog also uses rustc-hash, so that ought to be updated, since they recently did significant improvements improving both performance and hash strength. It's no longer the Firefox-derived hash function; its collision weaknesses, which were problematic for rustc, have been fixed. Now, rustc-hash is a mildly stripped down version of foldhash, making tradeoffs for usage in a compiler (I've written on the differences between them in my notes). It's probably only worth using one of the two.
Update: fixed in PR#456 Update symbol_table. Max's symbol_table
has some upstream changes which haven't been released to crates.io, so the version used by egglog is still on an old version of hashbrown and is also missing performance improvements contributed by someone else. For that, we should either ping him again or switch to a Git dependency on it in Cargo.toml.
One thought:
Another source of nondeterminism that can happen if you're running more than one instance of egglog in a single process is symbol_table. The numbers symbol_table hands out will not be deterministic (there are various process-global counters being incremented and the like. Different numbers will cause hashmap to store strings or symbols in different spots, hence iteration order will be different too.
@thaliaarchi thanks for the suggestions
@ezrosent hmm yeah, that sounds like it would do it. I assumed wrongly that symbol gen was deterministic.
Sounds like I should make a "nondeterministic" feature flag for now, and we can revisit this question at a later date. The best option is probably to somehow sort the resulting serialized egraph, since that would work even with parallel egglog.
Regression is caused by the swap from hashmap to index maps, which have a slower insertion and lookup speed. To recover the performance, use the nondeterministic flag.
Looking to the future, we'd like to remove the dependency on symbol gen library and use the fixed seed solution.
@ezrosent, I ended up having to use indexmap in generic join, which might slow down egglog. Should we just merge it for now? I would love to add testing to egglog that somehow catches nondeterminism as well.