egraphs-good / egglog

egraphs + datalog!
https://egraphs-good.github.io/egglog/
MIT License
459 stars 54 forks source link

Update hashbrown #445

Closed thaliaarchi closed 1 month ago

thaliaarchi commented 1 month ago

hashbrown 0.15 was released this month, which notably removed the RawTable API in favor of HashTable and changed its hasher from ahash to the faster foldhash.

Besides this, Max's symbol_table still requires an older version of hashbrown in the lockfile. I have a draft to update that and could rebase to include it, but didn't figure out how to run its Criterion benchmarks.

codspeed-hq[bot] commented 1 month ago

CodSpeed Performance Report

Merging #445 will not alter performance

Comparing thaliaarchi:update-hashbrown (90e6e69) with main (b0db068)

Summary

✅ 6 untouched benchmarks

Alex-Fischman commented 1 month ago

+1 to not letting "performance regressions" in the parser block this PR

thaliaarchi commented 1 month ago

I've pushed a commit, which replaces symbol_table with my fork that bumps their version of hashbrown. It's a hack to run benchmarks in CI. I'll then submit a PR to symbol_table, depending on the results, and rebase here.

It looks like running benchmarks isn't automatic; could you trigger another run?

saulshanabrook commented 1 month ago

@thaliaarchi I believe the benchmarks have run on your most recent push! The benchmark comment gets updated whenever a new run is processed in this branch.

It seems like the only regression is in cykjson. I wouldn't personally let that block merging in this PR, but I am not very familiar with what that example is for.

yihozhang commented 1 month ago

cykjson is a small cool egglog example that does the CYK parsing algorithm of JSON-like strings. It is a more Datalog-like workload (dynamic programming) with some e-class manipulations

thaliaarchi commented 1 month ago

I dropped the commit for benchmarking updating symbol_table. If there's a response there later, it can be updated here separately.

I also changed fn TermDag::get(&self, id: TermId) -> Term to return &Term instead. All internal usages use the result by reference, so cloning is unnecessary. Does this affect anything externally?

Alex-Fischman commented 1 month ago

I also changed fn TermDag::get(&self, id: TermId) -> Term to return &Term instead. All internal usages use the result by reference, so cloning is unnecessary. Does this affect anything externally?

This is good, egglog is unstable and users can clone if they need it.

saulshanabrook commented 1 month ago

Thanks @thaliaarchi for working on this and responding to all the feedback! If you have anything to add, we are also discussing the tradeoffs with hash performance and determinism in this post: https://github.com/egraphs-good/egglog/pull/439#issuecomment-2435724938

EDIT: It looks like these changes also caused a 7% speedup in the biggest benchmark (added to main after this PR was started, so wasn't included in the comparison here), which is pretty nice! https://codspeed.io/egraphs-good/egglog/runs/671a868380493f6bc05c7bfc

thaliaarchi commented 1 month ago

@saulshanabrook Thanks! I'm glad to see such speedups!