carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/
Other
32.25k stars 1.48k forks source link

Port the toolchain to use the new Carbon hashtable #4097

Closed chandlerc closed 3 months ago

chandlerc commented 3 months ago

This works to leverage the capabilities of the hashtable as much as possible, for example using the key context in the value stores. However, there may still be opportunities to refactor more deeply and use the functionality even better. Hopefully this is at least a reasonable start and gets us a clean baseline.

On an Arm M1, this is a 15% improvement on my large lexing stress test, but ends up a wash on my x86-64 server. This is a smaller benefit than I expected, and it's because we're using a set-of-IDs and looking up values with a key context for things like identifiers. This pattern has a surprising tradeoff. The new hashtable uses significantly less memory, a 10% peak RSS reduction just from the hashtable change. But indirecting through the vector of values makes growing the hashtable dramatically less cache-friendly: it causes growth to randomly access every key when rehashing. On x86, everything gained by the faster hashtable is lost in even slower growth. And even on Arm, this eats into the benefits.

But I have a plan to tweak how identifiers specifically work to avoid most of the growth, and so I suspect this is the right tradeoff on the whole. It gives us significant working set size reduction and we can likely avoid the regressed operation (growth with rehash) in most cases by clever reserving and if necessary by adding a hash caching layer to the table infrastructure.

jonmeow commented 3 months ago

Thanks! Still LGTM, note the clang-tidy failure though.