apollographql / apollo-rs

Spec compliant GraphQL Tools in Rust.
Apache License 2.0
577 stars 45 forks source link

String interning #385

Open SimonSapin opened 1 year ago

SimonSapin commented 1 year ago

The HIR in apollo-compiler uses String a lot. Cloning, hashing, and comparing those strings repeatedly has cost that can add up. This can be solved by having some kind of string "interning" that deduplicates strings in memory, so that for example equality would be as cheap as a pointer comparison.

Some secondary goals are:

With a quick search I found a number of interning crates that don’t do all of this. Servo’s string-cache does but comes with non-trivial complexity for features that are maybe not as useful here. (Namely: multiple static sets of "well-known" strings that can be interned without allocation and in const context, small string optimization)

Perhaps something custom to apollo-rs would make sense. I think the design might involve atomic reference-counting, precomputed hash, and a process-global mutex-protected hash table. It could use https://crates.io/crates/weak-table, or something even simpler since we don’t need a weak counter (as it would always be 1). But maybe a full custom hash table is too much wheel-reinvention.

SimonSapin commented 1 year ago

Let’s call Symbol this hypothetical type with Deref<Target = str> and cheap Clone.

Ideally to reduce copying we’d convert strings to Symbol as early as possible (in the lexer) and keep them as such as long as they are needed. In the HIR we define data structures so we can choose to store Symbol instead of String.

The AST however is backed by Rowan, which has its own opinion about string storage. GreenNodeBuilder takes a &str and has an internal cache/interner. GreenToken contains a rowan-private ThinArc smart pointer.

I’m not sure how to reconcile both.

SimonSapin commented 1 year ago

Deduplicating indentical strings is nice to save memory and to make == very fast, but perhaps not as important for GraphQL documents as it is for HTML element names in Servo. (When you routinely have many thousands of <div> elements you’d rather not allocate the string "div" that many times.)

A smart string type with reference counting (fast clone) and pre-computed hash (fast Hash and fast path for ==) would already bring a lot of benefits over the status quo, without the implementation complexity of full dedup.

SimonSapin commented 1 year ago

The WIP mut branch introduces a NodeStr type which is similar to Arc<str>, replacing many allocations and string copies with atomic integer operations.

We decided against pre-computed hashing, since hashing a pre-computed 8 bytes hash is not much less work than hashing names that are rarely dozens of bytes.

Deduplication is also not done. An explicit per-document interner would not work since a NodeStr can be taken from one parsed document and reusing when mutating another. A global implicit interner would require non-trivial synchronization.

SimonSapin commented 5 months ago

https://github.com/apollographql/apollo-rs/pull/868 changes the memory representation of NodeStr / Name. We considered but rejected a process-global string interner that would String::leak newly-inserted strings and give out &'static str. However I think a variation of it is possible that gives out Arc<str>, and stores a weak reference to let unused strings be deallocated. Either way, this interner would be synchronized by a single RwLock. It remains to be seen whether the benefit of deduplicated strings is worth the synchronization cost of RwLock. Benchmarking is not easy because this cost happens mostly under multi-threaded contention, so a synthetic benchmark may not be representative of real-world workloads.

Reopening to track ideas in case we want to reconsider this later.

SimonSapin commented 4 months ago

I just came across this post again, about the cost of many threads cloning the same Arc: https://pkolaczk.github.io/server-slower-than-a-laptop/

If the Router ever has a tight loop that clones a Name from its schema, an Arc-based string interner could potentially trigger this same performance problem. But maybe enough code manipulates &Name without touching the atomic reference counter?

goto-bus-stop commented 4 months ago

My impression is that apollo-federation today clones Names a lot a lot