Open SimonSapin opened 1 year ago
Let’s call Symbol
this hypothetical type with Deref<Target = str>
and cheap Clone
.
Ideally to reduce copying we’d convert strings to Symbol
as early as possible (in the lexer) and keep them as such as long as they are needed. In the HIR we define data structures so we can choose to store Symbol
instead of String
.
The AST however is backed by Rowan, which has its own opinion about string storage. GreenNodeBuilder
takes a &str
and has an internal cache/interner. GreenToken
contains a rowan-private ThinArc
smart pointer.
I’m not sure how to reconcile both.
Deduplicating indentical strings is nice to save memory and to make ==
very fast, but perhaps not as important for GraphQL documents as it is for HTML element names in Servo. (When you routinely have many thousands of <div>
elements you’d rather not allocate the string "div"
that many times.)
A smart string type with reference counting (fast clone) and pre-computed hash (fast Hash
and fast path for ==
) would already bring a lot of benefits over the status quo, without the implementation complexity of full dedup.
The WIP mut
branch introduces a NodeStr
type which is similar to Arc<str>
, replacing many allocations and string copies with atomic integer operations.
We decided against pre-computed hashing, since hashing a pre-computed 8 bytes hash is not much less work than hashing names that are rarely dozens of bytes.
Deduplication is also not done. An explicit per-document interner would not work since a NodeStr
can be taken from one parsed document and reusing when mutating another. A global implicit interner would require non-trivial synchronization.
https://github.com/apollographql/apollo-rs/pull/868 changes the memory representation of NodeStr
/ Name
. We considered but rejected a process-global string interner that would String::leak
newly-inserted strings and give out &'static str
. However I think a variation of it is possible that gives out Arc<str>
, and stores a weak reference to let unused strings be deallocated. Either way, this interner would be synchronized by a single RwLock
. It remains to be seen whether the benefit of deduplicated strings is worth the synchronization cost of RwLock
. Benchmarking is not easy because this cost happens mostly under multi-threaded contention, so a synthetic benchmark may not be representative of real-world workloads.
Reopening to track ideas in case we want to reconsider this later.
I just came across this post again, about the cost of many threads cloning the same Arc
: https://pkolaczk.github.io/server-slower-than-a-laptop/
If the Router ever has a tight loop that clones a Name
from its schema, an Arc
-based string interner could potentially trigger this same performance problem. But maybe enough code manipulates &Name
without touching the atomic reference counter?
My impression is that apollo-federation today clones Name
s a lot a lot
The HIR in
apollo-compiler
usesString
a lot. Cloning, hashing, and comparing those strings repeatedly has cost that can add up. This can be solved by having some kind of string "interning" that deduplicates strings in memory, so that for example equality would be as cheap as a pointer comparison.Some secondary goals are:
AsRef<str>
,Eq
,Hash
, etc. That is, using them does not require a separate access a string pool / interner.With a quick search I found a number of interning crates that don’t do all of this. Servo’s string-cache does but comes with non-trivial complexity for features that are maybe not as useful here. (Namely: multiple static sets of "well-known" strings that can be interned without allocation and in
const
context, small string optimization)Perhaps something custom to apollo-rs would make sense. I think the design might involve atomic reference-counting, precomputed hash, and a process-global mutex-protected hash table. It could use https://crates.io/crates/weak-table, or something even simpler since we don’t need a weak counter (as it would always be 1). But maybe a full custom hash table is too much wheel-reinvention.