apollographql / apollo-rs

Spec compliant GraphQL Tools in Rust.
Apache License 2.0
566 stars 43 forks source link

New memory representation for Name/NodeStr #868

Closed SimonSapin closed 2 months ago

SimonSapin commented 3 months ago

This is the result of a series of experiments. I’m not sure I like it, but let’s discuss it.

TL;DR

NodeStr is no more. GraphQL string values use a plain String (the Value enum is already in a Node with the same source span) and descriptions use Node<str>. Name has a dual representation based on either Arc<str> or &'static str and can cheaply convert to that.

Motivation

Apollo Router has found a performance bottleneck when using GraphQL type and field names in OpenTelemetry. opentelemetry::StringValue can be converted cheaply from String, Arc<str>, or &'static str but apollo_compiler::NodeStr (usually) contains none of these so converting requires a new heap allocation. This happens in a hot path often enough that the allocation (and later deallocation) costs add up.

Name and NodeStr in apollo-compiler 1.0.0-beta.17

Name is used for GraphQL names. It dereferences to NodeStr and is a thin wrapper that adds checking for GraphQL name syntax. (ASCII alphanumeric or underscore, not starting with a digit)

In addition to names, NodeStr is also used for GraphQL string Values and descriptions. It contains an immutable str value and optional source location (file ID and text range) for diagnostics.

Internally, NodeStr contains a single ptr::NonNull<()> with a tag in its lowest bit. The pointer is usually to a (triomphe::ThinArc) shared heap allocation that contains:

The pointer can instead (discriminated with the pointer tag) represent &'static &'static str. This is a double indirection, because &'static str is a wide pointer that stores the string length next to its address but we didn’t want to double the size of NodeStr. In this case there is no location information. This is used by the name!("example") macro which does GraphQL name syntax checking at compile-time then calls NodeStr::from_static(&"example") with an extra & which gets promoted to a constant.

A quality of this design is that size_of::<NodeStr>() is only 8 bytes.

First idea: String::leak

GraphQL type and field names in a schema rarely change. What if we accepted a small memory leak to turn them into &'static str?

However that’s not enough to use the existing static representation of NodeStr:

Overall direction for a new memory representation

If we’re reconsidering the design of NodeStr, we might as well pick one that makes conversion to opentelemetry::StringValue cheap in all cases. This means a dual representation based on either Arc<str> or &'static str. (The standard library Arc allocates space for a weak reference counter that we don’t use, but we can accept that cost. Also see below for a Weak come-back.)

This in turn means moving the source location outside of the allocation, to be inline in the NodeStr struct. At first this seems like a loss: if a string is cloned many times, its source location (or even space to record the lack thereof) is duplicated. But maybe that doesn’t happen that much? On the contrary, inline location could allow strings at different locations to share a heap allocation (hint hint more foreshadowing).

A straightforward (enum { Arc<str>, &'static str }, Option<NodeLocation>) has size_of() == 40 bytes, which feels like too much.

Layout optimizations

rowan::TextRange inside NodeLocation uses 32-bit offsets, so apollo-parser is already limited to 4 GiB inputs NodeLocation can be decomposed into { file_id, start_offset: u32, end_offset: u32 }.

Both Arc<str> and &'static str are represented as { data: *const u8, len: usize }. The length is kind of already represented in the location is the difference between end and start offsets. Accessing the length to build a &str is much more common than accessing the end offset, so we can store { start_offset: u32, len: u32 } and compute the end offset on demand. When a synthetic string doesn’t have a source location, we can set start_offset to zero but still use len.

Finally, an actual Rust enum would allocate 64 bits to the 1-bit discriminant in order preserve memory alignment. Like before, we can stash that bit somewhere else:

We end up with this representation which has size_of() == 24 bytes:

struct {
    ptr: NonNull<u8>,
    len: u32,
    start_offset: u32,
    tagged_file_id: TaggedFileId,
}

Consequences and tradeoffs

Optional: leaking string interner

The above gives cheap conversion to opentelemetry::StringValue already, so we could stop there. But the original String::leak idea is more doable now. But if we’re gonna leak we should definitely deduplicated the leaked strings.

This PR includes (in a separate commit) (edit: this inclusion was reverted for now) an opt-in process-wide string interner based on OnceLock<RwLock<hashbrown::RawTable<&'static str>>> and String::leak. It’s enabled by configuring Parser::new().intern_and_leak_names(). When not enabled, the parser does not leak any new string but still looks in the interner for strings already there. So intern_and_leak_names could for example be enabled when parsing a schema but not when parsing executable document, still allowing the latter to reuse already-interned field names.

Leaking is enabled for parsing built_in.graphql, so strings like Int or __typename are always interned.

Going further: weak reference string interner

Leaking memory seems risky. Another possibility is to store std::sync::Weak<str> in the interner and give out Arc<str> on lookup. This would allow strings not references anymore to be de-allocated.

Before inserting into the hash table, the interner would check if it’s about to grow with table.capacity() == table.len(). In that case (since we’re about to do an O(n) reallocation of the table anyway) it would scan the table to remove entries with Weak::strong_count() == 0 since those have been deallocated and can’t be used anymore.

A potential risk is atomic contention: atomic operations in Arc fast when uncontended but if many threads are all trying to increment and decrement the refcount of the one Arc<str> representing "Query" then they will start waiting on each other.

SimonSapin commented 2 months ago

I’d like to do the change from Node<String> to Node<str> but it requires a change upstream: https://github.com/Manishearth/triomphe/pull/92

SimonSapin commented 2 months ago

In today’s discussion there was clear reluctance about intentionally leaking memory, so I’ve reverted the addition of the leaking string interner. A future PR can consider adding a non-leaking interner based on Arc and Weak.

SimonSapin commented 2 months ago

With a few small changes to make https://github.com/apollographql/router/pull/5477 compile and a changelog entry, this is now ready for review

SimonSapin commented 2 months ago

Sorry! At some point I tried something in a separate branch and forgot to merge back. I didn’t realize I was not pushing to this PR anymore and asked for review on something quite a bit behind from what I thought. All pushed now.

Remaining to bikeshed:

goto-bus-stop commented 2 months ago

I haven't thought a lot about this but I think it makes sense to have an apollo_compiler::name::{Name, InvalidNameError, StaticOrArc} module and a reexport of Name at the root.

SimonSapin commented 2 months ago

Per today’s discussion I removed the StaticOrArc enum and split the conversion into two separate as_static_str() and to_cloned_arc() methods returning options. It means conversion to OpenTelementry string needs an unwrap: https://github.com/apollographql/router/pull/5352#issuecomment-2176472531