New memory representation for Name/NodeStr

SimonSapin commented 3 months ago

This is the result of a series of experiments. I’m not sure I like it, but let’s discuss it.

TL;DR

NodeStr is no more. GraphQL string values use a plain String (the Value enum is already in a Node with the same source span) and descriptions use Node<str>. Name has a dual representation based on either Arc<str> or &'static str and can cheaply convert to that.

Motivation

Apollo Router has found a performance bottleneck when using GraphQL type and field names in OpenTelemetry. opentelemetry::StringValue can be converted cheaply from String, Arc<str>, or &'static str but apollo_compiler::NodeStr (usually) contains none of these so converting requires a new heap allocation. This happens in a hot path often enough that the allocation (and later deallocation) costs add up.

`Name` and `NodeStr` in apollo-compiler 1.0.0-beta.17

Name is used for GraphQL names. It dereferences to NodeStr and is a thin wrapper that adds checking for GraphQL name syntax. (ASCII alphanumeric or underscore, not starting with a digit)

In addition to names, NodeStr is also used for GraphQL string Values and descriptions. It contains an immutable str value and optional source location (file ID and text range) for diagnostics.

Internally, NodeStr contains a single ptr::NonNull<()> with a tag in its lowest bit. The pointer is usually to a (triomphe::ThinArc) shared heap allocation that contains:

Atomic reference counter
Optional source location
String length
String bytes (variable length, inline following the header)

The pointer can instead (discriminated with the pointer tag) represent &'static &'static str. This is a double indirection, because &'static str is a wide pointer that stores the string length next to its address but we didn’t want to double the size of NodeStr. In this case there is no location information. This is used by the name!("example") macro which does GraphQL name syntax checking at compile-time then calls NodeStr::from_static(&"example") with an extra & which gets promoted to a constant.

A quality of this design is that size_of::<NodeStr>() is only 8 bytes.

First idea: `String::leak`

GraphQL type and field names in a schema rarely change. What if we accepted a small memory leak to turn them into &'static str?

However that’s not enough to use the existing static representation of NodeStr:

String::leak doesn’t give us the double &'static &'static str indirection. We could potentially allocate Box<&'static str> and also leak that, but even then…
For diagnostics we want type and field names in the schema to carry source location, which the current static NodeStr doesn’t.

Overall direction for a new memory representation

If we’re reconsidering the design of NodeStr, we might as well pick one that makes conversion to opentelemetry::StringValue cheap in all cases. This means a dual representation based on either Arc<str> or &'static str. (The standard library Arc allocates space for a weak reference counter that we don’t use, but we can accept that cost. Also see below for a Weak come-back.)

This in turn means moving the source location outside of the allocation, to be inline in the NodeStr struct. At first this seems like a loss: if a string is cloned many times, its source location (or even space to record the lack thereof) is duplicated. But maybe that doesn’t happen that much? On the contrary, inline location could allow strings at different locations to share a heap allocation (hint hint more foreshadowing).

A straightforward (enum { Arc<str>, &'static str }, Option<NodeLocation>) has size_of() == 40 bytes, which feels like too much.

Layout optimizations

rowan::TextRange inside NodeLocation uses 32-bit offsets, so apollo-parser is already limited to 4 GiB inputs NodeLocation can be decomposed into { file_id, start_offset: u32, end_offset: u32 }.

Both Arc<str> and &'static str are represented as { data: *const u8, len: usize }. The length is kind of already represented in the location is the difference between end and start offsets. Accessing the length to build a &str is much more common than accessing the end offset, so we can store { start_offset: u32, len: u32 } and compute the end offset on demand. When a synthetic string doesn’t have a source location, we can set start_offset to zero but still use len.

Finally, an actual Rust enum would allocate 64 bits to the 1-bit discriminant in order preserve memory alignment. Like before, we can stash that bit somewhere else:

&'static str has align_of() == 1 (it could be a sub-slice of some other string) so we can’t use the lower pointer bit for the tag
The upper bit of start_offset would work, but bring the existing 4 GiB file input limit to 2 GiB. This could be ok but we can avoid it
The upper bit of len would move that 2 GiB limit to the length of a single string sounds fine, but extra steps in the common code path of rebuilding a &str can easily be avoided
FileId contains a NonZeroI64, we can easily spare a bit there

We end up with this representation which has size_of() == 24 bytes:

struct {
    ptr: NonNull<u8>,
    len: u32,
    start_offset: u32,
    tagged_file_id: TaggedFileId,
}

Consequences and tradeoffs

Remember when the length of the string is kind of already in the source location? Only kind of. This is the case for Name whose value appear exactly in the GraphQL source, but string values and descriptions are quoted strings that can use backslash-escape sequences. Both of these make the source span longer than the decoded value.

This PR chooses to remove NodeStr entirely and move the special representation directly into Name. GraphQL string values use a plain String (the Value enum is already in a Node with the same source span) and descriptions use ~~Node<String>~~ Node<str>
Clippy started complaining because executable::BuildError was now had size_of() == 264 bytes. I boxed its largest variants to bring that to 104 bytes, but the impact doesn’t stop there. Every Name will now incur an extra 16 bytes of inline size. On the other hand each heap allocation for Name will have 16 fewer bytes, so I’d guess the overall impact on memory use of a running Router is negligible.
FileId contained NonZeroI64 with -1 and -2 reserved for special values. TaggedFileId was easier to make without negative values, so now NonZeroU64 is used and "normal" IDs start at 3 instead of 1. This causes a diff for every snapshot test that prints file IDs, but this is a one-time transition.

Optional: leaking string interner

The above gives cheap conversion to opentelemetry::StringValue already, so we could stop there. But the original String::leak idea is more doable now. But if we’re gonna leak we should definitely deduplicated the leaked strings.

~~This PR includes (in a separate commit)~~ (edit: this inclusion was reverted for now) an opt-in process-wide string interner based on OnceLock<RwLock<hashbrown::RawTable<&'static str>>> and String::leak. It’s enabled by configuring Parser::new().intern_and_leak_names(). When not enabled, the parser does not leak any new string but still looks in the interner for strings already there. So intern_and_leak_names could for example be enabled when parsing a schema but not when parsing executable document, still allowing the latter to reuse already-interned field names.

Leaking is enabled for parsing built_in.graphql, so strings like Int or __typename are always interned.

Going further: weak reference string interner

Leaking memory seems risky. Another possibility is to store std::sync::Weak<str> in the interner and give out Arc<str> on lookup. This would allow strings not references anymore to be de-allocated.

Before inserting into the hash table, the interner would check if it’s about to grow with table.capacity() == table.len(). In that case (since we’re about to do an O(n) reallocation of the table anyway) it would scan the table to remove entries with Weak::strong_count() == 0 since those have been deallocated and can’t be used anymore.

A potential risk is atomic contention: atomic operations in Arc fast when uncontended but if many threads are all trying to increment and decrement the refcount of the one Arc<str> representing "Query" then they will start waiting on each other.

SimonSapin commented 2 months ago

I’d like to do the change from Node<String> to Node<str> but it requires a change upstream: https://github.com/Manishearth/triomphe/pull/92

SimonSapin commented 2 months ago

In today’s discussion there was clear reluctance about intentionally leaking memory, so I’ve reverted the addition of the leaking string interner. A future PR can consider adding a non-leaking interner based on Arc and Weak.

SimonSapin commented 2 months ago

With a few small changes to make https://github.com/apollographql/router/pull/5477 compile and a changelog entry, this is now ready for review

SimonSapin commented 2 months ago

Sorry! At some point I tried something in a separate branch and forgot to merge back. I didn’t realize I was not pushing to this PR anymore and asked for review on something quite a bit behind from what I thought. All pushed now.

Remaining to bikeshed:

Names of constructors (new_* v.s. from_*)
Name of the StaticOrArc enum
Name of the to_static_str_or_cloned_arc method that returns it
Import path of Name, InvalidNameError, and StaticOrArc. Is three items the point where they deserve their own public module? (The name! macro would belong there too but #[macro_export] currently always exports at the crate root)

goto-bus-stop commented 2 months ago

I haven't thought a lot about this but I think it makes sense to have an apollo_compiler::name::{Name, InvalidNameError, StaticOrArc} module and a reexport of Name at the root.

SimonSapin commented 2 months ago

Per today’s discussion I removed the StaticOrArc enum and split the conversion into two separate as_static_str() and to_cloned_arc() methods returning options. It means conversion to OpenTelementry string needs an unwrap: https://github.com/apollographql/router/pull/5352#issuecomment-2176472531

apollographql / apollo-rs