Closed SimonSapin closed 2 months ago
I’d like to do the change from Node<String>
to Node<str>
but it requires a change upstream: https://github.com/Manishearth/triomphe/pull/92
In today’s discussion there was clear reluctance about intentionally leaking memory, so I’ve reverted the addition of the leaking string interner. A future PR can consider adding a non-leaking interner based on Arc
and Weak
.
With a few small changes to make https://github.com/apollographql/router/pull/5477 compile and a changelog entry, this is now ready for review
Sorry! At some point I tried something in a separate branch and forgot to merge back. I didn’t realize I was not pushing to this PR anymore and asked for review on something quite a bit behind from what I thought. All pushed now.
Remaining to bikeshed:
new_*
v.s. from_*
)StaticOrArc
enumto_static_str_or_cloned_arc
method that returns itName
, InvalidNameError
, and StaticOrArc
. Is three items the point where they deserve their own public module? (The name!
macro would belong there too but #[macro_export]
currently always exports at the crate root)I haven't thought a lot about this but I think it makes sense to have an apollo_compiler::name::{Name, InvalidNameError, StaticOrArc}
module and a reexport of Name
at the root.
Per today’s discussion I removed the StaticOrArc
enum and split the conversion into two separate as_static_str()
and to_cloned_arc()
methods returning options. It means conversion to OpenTelementry string needs an unwrap: https://github.com/apollographql/router/pull/5352#issuecomment-2176472531
This is the result of a series of experiments. I’m not sure I like it, but let’s discuss it.
TL;DR
NodeStr
is no more. GraphQL string values use a plainString
(theValue
enum is already in aNode
with the same source span) and descriptions useNode<str>
.Name
has a dual representation based on eitherArc<str>
or&'static str
and can cheaply convert to that.Motivation
Apollo Router has found a performance bottleneck when using GraphQL type and field names in OpenTelemetry.
opentelemetry::StringValue
can be converted cheaply fromString
,Arc<str>
, or&'static str
butapollo_compiler::NodeStr
(usually) contains none of these so converting requires a new heap allocation. This happens in a hot path often enough that the allocation (and later deallocation) costs add up.Name
andNodeStr
in apollo-compiler 1.0.0-beta.17Name
is used for GraphQL names. It dereferences toNodeStr
and is a thin wrapper that adds checking for GraphQL name syntax. (ASCII alphanumeric or underscore, not starting with a digit)In addition to names,
NodeStr
is also used for GraphQL stringValue
s and descriptions. It contains an immutablestr
value and optional source location (file ID and text range) for diagnostics.Internally,
NodeStr
contains a singleptr::NonNull<()>
with a tag in its lowest bit. The pointer is usually to a (triomphe::ThinArc
) shared heap allocation that contains:The pointer can instead (discriminated with the pointer tag) represent
&'static &'static str
. This is a double indirection, because&'static str
is a wide pointer that stores the string length next to its address but we didn’t want to double the size ofNodeStr
. In this case there is no location information. This is used by thename!("example")
macro which does GraphQL name syntax checking at compile-time then callsNodeStr::from_static(&"example")
with an extra&
which gets promoted to a constant.A quality of this design is that
size_of::<NodeStr>()
is only 8 bytes.First idea:
String::leak
GraphQL type and field names in a schema rarely change. What if we accepted a small memory leak to turn them into
&'static str
?However that’s not enough to use the existing static representation of
NodeStr
:String::leak
doesn’t give us the double&'static &'static str
indirection. We could potentially allocateBox<&'static str>
and also leak that, but even then…NodeStr
doesn’t.Overall direction for a new memory representation
If we’re reconsidering the design of
NodeStr
, we might as well pick one that makes conversion toopentelemetry::StringValue
cheap in all cases. This means a dual representation based on eitherArc<str>
or&'static str
. (The standard libraryArc
allocates space for a weak reference counter that we don’t use, but we can accept that cost. Also see below for aWeak
come-back.)This in turn means moving the source location outside of the allocation, to be inline in the
NodeStr
struct. At first this seems like a loss: if a string is cloned many times, its source location (or even space to record the lack thereof) is duplicated. But maybe that doesn’t happen that much? On the contrary, inline location could allow strings at different locations to share a heap allocation (hint hint more foreshadowing).A straightforward
(enum { Arc<str>, &'static str }, Option<NodeLocation>)
hassize_of() == 40
bytes, which feels like too much.Layout optimizations
rowan::TextRange
insideNodeLocation
uses 32-bit offsets, so apollo-parser is already limited to 4 GiB inputsNodeLocation
can be decomposed into{ file_id, start_offset: u32, end_offset: u32 }
.Both
Arc<str>
and&'static str
are represented as{ data: *const u8, len: usize }
. The length is kind of already represented in the location is the difference between end and start offsets. Accessing the length to build a&str
is much more common than accessing the end offset, so we can store{ start_offset: u32, len: u32 }
and compute the end offset on demand. When a synthetic string doesn’t have a source location, we can setstart_offset
to zero but still uselen
.Finally, an actual Rust
enum
would allocate 64 bits to the 1-bit discriminant in order preserve memory alignment. Like before, we can stash that bit somewhere else:&'static str
hasalign_of() == 1
(it could be a sub-slice of some other string) so we can’t use the lower pointer bit for the tagstart_offset
would work, but bring the existing 4 GiB file input limit to 2 GiB. This could be ok but we can avoid itlen
would move that 2 GiB limit to the length of a single string sounds fine, but extra steps in the common code path of rebuilding a&str
can easily be avoidedFileId
contains aNonZeroI64
, we can easily spare a bit thereWe end up with this representation which has
size_of() == 24
bytes:Consequences and tradeoffs
Remember when the length of the string is kind of already in the source location? Only kind of. This is the case for
Name
whose value appear exactly in the GraphQL source, but string values and descriptions are quoted strings that can use backslash-escape sequences. Both of these make the source span longer than the decoded value.This PR chooses to remove
NodeStr
entirely and move the special representation directly intoName
. GraphQL string values use a plainString
(theValue
enum is already in aNode
with the same source span) and descriptions useNode<String>
Node<str>
Clippy started complaining because
executable::BuildError
was now hadsize_of() == 264
bytes. I boxed its largest variants to bring that to 104 bytes, but the impact doesn’t stop there. EveryName
will now incur an extra 16 bytes of inline size. On the other hand each heap allocation forName
will have 16 fewer bytes, so I’d guess the overall impact on memory use of a running Router is negligible.FileId
containedNonZeroI64
with -1 and -2 reserved for special values.TaggedFileId
was easier to make without negative values, so nowNonZeroU64
is used and "normal" IDs start at 3 instead of 1. This causes a diff for every snapshot test that prints file IDs, but this is a one-time transition.Optional: leaking string interner
The above gives cheap conversion to
opentelemetry::StringValue
already, so we could stop there. But the originalString::leak
idea is more doable now. But if we’re gonna leak we should definitely deduplicated the leaked strings.This PR includes (in a separate commit)(edit: this inclusion was reverted for now) an opt-in process-wide string interner based onOnceLock<RwLock<hashbrown::RawTable<&'static str>>>
andString::leak
. It’s enabled by configuringParser::new().intern_and_leak_names()
. When not enabled, the parser does not leak any new string but still looks in the interner for strings already there. Sointern_and_leak_names
could for example be enabled when parsing a schema but not when parsing executable document, still allowing the latter to reuse already-interned field names.Leaking is enabled for parsing
built_in.graphql
, so strings likeInt
or__typename
are always interned.Going further: weak reference string interner
Leaking memory seems risky. Another possibility is to store
std::sync::Weak<str>
in the interner and give outArc<str>
on lookup. This would allow strings not references anymore to be de-allocated.Before inserting into the hash table, the interner would check if it’s about to grow with
table.capacity() == table.len()
. In that case (since we’re about to do anO(n)
reallocation of the table anyway) it would scan the table to remove entries withWeak::strong_count() == 0
since those have been deallocated and can’t be used anymore.A potential risk is atomic contention: atomic operations in
Arc
fast when uncontended but if many threads are all trying to increment and decrement the refcount of the oneArc<str>
representing"Query"
then they will start waiting on each other.