Open Geal opened 2 years ago
Another thing we should explore in the future is zero-copy deserialization. Assuming we have to store the entire GraphQL response from a subgraph in memory (which is the common case, since most fields will be returned to the clients), instead of parsing it entirely into an owned JSON struct, with allocations for everything (hashmaps, strings, etc.), we could parse it into a structure that references slices of the input.
This greatly improves string handling: currently, when a field is a string, we parse it, then unescape it into a String instance that will then be reserialized. We could instead parse it, keep a reference to the slice, then write the slice to the output stream directly.
This is doable right now with a `Cow<'a, str>` field and the `#[serde(borrow)]` attribute, but it still allocates if some characters are escaped. For fields that we just need to transmit we could even avoid unescaping.
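To illustrate the borrow-or-allocate behaviour described above, here is a minimal std-only sketch. It does not use serde; `decode_json_str` and `forward_json_str` are hypothetical helpers written for this example, and only a handful of simple escapes are handled.

```rust
use std::borrow::Cow;

/// Decodes the body of a JSON string literal (the text between the quotes).
/// Hypothetical helper: handles only \" \\ \/ \n \t \r \b \f and returns
/// None for anything else (including \uXXXX) to keep the sketch short.
fn decode_json_str(raw: &str) -> Option<Cow<'_, str>> {
    // Fast path: no backslash, so the input slice already is the decoded value.
    if !raw.contains('\\') {
        return Some(Cow::Borrowed(raw));
    }
    // Slow path: at least one escape, so a new String has to be allocated.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        out.push(match chars.next()? {
            '"' => '"',
            '\\' => '\\',
            '/' => '/',
            'n' => '\n',
            't' => '\t',
            'r' => '\r',
            'b' => '\u{0008}',
            'f' => '\u{000C}',
            _ => return None,
        });
    }
    Some(Cow::Owned(out))
}

/// Pass-through case: a field that is only forwarded is written back with
/// its escapes untouched, so nothing is decoded and no String is created.
fn forward_json_str(raw: &str, out: &mut Vec<u8>) {
    out.push(b'"');
    out.extend_from_slice(raw.as_bytes());
    out.push(b'"');
}
```

With this shape, only fields the router actually needs to inspect pay for unescaping; everything else goes through `forward_json_str` as raw bytes.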
references:
Using SIMD can get us faster deserialization, and simd-json is compatible with serde: https://github.com/simd-lite/simd-json
I now have a good idea of the way to implement the zero copy deserialization, and I think we should leverage the Bytes struct instead of raw slices.
We can have the `Bytes` hold the entire subgraph response, and aggregate data from there in the client response, without caring much about lifetimes (`Bytes` instances are refcounted).
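The refcounted-slice idea can be sketched with the standard library alone. `SharedSlice` below is a hypothetical stand-in built on `Arc<[u8]>`, not the real `Bytes` API, and `assemble` is an assumed helper; the point is only that sub-slices keep the buffer alive without any lifetime parameter.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Std-only stand-in for a refcounted byte buffer such as `Bytes`
/// (hypothetical type, not the real `bytes` crate API).
#[derive(Clone, Debug)]
struct SharedSlice {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSlice {
    /// Takes a whole subgraph response body. `Arc::from` copies the bytes
    /// once into the refcounted allocation; later slices copy nothing.
    fn new(body: Vec<u8>) -> Self {
        let range = 0..body.len();
        SharedSlice { buf: Arc::from(body), range }
    }

    /// Returns a view on a sub-range, relative to this view.
    /// Only the refcount is touched. Panics if the range is out of bounds.
    fn slice(&self, r: Range<usize>) -> Self {
        assert!(r.start <= r.end && r.end <= self.range.len());
        let start = self.range.start + r.start;
        SharedSlice {
            buf: Arc::clone(&self.buf),
            range: start..start + r.len(),
        }
    }

    fn as_bytes(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// Builds a client response by concatenating fragments that may come
/// from different subgraph buffers.
fn assemble(parts: &[SharedSlice]) -> Vec<u8> {
    let mut out = Vec::new();
    for part in parts {
        out.extend_from_slice(part.as_bytes());
    }
    out
}
```

A fragment obtained with `slice` stays valid after the original handle is dropped, which is what removes the lifetime bookkeeping from the merging code.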
They also provide a nice way to plug into caching: we can store in the cache an object that has been reserialized to a very small buffer in memory (so we do not keep large subgraph response buffers alive for a long time), or store it in an external service (Redis, Memcached) that we can query and that will return a `Bytes` we can use in the same way.
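A rough sketch of the in-memory variant, assuming a plain `HashMap` and `Arc<[u8]>` as a stand-in for `Bytes`. `FragmentCache` and its methods are hypothetical names for this example; an external cache would sit behind the same `get` shape.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical in-memory cache keyed by an opaque string.
#[derive(Default)]
struct FragmentCache {
    entries: HashMap<String, Arc<[u8]>>,
}

impl FragmentCache {
    /// Copies only the fragment into a new, exactly sized buffer, so the
    /// entry does not keep the (possibly large) source response alive.
    fn insert(&mut self, key: &str, fragment: &[u8]) {
        self.entries.insert(key.to_owned(), Arc::from(fragment));
    }

    /// Hands out a refcounted handle; cloning the Arc does not copy bytes.
    fn get(&self, key: &str) -> Option<Arc<[u8]>> {
        self.entries.get(key).cloned()
    }
}
```

The one copy happens at insertion time; every later cache hit shares the same small buffer.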
A large part of the router's work is done in deserializing, filtering, merging and serializing JSON data, so any work done in improving its performance will have a large impact. As is often the case with perf work, it boils down to:
I propose that we record here the perf issues we encounter, as low-hanging fruit if someone has time to look into them:
- `serde_json::to_string_pretty` on the response even if trace level log is deactivated (I've seen that take 7% of CPU time on large responses with the fix from #172): https://github.com/apollographql/router/blob/main/crates/apollo-router-core/src/federated.rs#L256-L258 - fixed in #206
- `deep_merge` takes a reference to a JSON `Value` then calls `to_owned` on it, could we instead pass an owned value, and let the caller decide if it wants to clone it? https://github.com/apollographql/router/blob/main/crates/apollo-router-core/src/json_ext.rs#L126-L132 (3.75% of CPU time with #172) testing in #132
- `Bytes`, maybe we could serialize directly to `Bytes` or a `Write` instance: https://github.com/apollographql/router/blob/main/crates/apollo-router/src/warp_http_server_factory.rs#L290-L304
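For the `deep_merge` item, here is a sketch of what taking an owned value could look like. The `Value` enum and `obj` helper are minimal hypothetical stand-ins, not the router's actual JSON type or its current `deep_merge` signature.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical, reduced to three cases).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Null,
    String(String),
    Object(BTreeMap<String, Value>),
}

impl Value {
    /// Merges `other` into `self`. `other` is consumed, so its strings and
    /// maps are moved into place instead of being cloned.
    fn deep_merge(&mut self, other: Value) {
        match (self, other) {
            // Both sides are objects: merge key by key, recursing on conflicts.
            (Value::Object(a), Value::Object(b)) => {
                for (key, value) in b {
                    match a.entry(key) {
                        Entry::Occupied(mut e) => e.get_mut().deep_merge(value),
                        Entry::Vacant(e) => {
                            e.insert(value);
                        }
                    }
                }
            }
            // Any other combination: the incoming value replaces the old one.
            (slot, other) => *slot = other,
        }
    }
}

/// Hypothetical helper to build an object value in examples.
fn obj(fields: Vec<(&str, Value)>) -> Value {
    Value::Object(fields.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}
```

A caller that still needs the right-hand side afterwards writes `a.deep_merge(b.clone())`; a caller that is done with it writes `a.deep_merge(b)` and pays for no copy.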