apollographql / router

A configurable, high-performance routing runtime for Apollo Federation 🚀
https://www.apollographql.com/docs/router/

Performance of JSON manipulation #173

Open Geal opened 2 years ago

Geal commented 2 years ago

A large part of the router's work is done in deserializing, filtering, merging and serializing JSON data, so any work done in improving its performance will have a large impact. As is often the case with perf work, it boils down to:

I propose that we record here the perf issues we encounter, as low-hanging fruit for anyone who has time to look into them:

Geal commented 2 years ago

zero copy deserialization

Another thing we should explore in the future is zero copy deserialization. Assuming we have to store the entire GraphQL response from a subgraph in memory (which is the common case, since most fields will be returned to the clients), instead of parsing it entirely into an owned JSON struct, with allocations for everything (hashmaps, strings, etc.), we could parse it into a structure that references slices of the input.

This greatly improves string handling: currently, when a field is a string, we parse it, then unescape it into a String instance that will later be reserialized. We could instead parse it, keep a reference to the slice, then write the slice to the output stream directly. This is doable right now with a Cow<'a, str> field and the #[serde(borrow)] attribute, but it still allocates if some characters are escaped. For fields that we only need to transmit, we could even avoid unescaping.
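To make the borrow-or-allocate trade-off concrete, here is a minimal std-only sketch. The `unescape` function is hypothetical (it is not the router's code and not serde's implementation) and assumes it receives the contents of a JSON string literal without the surrounding quotes. It only handles a few single-character escapes; `\b`, `\f` and `\uXXXX` are left out for brevity.

```rust
use std::borrow::Cow;

/// Hypothetical helper: unescape the contents of a JSON string literal.
/// Returns a borrowed slice when nothing needs unescaping, and only
/// allocates a new String when a backslash escape is present.
fn unescape(raw: &str) -> Cow<'_, str> {
    // Fast path: no escapes, so we can reference the input directly.
    if !raw.contains('\\') {
        return Cow::Borrowed(raw);
    }

    // Slow path: at least one escape, so we have to build an owned String.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            Some('/') => out.push('/'),
            // Escapes this sketch does not support are kept as written.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // A trailing lone backslash is kept as written too.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

Calling `unescape("hello")` yields a `Cow::Borrowed` pointing into the input, while `unescape(r"a\nb")` yields a `Cow::Owned`. A field that is only forwarded to the client could skip this step entirely and write the raw, still-escaped slice to the output.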

references:

Geal commented 2 years ago

Investigate simd-json

Using SIMD can get us faster deserialization, and simd-json is compatible with serde: https://github.com/simd-lite/simd-json

Geal commented 2 years ago

I now have a good idea of how to implement zero copy deserialization, and I think we should leverage the Bytes struct instead of raw slices. A Bytes can hold the entire subgraph response, and we can aggregate data from it into the client response without caring much about lifetimes (Bytes instances are refcounted). It also provides a nice way to plug into caching: the cache can hold an object that has been reserialized to a very small buffer in memory (so we do not keep large subgraph response buffers alive for a long time) or in an external service (Redis, memcached). Querying the cache would return a Bytes that we can use in the same way.
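As a rough illustration of why refcounted buffers remove the lifetime concerns, here is a std-only sketch. `SharedSlice` is a hypothetical stand-in for `bytes::Bytes`, built on `Arc<[u8]>` so the example needs no external crate; `assemble` is likewise a made-up helper, and the byte offsets in the usage note are computed by hand rather than by a parser.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for `bytes::Bytes`: a refcounted buffer plus the
/// byte range this handle refers to. Cloning or slicing never copies data.
#[derive(Clone, Debug)]
struct SharedSlice {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSlice {
    /// Take ownership of a whole subgraph response body.
    fn new(data: Vec<u8>) -> Self {
        let range = 0..data.len();
        SharedSlice { buf: Arc::from(data), range }
    }

    /// Sub-slice relative to this slice; only bumps the refcount.
    /// Panics if the requested range is out of bounds.
    fn slice(&self, r: Range<usize>) -> Self {
        assert!(r.start <= r.end && r.end <= self.range.len());
        let start = self.range.start + r.start;
        let end = start + (r.end - r.start);
        SharedSlice { buf: Arc::clone(&self.buf), range: start..end }
    }

    /// View the referenced bytes.
    fn as_bytes(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// Hypothetical aggregation step: write fragments straight to the output
/// buffer without reparsing or unescaping them.
fn assemble(fragments: &[SharedSlice]) -> Vec<u8> {
    let mut out = Vec::new();
    for fragment in fragments {
        out.extend_from_slice(fragment.as_bytes());
    }
    out
}
```

For example, with a body of `{"data":{"name":"Ada"}}`, `body.slice(16..21)` refers to the bytes `"Ada"` and stays valid after `body` itself is dropped, because the slice holds its own reference to the buffer. The flip side is the one noted above: a small slice keeps the whole response alive, which is why a cache entry would need to be reserialized into its own small buffer.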