Performance Umbrella Issue

Poincare commented 7 years ago

This issue is meant to describe current tasks related to improving Apollo Client store performance and serve as a summary about the information we currently know about Apollo Client's perf. characteristics.

The primary issue that we are concerned with at the moment is store read performance. This is probably the operation that Apollo Client performs the most since apps don't write new results or send new queries over the network that often. We know a few pieces of information about the store read performance as measured through the benchmarks.

No-fetch query vs diffQueryAgainstStore

There are two ways we can look at cache read performance. We can either fire a client.query({ noFetch: true }) call or we can directly call diffQueryAgainstStore. The former also counts the time it takes to set up all of the surrounding logic (e.g. set up and tear down of the ObservableQuery, delivering the result to the query observers, etc.) whereas the latter does not.

So, we if we compare these on reading a GraphQL result that contains a list of objects out of the store, we can tell whether most of the time is spent on denormalization or in just setting up the stuff surrounding query. Results:

graph

As evident from above, query takes ~10-15x the time as diffQueryAgainstStore to read a result with the same number of associated objects. This probably implies that our logic within QueryManager and ObservableQuery can be improved and also probably means that the de-normalization that lives inside diffQueryAgainstStore isn't the primary culprit, especially for the counts of items considered in the above plot.

REC

As mentioned in #1242, referential equality checking (REC) imposes some cost on the store read times as measured through a no-fetch query. Here are the reproduced results:

graph

The blue line is with REC and the pink line is without. We can also see that resultMapper and addPreviousResultToIdValues (both tied to REC, afaik) play a significant role in the CPU profile for the noFetch query benchmark:

cpu profile

I'm not sure if this perf. hit is expected or if it's a consequence of an implementation bug. If it is the former, we can just offer an option to allow the application developer to turn off REC. That way, the tradeoff between preventing a React re-render and going through REC can be made at the application level.

Tasks

Investigate why a noFetch query takes so long in comparison to diffQueryAgainstStore and make it better (currently working on this)
Figure out why REC takes time
(Maybe?) Attempt to improve graphql-anywhere performance for diffQueryAgainstStore

Comments would be great.

wmertens commented 7 years ago

I must say this is surprising to me, in my testing the flamegraph showed that a lot of time was being spent rebuilding the query from cache.

I had a quick look at the benchmarks but could not immediately figure out what they were doing.

It would be nice if the benchmarks included a configuration as follows:

a query returning an array of 500 small objects without unique id like {index: $i, text: $loremIpsum + ${i}}
a query returning an array of 500 large objects with unique id, including unique sub objects and arrays without ids {id, subObj: {a: 5, b: "meep", c: $i}, links: [$i + 3, $i - 3, $i * 2], combo: [{a: $i}, {a: $i*2}, {a: $i*3}], lotsOfText: $loremIpsum + $i}
50 queries with variable $itemId returning smallish objects $itemId => ({id: $itemId, ts: Date.now(), title:$itemId title, content: $loremIpsum})

Poincare commented 7 years ago

@wmertens The benchmarks currently implement the first out of the three you mention. What would be the difference we'd expect to see between the first and the second or the first and the third? It seems that the third would have more to do with updating query observers than actual store read time.

wmertens commented 7 years ago

Those are the 3 use cases in my app :)

However, the first 2 don't really get re-run, their base components are always mounted. So it's the 3rd one that is the killer.

Come to think of it, the third case is not actually correct. In my app, the 3rd case actually augments the data from the 2nd case, requesting extra data for the same object ids. So even though they don't request much themselves, they do act on big objects, not sure if that is related?

Poincare commented 7 years ago

So, with the third case, do you have 50 active query observers that get delivered an update? Is that the part that's taking a long time? If so, there's a benchmark for that as well. Here's what the results look like:

results

As you can see, this is a totally separate problem from store reads. It turns out that the amount of time it takes to deliver updates to "n" subscribers scales super linearly in "n". Do you think this is related to the issue within your app?

wmertens commented 7 years ago

Actually, I only have 20, and they take 2s to update :-/

On Mon, Mar 13, 2017, 5:49 PM Dhaivat Pandya notifications@github.com wrote:

So, with the third case, do you have 50 active query observers that get delivered an update? Is that the part that's taking a long time? If so, there's a benchmark for that https://github.com/apollographql/apollo-client/blob/master/benchmark/index.ts#L128 as well. Here's what the results look like:

[image: results] https://camo.githubusercontent.com/a599c2ec6c99bb9a2b0afd82c3221d35f1bdbde1/68747470733a2f2f706c6f742e6c792f2537454468616976617450616e6479612f32332e706e67

As you can see, this is a totally separate problem from store reads. It turns out that the amount of time it takes to deliver updates to "n" subscribers scales super linearly in "n". Do you think this is related to the issue within your app?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/apollographql/apollo-client/issues/1409#issuecomment-286168523, or mute the thread https://github.com/notifications/unsubscribe-auth/AADWlkaCIz-nlIeUcufJ6TmGci2Z3j7Oks5rlXOtgaJpZM4Ma3xM .

Poincare commented 7 years ago

Hm. I honestly have no idea why that's happening in your specific case - if you can write a short benchmark that captures the behavior you're running into and produces similar times, I'd be happy to look into it more deeply.

nss213 commented 7 years ago

So I think my data structures sound similar to @wmertens and I'm also experiencing data perf problems with diffQueryAgainstStore. I have fiddled with the benchmarks but haven't got them to demonstrate the issue. In the browser, however, I'm delivering 3 queries of about 20-30 objects each. Those objects each have 10-20 subobjects in an array field, and those subobjects have 4 string fields. One thing that improved performance for me was to change my graphql schema to expose the subobjects as 'scalar' JSONObjects rather than being a typed GraphQLObject. screen shot 2017-03-17 at 12 43 58 pm

helfer commented 7 years ago

@nss213 that's interesting. How much of a difference did you observe with scalars vs. nested objects?

nss213 commented 7 years ago

@helfer It's a rough measurement, but I'm seeing 200-400ms for my test with the typed subobject and 150-200ms with the JSONType for the subobject. So perhaps we are seeing a symptom but not the root cause. The top-level object which resides in an array is still typed. I think perhaps we're seeing something similar to @michalkvasnicak in issue #1342

Poincare commented 7 years ago

Hmm. The fact that the scalar JSON objects helped probably means that the denormalization is slow but in a different way than what the benchmarks are currently measuring. Could you give us a copy of the queries and an idea of what the results look like?

Secondly, it seems like diffQueryAgainstStore is being run twice on a single call of fetchQuery. I don't recall if this is intended behavior or not; @helfer?

It would also be great to see what your attempts at producing a benchmark have lead to. What do the results look like if you change the diffQueryAgainstStore benchmark to something that reads the kind of object you're describing?

nss213 commented 7 years ago

So far I've taken my entire response body and slammed it into the benchmark in place of the "hotel reservations" -- it doesn't seem to make the benchmark any slower for diffQueryAgainstStore but I can get the actual query benchmark to take 18ms or so (in node 6). I think we need to be able to run the benchmark in the browser -- not sure what config is needed to do that. Is that something you have done?

My guess is diffQueryAgainstStore is getting run twice because there are probably 2 listeners for the query.

I'll get you some numbers and more detail once I put a test together.

Poincare commented 7 years ago

There isn't a slick way to run the benchmark in browser yet, but I expect that just linking the compiled equivalent of benchmark/index.ts in an HTML document might do the trick.

nss213 commented 7 years ago

Ok, well I wasn't able to get those running in a short amount of time so I just cobbled together a browser-based test based on some webpack boilerplate. I assume there are much better ways of putting these things together, but I'm not up to speed on all the community libs. https://github.com/nss213/apollo-perf-test You can see the test in app/main.js, app/test.js, and the data in app/data.js. Just do the usual npm install and npm start

Use the chrome profiling tools to make the flame graph -- it looks very similar to the one I posted earlier from my own app.

There are a few interesting things to note here:

I've not setup any listeners or anything, just executed the query like the benchmark does, and we sometimes see multiple calls to diffQueryAgainstStore for single call to broadcastQueries and sometimes we see multiple calls to broacastQueries as well.
for the lifetime of a single ApolloClient in the test, reading the query from the store is slower each time -- my real app also shows this behavior
diffQueryAgainsStore seems to get slower and slower as well.

Now, it's totally possible that I wrote the test incorrectly, but this seems to match my actual application. Let me know if this helps, or what else I can do.

helfer commented 7 years ago

Thanks for the flame graph @nss213. It seems to indicate that diffQueryAgainstStore is a problem after all, so if we can't make that more performant, we'll have to look into other ways of propagating cache updates, for example by tracking dependency paths of a query in Apollo Client.

Poincare commented 7 years ago

Thanks so much for the flame graph! This helps a lot.

I have a hunch that this must be an implementation issue within diffQueryAgainstStore because in an idealized implementation, the denormalization execution time should be completely independent of the store size and number of queries executed but this clearly doesn't seem to be the case. I'll look into this ASAP.

nss213 commented 7 years ago

Glad I could help - If you will be using the test I gave up above, I do suggest reviewing it to make sure this graph isn't an artifact of a faulty test :)

Poincare commented 7 years ago

@nss213 The test does seem to be valid to me. I'm running into some very weird behavior, however. If I run a test for reading a single object with 500 subjects on Safari, I get the following results:

safari results

As you can see, the read times are quite reasonable in magnitude and don't increase over the tests. On the other hand, this is the same test on Chrome:

chrome results

And there we see the horrible read times and an increase in the read times. I'm not sure what exactly this means in terms of the solution (maybe something GC related?) but it is certainly thought provoking. Any ideas would be appreciated.

helfer commented 7 years ago

That is indeed very intriguing @Poincare!

nss213 commented 7 years ago

the read times are quite reasonable in magnitude

IMO 10 to 80 ms is still too slow since multiple queries will add up and tie up the CPU for that entire time. 200ms will be clear, perceptible stutter in the UI.

I'm indeed able to replicate the difference in perf with Safari and Chrome that you are seeing.

Poincare commented 7 years ago

After spending some time adding console.log's in the various function calls, I believe I have an explanation for what is happening.

The list this.queryListeners within QueryManager continues to grow for each query that is fired which gives us the increasing query read times. This occurs because of this particular block within ObservableQuery. We're using setTimeout in order to execute the call just after the end of function but Chrome seems to decide to execute this call much later and the unsubscribe call isn't processed until much later. On the other hand, Safari handles these calls on a much shorter timespan. So, on Chrome the queryListeners array grows quickly whereas it does not Safari which gives us the different performance behaviors.

I'm not familiar enough with the setTimeout(() => { ... }, 0) pattern to be able to tell if this should happen but is there some way we could replace that something like Go's defer statement which is guaranteed to execute after the end of the function? Alternatively, we can have a "listener reaper" that comes along every few cycles and removes query listeners that are no longer in use.

@nss213 I agree that 10-80ms is still quite slow but I believe we can improve on these running times with smaller optimizations, e.g. turning off REC (which seems to take up ~25% of total running time on those particular tests). Those are probably a separate set of issues than the one I describe above in this comment.

helfer commented 7 years ago

@Poincare can you link to the block within ObservableQuery that you're referring to?

Poincare commented 7 years ago

I did - see "this particular block" in the text above. Markdown error was preventing it from being rendered as a link earlier.

nss213 commented 7 years ago

@Poincare that's some excellent analysis; the story makes sense. Might you be able to get away with unsubscribing without a timeout altogether? Does the unsubscription actually have to wait until the this particular call-stack is complete?

helfer commented 7 years ago

@nss213 @Poincare I'm fairly sure we could use setImmediate in that position as well instead of setTimeout(fn, 0). Could you try that @Poincare?

nss213 commented 7 years ago

Pretty sure setImmediate is nonstandard https://developer.mozilla.org/en-US/docs/Web/API/Window/setImmediate

helfer commented 7 years ago

Oh, I didn't realize. Then we'll have to look into whether we can run the result function without going through subscribe().

Poincare commented 7 years ago

I don't think the core issue is the result function since exactly the same issue will occur when we use watchQuery directly. I think a reasonable approach is to separately run the logic that has to do with QueryManager's list of queries and the logic that has to do with tearing down the subscription. This way, we can run remove the query from the QueryManager's list of queries before the return of that anonymous function, i.e. without the need for setTimeout.

To check whether this would work, I put in a delete _this.queryManager.stopQuery(_this.queryId); into the code just before the setTimeout and the Chrome benchmark times now look like this:

Chrome times

These are a bit higher than normal on average due to the extra logging I'm doing but the average magnitude of the times is still significantly smaller and the values don't increase over time. It also makes sense that the benchmarks in the Apollo Client repo didn't catch this, because V8 in the browser is handling setTimeout very differently from Node due to the fact that the browser has to concern itself with rendering, network requests, etc. So, given the evidence, I'm pretty confident this is the underlying issue.

abergenw commented 7 years ago

I'm currently experiencing performance problems as a result of running mutations. Typically, when I run a mutation the store gets updated 3 times; before the mutation is sent to the server (regardless of optimisticResponse or not), after the mutation response has been received and finally after refetchQueries response has been received.

This is problematic for two reasons. First, and most importantly, updating the store seems to be a slow process at least in my case. It doesn't matter if no data is actually changed (usually the first 2 updates explained above), the performance is still the same.

Second, Apollo doesn't seem to expose an API to minimize these store updates as a result of a mutation. In many cases what I actually want to do is to fire a mutation and if successful refetch all my queries (basically client.resetStore()) and only after this perform one single update of the store.

This being said, here's a CPU profile clarifying my point. I fire a mutation and as a result you can see 3 updates to the store (the 3 CPU spikes), the third one being highlighted. They are all rather slow; the first two ones ~100ms even though they don't cause any store data to actually change and the third one (shown in the image) ~180ms (refetchQueries, causing data to actually change). This is all caused by apollo as I'm preventing my react components from actually updating in this example.

My normalized cache contains ~700 objects most of which denormalized are nested at least a few levels.

Any suggestions anyone?

pasted image at 2017_04_18 09_44 pm

Poincare commented 7 years ago

@abergenw Thanks for the issue details. As mentioned on Slack (reproduced here in case anyone else can help as well), it would be fantastic to get a reproduction of this issue through either the benchmarks or through something like @nss213 's apollo-perf-test. The former is preferred though the latter will probably be needed since the benchmarks don't run inside a browser and some of these seem to be browser-specific issues.

abergenw commented 7 years ago

Thanks @Poincare. I'm studying the code trying to comprehend how apollo works and whether it is a good fit for my use case at all (large amounts of data). I can set up something that reproduces the profile posted above but this really seems to be something that happens as soon the amount of data and queries grow (culprit being diffQueryAgainstStore and graphql-anywhere). To make things worse, in react-apollo when a component with a query is unmounted the query doesn't seem to get removed from QueryManager as a result of the recycler.

Poincare commented 7 years ago

@abergenw Hm. The query observer should certainly be removed from QueryManager once the component is unmounted. If you could create a reproduction of just that, that would be very helpful.

helfer commented 7 years ago

@Poincare unfortunately the recycler in react-apollo makes it such that that's not the case. I think we need to take steps to remove it entirely and move to a the long term solid cache update strategy that we sketched out, which doesn't require it.

Poincare commented 7 years ago

Ah. That may be the cause of quite a few performance problems since the broadcastQueries is O(n) in the number of observers within the observer list. Could you link the cache update strategy discussion/issue?

helfer commented 7 years ago

I don't have a link handy right now, since many of the discussions happened in person, but the gist of it is that we want everyone to migrate from updates using named queries to updates using query documents and variables. That way queries no longer have to be kept around.

An alternative temporary solution (which I'm a bit wary of) would be to somehow mark the queries that are currently being recycled and skip them in broadcastQueries. We could do this by adding another fetchPolicy;do-not-execute.

Poincare commented 7 years ago

Got it.

To me, the "recycle this" marker idea sounds pretty reasonable - it can replace what we're currently doing in the result function as of this PR as well as knock out some other perf issues. I have a feeling this issue is affecting quite a few people so having a temporary fix is likely better than waiting for a full fix/change in application-level code. Thoughts?

helfer commented 7 years ago

@Poincare We just landed a small improvement that doesn't watch recycled queries any more (standby fetchPolicy).

abergenw commented 7 years ago

@helfer @Poincare I've been experimenting with the code trying to achieve an acceptable level of performance for my use case (quite a lot of data returned by queries). The results are quite positive and I might soon be ready to make a PR out of this. However, I'm not sure you want to incorporate all the modifications I've done since some of them are pretty case specific.

The main changes are:

A new query cache. Every query has an associated "cache" containing the denormalized response. This avoids a lot of calls to diffQueryAgainstStore which is slow when reading large sets of data. The cache for each query also contains the contained data ids; when a write to the store occurs changing data for any of these ids the query cache is marked dirty which leads to a reassembly next time diffQueryAgainstStoreis called (unless the query gets refetched before updating the query cache with the new response).
More efficient writes to store. Some of the queries I'm dealing with return a large amount of copies of the same objects. Consider a query fetching a list of projects and a list of users for each project projects { users { ... } }. If many of the projects contain the same users the current code performing the writes does a lot of unnecessary recursion. This gets a lot worse if the nesting goes deeper. This optimization allowed me to skip rewriting of ~50k objects (+ the associated fields!). The drawback here is that multiple occurences of the same object in a query response are not referentially equal but this is not an issue for me (not sure now if this even is the case with the current code?)
Additional mutation parameters. A pattern I'm using frequently (in other programming environments as well) is to clear the whole cache when a change occurs. This is basically what refetchQueries does but it requires me to list the names of the queries instead of allowing me to jsut refetch them all (if they are being watched). When calling a mutation I've added API support for basically running resetStore either optimistically (right after the mutation is sent to the server) or when the mutation response arrives successfully.
Mutation hook allowing access to the query cache. Using updateQueries and/or update with a mutation may result in performance problems as it will ultimately lead to calls to diffQueryAgainstStore. I solve this by allowing straight manipulation of the query cache associated with a query. If manipulation is performed the normalized cache is cleared (as we cannot really know how this should be affected by the manipulation) leading to queries being refetched but only when they are "re-watched" (with react-apollo this happens when a component is mounted). For me this is ok, though, since normally I'd be calling resetStore - refetches are not a problem.
Recycled queries no longer watched (probably what @helfer is referring to)

I may be doing some other improvements as well since I'm not completely happy with the performance yet even though it's a lot better than before. Writing the store is still a bottleneck that I think has room for improvements.

What do you think, are these changes something you'd want to review in a PR?

wmertens commented 7 years ago

If many of the projects contain the same users the current code performing the writes does a lot of unnecessary recursion. This gets a lot worse if the nesting goes deeper. This optimization allowed me to skip rewriting of ~50k objects (+ the associated fields!).

My approach in this situation is to query the userIds only and request the full user objects separately, in e.g. an Avatar component. Depends on the amount of data and duplication of course.

The drawback here is that multiple occurences of the same object in a query response are not referentially equal but this is not an issue for me

The approach I proposed in #1300 does do that, by making query cache writes build up the result object from cached objects, and then reads are O(1). Any time a query result comes in it would mutate all relevant cache objects and then it has to notify the active queries that changed.

Unfortunately that also means that the data is deeply mutated, which does not play nicely with React pure components.

helfer commented 7 years ago

Hey @abergenw, that sounds awesome! :clap:

Some of the optimizations sound like they would break functionality that other people rely on, but I think the denormalized query cache and extra mutation hooks sound like we could integrate them quite smoothly! I'd love to have each of those as a separate PR. The smaller the PR, the faster we can review it, and the greater the chance that we can merge it!

For the other things, I think there's also a chance to make them work, but we'd have to look at them carefully to see if we can minimize the impact it will have on current use-cases.

Really excited to see those PRs! 👍

I'm sure @Poincare will have some thoughts as well.

Poincare commented 7 years ago

Hey. Sorry for my late response on this - was caught up with finals. These changes sound great @abergenw. I'm particularly excited by the cache associated with the query since it probably eliminates a lot of calls to diffQueryAgainstStore given how most applications use the query cache currently.

I think there's also a lot of work to be done around managing how state is held with QueryManager. There appear to still be several cases in which query listeners seem to pile up with the QueryManager and result in unnecessary reads from the store. I'll post more details here once I have something more concrete.

stevewillard commented 7 years ago

@abergenw I would be curious to see your changes. I am experiencing similar issues with large result sets (5k-10k items in an array). The most time consuming time is spent writing/diffing/reading from the store.

abergenw commented 7 years ago

@stevewillard you can find my performance related improvements here:

https://github.com/abergenw/apollo-client/tree/performance-improvements

Most of the improved performance is a result of the query cache. More on this in #1673.

stevewillard commented 7 years ago

@abergenw BTW I tried your branch and it noticeably faster in my app (3x-5x faster for large arrays). Nice work! I'll be following your PR.

nevir commented 7 years ago

Slight side track, but fundamentally it seems like Apollo is going to hit a wall with the way it juggles normalized vs materialized nodes.

Even with this (second) query cache, there is a significant initialization cost per query (e.g. diffQueryAgainstStore and friends being called the first time). If you have, say, a mobile app that retrieves several large subgraphs when mounting a scene/route, it's very difficult to avoid jank during the initial read.

Anecdotally, our mobile app needs to suck in a heavily nested initial set of data (approx 20kb when serialized as JSON). Apollo spends 500ms+ performing initial reads on a Pixel across ~10 (watch) queries that have some overlap. Sorry, I don't have better numbers or examples yet; still working on instrumenting it.

From what I can tell, the core of problem it is that Apollo (and Relay) both err on the side of correctness: each query must return exactly what was requested - which prevents them from sharing a lot of work between common subgraphs. E.g. if Apollo has all the data available to satisfy a query, it still has to materialize a new result from the normalized store, every single time.

An alternative approach that may be worth considering is to loosen the correctness constraint a bit: return results that satisfy a query, but may also contain other un-requested properties. E.g. if there are two queries: { foo }, and { bar }, both could potentially be given the same object, with both foo and bar populated.

This would allow Apollo to have to keep only two copies of each node: once for its normalized form, and another for the materialized form. At that point, you effectively have an identity map of every node in the graph, and it's very quick to return results from the store. It gets slightly more tricky when you're talking about nodes with parameterized queries in them, but prototype chains can help there.

There is a risk of developers accidentally relying on properties they didn't request, to be sure. But, it seems like the trade off may be worth it - and could be potentially mitigated by a dev-mode strict mode much like how Apollo operates today.

wmertens commented 7 years ago

@nevir could not agree more, see also some musings on the subject at https://github.com/apollographql/apollo-client/issues/1300#issuecomment-283374347

helfer commented 7 years ago

@nevir with the new store we're working on you'll be able to make these kinds of tradeoffs yourself. If consistency isn't too important, you can plug in a store that simply caches the whole query and returns it when exactly that query is requested. From there one can add more optimizations like working out consistency in the background if desired. Happy to involve you in the design, if you're interested.

nevir commented 7 years ago

@helfer yeah, would definitely be curious about it - do you have any docs/thoughts written up on it?

nss213 commented 7 years ago

In the meantime, has anyone come up with some good performance workarounds for "I want to display a filterable list of medium sized objects"? Ideally something that doesn't involve changing the schema (e.g. making the data structure an opaque json blob).

koenpunt commented 7 years ago

Would making use of requestIdleCallback help? I'm currently using that to defer mutations during navigating in a React Native app, which works quite well.

stale[bot] commented 7 years ago

This issue has been automatically marked as stale becuase it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to Apollo Client!

apollographql / apollo-client

Performance Umbrella Issue #1409

No-fetch query vs diffQueryAgainstStore

REC

Tasks