apollographql / apollo-kotlin

:rocket:  A strongly-typed, caching GraphQL client for the JVM, Android, and Kotlin multiplatform.
https://www.apollographql.com/docs/kotlin
MIT License
3.77k stars 656 forks source link

Apollo Store lock contention #4605

Open ynkm169 opened 1 year ago

ynkm169 commented 1 year ago

Use case

Hello,

Did some systracing. Launched 50 duplicate API calls, result shows there are around 60 RxCachedThreadS and 60 DefaultDispatch threads all contending for monitor lock.

This is only theoretical tests. However, our app at peak (home feed) can have 200+ threads running. Even if there will be only dozen of remote API calls (not counting other Apollo store cache reads), if we migrate all APIs to GraphQL, this could present a performance penalty.

Also in RXJava world, it is not uncommon for dev to call an API to retrieve a list of items, and use flatmap to make a seperate API call for each item in the list, so 50 API calls is not so uncommon and it gets worse as user scroll quickly on a list.

Describe the solution you'd like

Question Is it possible to optimize the locking a bit more? Possibly even sacrificing consistency for performance (or having it as a configurable option)? MemoryCache has a monitor lock: CacheLock. (This theoretically should be very fast since synchronized block guards just a few LinkedHashMap calls) DefaultApolloStore has a ReentrantReadWriteLock. (A single writer will block all readers, a single reader will block all writers)


Java/Kotlin method tracing DefaultDispatch: It seems the ReentrantReadWriteLock does a couple of things such as Invoking JsonWriter, not just directly save data to a LinkedHashMap.

Screen Shot 2023-01-04 at 10 48 21 PM

RxCachedThreadS: It's harder to understand the Flame Chart and why the thread exist.

Screen Shot 2023-01-04 at 10 52 07 PM

systrace

Screen Shot 2023-01-04 at 9 33 42 PM Screen Shot 2023-01-04 at 9 32 07 PM Screen Shot 2023-01-04 at 9 32 01 PM Screen Shot 2023-01-04 at 9 31 55 PM Screen Shot 2023-01-04 at 9 31 49 PM Screen Shot 2023-01-04 at 9 31 44 PM Screen Shot 2023-01-04 at 9 31 38 PM Screen Shot 2023-01-04 at 9 31 29 PM
BoD commented 1 year ago

Thank you for providing this, this is interesting data, and will dig into it.

Just a quick reaction to

it is not uncommon for dev to call an API to retrieve a list of items, and use flatmap to make a seperate API call for each item in the list

Not sure if you are using GraphQL already (if yes please disregard my comment), but one of its advantage when compared to REST is the ability to avoid this, by having the fields you need at your disposal to be used in one query. Of course the ability to do it will depend on the server's schema, but in many cases you should be able to avoid some round trips.

ynkm169 commented 1 year ago

Not sure if you are using GraphQL already

Yes, to provide more context, REST and GraphQL will co-exist for quite a while for us. Migration will take a lot of time. It wasn't best argument :) since GraphQL was designed to reduce API calls.

ynkm169 commented 1 year ago

I wasn't sure if it is possible to build a LRU cache that's more performant than a synchronized LinkedHashMap. Looks like Guava cache is potentially more performant (hard to tell since Guava cache maybe better used for server with hundreds/thousands of threads than low power mobile devices.)

The Least Recently Used page replacement algorithm was chosen due to its simplicity. The basic strategy is to subdivide the table among Segments, each of which itself is a concurrently readable hash table. The map supports non-blocking reads and concurrent writes across different segments.

This is Android LRU cache implementation.

Anyway, I think the CacheLock should be sufficient and not cause of the contention. But just some thoughts, so ignore this comment.

BoD commented 1 year ago

A few additional notes:

ynkm169 commented 1 year ago

But inside the same synchronized block the code also reads the data of the next cache. This is usually the disk cache and does IO, so not a fast operation.

Great idea! What's the class name of the disk cache? If you are referring to http cache, we don't use it.

BoD commented 1 year ago

No sorry I was referring to the SQLite cache (SqlNormalizedCache). Maybe you don't need it either actually - it's only useful if you need a persistent cache in your app of course.

ynkm169 commented 1 year ago

We don’t use sql cache at least for now (may do in future to persist user data such as making a post and save). So that lock contention is caused by something else.

Sent from my iPhone

On Jan 5, 2023, at 1:55 PM, Benoit Lubek @.***> wrote:



No sorry I was referring to the SQLite cachehttps://www.apollographql.com/docs/kotlin/caching/normalized-cache/#sqlite-cache (SqlNormalizedCachehttps://github.com/apollographql/apollo-kotlin/blob/05f2a3295517fb7a64c2259c38b453a37f3c4d19/libraries/apollo-normalized-cache-sqlite/src/commonMain/kotlin/com/apollographql/apollo3/cache/normalized/sql/SqlNormalizedCache.kt#L13). Maybe you don't need it either actually - it's only useful if you need a persistent cache in your app of course.

— Reply to this email directly, view it on GitHubhttps://github.com/apollographql/apollo-kotlin/issues/4605#issuecomment-1372608556, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AANZ66UYE7AV6FABZYWTS2DWQ4KK7ANCNFSM6AAAAAATRLN4LU. You are receiving this because you authored the thread.Message ID: @.***>

BoD commented 1 year ago

Thanks for clarifying. It's difficult to pinpoint the source of the contention. FWIW, what I observe in a similar test - albeit without Rx - looks a bit different.

I'm wondering if it would make sense to get rid of a high level lock on ApolloStore and let the NormalizedCaches handle their own locking instead.

Overall, we think the normalized cache could benefit from a revamp, including performance improvements (in this area but also in terms of storage size, access time, etc.), as well as new features (see #2331) - but this will be a major task.

ynkm169 commented 1 year ago

I'm wondering if it would make sense to get rid of a high level lock on ApolloStore and let the NormalizedCaches handle their own locking instead.

I am not familiar with internals of everything the lock currently guards.

Regarding the our own repository layer implementation for comparison with Apollo Store. It basically concats 3 streams of data and optionally take the first that comes back. The result observable can be subscribed on IO thread. We basically do not hold any lock anywhere. Android LRU cache implementation is thread safe. A: in memory data source with Android LRU cache implementation. B: DB data source (very rare) C: remote REST data source.

Let me know if there is any performance improvement regarding this in future, I could help do some systrace! Thanks.

ynkm169 commented 1 year ago

Hey, just sharing some ideas I found for improving performance of readers/writers JakeW's DiskLruCache used by Okhttp and Bumptech/Glide image loading.

ashare80 commented 1 year ago

Running into performance issues with this with large queries.

Found that there was some extra logic inside of the locks that could be moved out: https://github.com/apollographql/apollo-kotlin/pull/5101

This was minor performance boost when testing these modifications on our data.

It appears on a query cache read I am trying to boost performance, a lot of the time is spent in JSON serialization logic trying to build out the records.

Screenshot 2023-07-18 at 1 50 58 AM
martinbonnin commented 1 year ago

Thank you so much @ashare80 💙 . The investigation and PR are awesome! Is there any chance you could contribute your data? We could include that in benchmarks and that'd be a good data point for future optimisation. No worries at all if not, we can always forge tests but real life data always has more flavour :)

ashare80 commented 1 year ago

@martinbonnin We might be able to work something out for that. What would be the best way to share this? Would a sample response be sufficient?

martinbonnin commented 1 year ago

Nice! If you can share the schema + query too, it'd be ideal. If not we can craft something out but it's a bit painful + might not be able to infer the polymorphism if any.