apollographql / apollo-kotlin

:rocket:  A strongly-typed, caching GraphQL client for the JVM, Android, and Kotlin multiplatform.
https://www.apollographql.com/docs/kotlin
MIT License

Apollo Store lock contention #4605

Open ynkm169 opened 1 year ago

ynkm169 commented 1 year ago

Use case

Hello,

Did some systracing: launching 50 duplicate API calls shows around 60 RxCachedThreadScheduler and 60 DefaultDispatcher threads all contending for a monitor lock.

This is only a synthetic test. However, our app at peak (home feed) can have 200+ threads running. Even if there are only a dozen remote API calls (not counting other Apollo store cache reads), if we migrate all APIs to GraphQL this could present a performance penalty.

Also, in the RxJava world it is not uncommon for a dev to call an API to retrieve a list of items and then use flatMap to make a separate API call for each item in the list, so 50 API calls is not so uncommon, and it gets worse as the user scrolls quickly through a list.
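To make the scenario concrete, here is a minimal, self-contained sketch of that fan-out in plain Kotlin. `fetchIds` and `fetchDetail` are hypothetical stand-ins for real API calls, and a cached thread pool plays the role of Rx's io() scheduler (which is where the RxCachedThreadScheduler threads come from):

```kotlin
import java.util.concurrent.Executors

// Hypothetical stand-ins for real API calls.
fun fetchIds(): List<Int> = (1..50).toList()
fun fetchDetail(id: Int): String = "detail-$id"

// One list call fans out into one follow-up call per item, similar to
// Rx's flatMap over an io() scheduler: up to 50 tasks run concurrently,
// each on its own pooled thread.
fun loadFeed(): List<String> {
    val pool = Executors.newCachedThreadPool() // like RxCachedThreadScheduler
    try {
        return fetchIds()
            .map { id -> pool.submit<String> { fetchDetail(id) } } // fan out
            .map { future -> future.get() }                        // join
    } finally {
        pool.shutdown()
    }
}
```

If each of those 50 concurrent tasks also reads or writes a shared cache behind a single lock, they all converge on that lock, which is the contention pattern the systrace shows.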

Describe the solution you'd like

Question: Is it possible to optimize the locking a bit more, possibly even sacrificing consistency for performance (or offering that as a configurable option)? MemoryCache has a monitor lock, CacheLock; in theory this should be very fast, since the synchronized block guards just a few LinkedHashMap calls. DefaultApolloStore has a ReentrantReadWriteLock; a single writer blocks all readers, and a single reader blocks all writers.
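For reference, the read/write-lock behaviour described above can be sketched as follows. RecordStore here is a hypothetical illustration, not Apollo's actual class: with a ReentrantReadWriteLock, reads may run in parallel with each other, but any write is fully exclusive.

```kotlin
import java.util.concurrent.locks.ReentrantReadWriteLock
import kotlin.concurrent.read
import kotlin.concurrent.write

// Hypothetical store illustrating the locking pattern: many readers may
// proceed concurrently, but a single writer excludes everyone, and any
// in-flight reader delays a writer.
class RecordStore {
    private val lock = ReentrantReadWriteLock()
    private val records = LinkedHashMap<String, String>()

    // Multiple threads can hold the read lock at once.
    fun read(key: String): String? = lock.read { records[key] }

    // The write lock waits for all readers to drain, then runs alone.
    fun write(key: String, value: String) {
        lock.write { records[key] = value }
    }
}
```

With 50+ threads mixing reads and writes, every write serializes the whole store behind this lock, which matches the contention seen in the traces.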


Java/Kotlin method tracing, DefaultDispatcher: it seems the code under the ReentrantReadWriteLock does a couple of things, such as invoking JsonWriter, rather than just saving data directly to a LinkedHashMap.

Screen Shot 2023-01-04 at 10 48 21 PM

RxCachedThreadScheduler: it's harder to understand this flame chart and why these threads exist.

Screen Shot 2023-01-04 at 10 52 07 PM

systrace

Screen Shot 2023-01-04 at 9 33 42 PM (plus seven more systrace screenshots)
BoD commented 1 year ago

Thank you for providing this. It's interesting data, and I will dig into it.

Just a quick reaction to

it is not uncommon for dev to call an API to retrieve a list of items, and use flatMap to make a separate API call for each item in the list

Not sure if you are using GraphQL already (if so, please disregard my comment), but one of its advantages compared to REST is the ability to avoid this pattern, by having all the fields you need at your disposal in one query. Of course, whether you can do this depends on the server's schema, but in many cases you should be able to avoid some round trips.

ynkm169 commented 1 year ago

Not sure if you are using GraphQL already

Yes. To provide more context: REST and GraphQL will co-exist for quite a while for us, and the migration will take a lot of time. It wasn't the best argument :) since GraphQL was designed to reduce API calls.

ynkm169 commented 1 year ago

I wasn't sure whether it is possible to build an LRU cache that is more performant than a synchronized LinkedHashMap. It looks like Guava's cache is potentially faster, although it is hard to tell, since Guava's cache may be better suited to servers with hundreds or thousands of threads than to low-power mobile devices:

The Least Recently Used page replacement algorithm was chosen due to its simplicity. The basic strategy is to subdivide the table among Segments, each of which itself is a concurrently readable hash table. The map supports non-blocking reads and concurrent writes across different segments.

This is Android's LRU cache implementation.
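Android's LruCache follows the same shape. A minimal sketch of that pattern (illustrative only, not the Android source): a LinkedHashMap in access order behind a single monitor lock, so every reader and writer serializes on the same lock, which is exactly the contention point under discussion.

```kotlin
// A minimal LRU cache in the style of Android's LruCache: a LinkedHashMap
// in access order, guarded by one monitor lock. Reads count as "use", so
// even get() must take the lock to reorder entries.
class SimpleLruCache<K, V>(private val maxSize: Int) {
    private val map = object : LinkedHashMap<K, V>(16, 0.75f, /* accessOrder = */ true) {
        // Evict the least recently used entry once capacity is exceeded.
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>): Boolean =
            size > maxSize
    }

    @Synchronized fun get(key: K): V? = map[key]

    @Synchronized fun put(key: K, value: V) { map[key] = value }

    @Synchronized fun size(): Int = map.size
}
```

A segmented design like Guava's spreads entries over several such maps, each with its own lock, so unrelated keys stop contending; the trade-off is more memory and a less exact global LRU order.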

Anyway, I think the CacheLock should be sufficient and is not the cause of the contention. These are just some thoughts, so feel free to ignore this comment.

BoD commented 1 year ago

A few additional notes:

ynkm169 commented 1 year ago

But inside the same synchronized block the code also reads the data of the next cache. This is usually the disk cache and does IO, so not a fast operation.

Great idea! What's the class name of the disk cache? If you are referring to the HTTP cache, we don't use it.

BoD commented 1 year ago

No, sorry, I was referring to the SQLite cache (SqlNormalizedCache). Maybe you don't actually need it either; it's only useful if you need a persistent cache in your app, of course.

ynkm169 commented 1 year ago

We don't use the SQL cache, at least for now (we may in the future, to persist user data such as drafting a post and saving it). So the lock contention is caused by something else.


BoD commented 1 year ago

Thanks for clarifying. It's difficult to pinpoint the source of the contention. FWIW, what I observe in a similar test - albeit without Rx - looks a bit different.

I'm wondering if it would make sense to get rid of a high level lock on ApolloStore and let the NormalizedCaches handle their own locking instead.
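That idea might look roughly like the following sketch. Cache, MemoryLayer, and ChainedCache are hypothetical illustrations, not Apollo's actual types: each layer guards only its own state, so a slow lower layer never holds a lock that blocks readers of the fast memory layer.

```kotlin
// Hypothetical chained cache with per-layer locking instead of one
// store-level lock.
interface Cache {
    fun load(key: String): String?
    fun store(key: String, value: String)
}

// The in-memory layer guards only its own map.
class MemoryLayer : Cache {
    private val map = LinkedHashMap<String, String>()
    @Synchronized override fun load(key: String): String? = map[key]
    @Synchronized override fun store(key: String, value: String) { map[key] = value }
}

// No lock at this level: each layer is consulted under its own lock only.
class ChainedCache(private val layers: List<Cache>) : Cache {
    override fun load(key: String): String? {
        for (layer in layers) layer.load(key)?.let { return it }
        return null
    }
    override fun store(key: String, value: String) = layers.forEach { it.store(key, value) }
}
```

The trade-off is the one raised earlier in the thread: without a store-level lock, a concurrent write can interleave between two layer reads, so some consistency is traded for throughput.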

Overall, we think the normalized cache could benefit from a revamp, including performance improvements (in this area but also in terms of storage size, access time, etc.), as well as new features (see #2331) - but this will be a major task.

ynkm169 commented 1 year ago

I'm wondering if it would make sense to get rid of a high level lock on ApolloStore and let the NormalizedCaches handle their own locking instead.

I am not familiar with the internals of everything the lock currently guards.

Regarding our own repository-layer implementation, for comparison with the Apollo store: it basically concatenates three streams of data and optionally takes the first result that comes back. The resulting observable can be subscribed on an IO thread. We do not hold any lock anywhere, and Android's LRU cache implementation is thread safe. A: an in-memory data source using Android's LRU cache. B: a DB data source (very rarely hit). C: a remote REST data source.
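That repository pattern could be sketched roughly like this. DataSource and Repository are hypothetical stand-ins, written synchronously for brevity where the real implementation is reactive: try each source in order and return the first result that comes back, with no shared lock at the repository level.

```kotlin
// Hypothetical data-source abstraction; each source handles its own
// thread safety internally (e.g. a thread-safe LRU cache for memory).
fun interface DataSource<T> {
    fun fetch(key: String): T?
}

class Repository<T>(private val sources: List<DataSource<T>>) {
    // Evaluates sources lazily: later (slower) sources are only consulted
    // when earlier ones miss. No repository-level lock is held.
    fun firstAvailable(key: String): T? =
        sources.asSequence().mapNotNull { it.fetch(key) }.firstOrNull()
}
```

In the setup described above, the three sources would be the in-memory LRU cache, the DB, and the remote REST API, in that order.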

Let me know if there are performance improvements in this area in the future; I could help with some systracing! Thanks.

ynkm169 commented 1 year ago

Hey, just sharing an idea I found for improving reader/writer performance: Jake Wharton's DiskLruCache, which is used by OkHttp and by Bumptech's Glide image loader.

ashare80 commented 1 year ago

We are running into performance issues with this on large queries.

Found that there was some extra logic inside of the locks that could be moved out: https://github.com/apollographql/apollo-kotlin/pull/5101

These modifications gave a minor performance boost when tested on our data.

On the query cache read whose performance I am trying to improve, it appears a lot of the time is spent in JSON serialization logic building out the records.

Screenshot 2023-07-18 at 1 50 58 AM
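The general shape of that optimization can be sketched as follows. LockedCache and its methods are illustrative, not Apollo's code: expensive work, such as serialization, is hoisted out of the critical section so the lock only guards the map update itself.

```kotlin
// Illustrative cache showing work hoisted out of a critical section.
class LockedCache {
    private val lock = Any()
    private val records = HashMap<String, String>()

    // Before: serialization runs while the lock is held, so every other
    // thread waits for it.
    fun slowPut(key: String, value: Any) {
        synchronized(lock) {
            records[key] = serialize(value)
        }
    }

    // After: serialize outside the lock; the critical section is now just
    // a map write.
    fun fastPut(key: String, value: Any) {
        val serialized = serialize(value)
        synchronized(lock) { records[key] = serialized }
    }

    fun get(key: String): String? = synchronized(lock) { records[key] }

    // Stand-in for real (JSON) serialization work.
    private fun serialize(value: Any): String = value.toString()
}
```

The result is the same, but the lock hold time shrinks from "serialize + write" to just "write", which is where the minor boost comes from.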
martinbonnin commented 1 year ago

Thank you so much @ashare80 💙. The investigation and PR are awesome! Is there any chance you could contribute your data? We could include it in the benchmarks, and it would be a good data point for future optimisation. No worries at all if not; we can always forge test data, but real-life data always has more flavour :)

ashare80 commented 1 year ago

@martinbonnin We might be able to work something out for that. What would be the best way to share this? Would a sample response be sufficient?

martinbonnin commented 1 year ago

Nice! If you can share the schema + query too, that would be ideal. If not, we can craft something ourselves, but it's a bit painful, and we might not be able to infer the polymorphism, if any.