apollographql / apollo-client


Missing documentation around cache size limits before 3.11 #12068

Open CamJohnson26 opened 1 month ago

CamJohnson26 commented 1 month ago

Hi there, we upgraded to 3.11 in prod and had to revert, since users were hitting extremely slow performance when making large queries. On investigation, those users were exceeding the 50,000-entry cache limit on executeSelectionSet or the 10,000-entry limit on executeSubSelectedArray.

To fix the issue and re-upgrade, we're planning to increase the cache size limits to what they were before 3.11, but we can't find that information in the upgrade guide. Based on https://github.com/benjamn/optimism/blob/main/src/index.ts#L142 and https://github.com/apollographql/apollo-client/pull/8107/files#diff-aba857f65aabe3dce87a57f153d5cc33fe292065af0dcc6c1742b116b226893aR111 we believe the previous limit was 2**16, but we'd love it if that could be verified.

Thanks so much, and thanks for these new cache configuration parameters, they'll be huge for us.
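
For anyone attempting the same re-upgrade, here is a minimal sketch of what raising those two limits can look like, assuming Apollo Client 3.9+ and the cacheSizes export from @apollo/client/utilities described in the memory-management doc linked in the next comment. The 2**16 value is the pre-3.9 optimism default confirmed later in this thread; treat the exact numbers and the endpoint URL as placeholders to tune for your own app.

```ts
import { cacheSizes } from "@apollo/client/utilities";
import { ApolloClient, InMemoryCache } from "@apollo/client";

// These assignments must run before the InMemoryCache is constructed.
// 2 ** 16 (= 65,536) mirrors the old optimism default discussed below;
// pick values that actually fit your largest queries.
cacheSizes["inMemoryCache.executeSelectionSet"] = 2 ** 16; // 3.9+ default: 50,000
cacheSizes["inMemoryCache.executeSubSelectedArray"] = 2 ** 16; // 3.9+ default: 10,000

const client = new ApolloClient({
  cache: new InMemoryCache(),
  uri: "https://example.com/graphql", // placeholder endpoint
});
```

If the limits are what's biting you, these are the two keys from this issue; the other cacheSizes entries listed in that doc cover separate internal caches and can be tuned independently.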

jerelmiller commented 1 month ago

Hey @CamJohnson26 👋

I believe what you're looking for is this doc: https://www.apollographql.com/docs/react/caching/memory-management. This was something added in 3.9.0. Are you perhaps upgrading from a version before that? I don't believe we've touched these values since we released 3.9.

If you're looking for a more in-depth dive on it, check out Lenz's blog post on memory management in 3.9: https://www.apollographql.com/blog/apollo-3-9-beta-feature-spotlight-the-memory-story

Regardless, let me know if that helps!

CamJohnson26 commented 1 month ago

Quoting the blog post linked above:

> We also believe that our cache limits were too generous, so we revisited every internal memoization cache, double-checked our key usage, replaced all WeakMaps with a new Weak LRU Cache implementation, and made sure that each cache has a more reasonable maximum size that better represents the results it stores.
>
> While those defaults should be sufficient in 90% of applications, we know that some applications will need to make different trade-offs, so we made each cache size individually configurable. You can read up on all the new configuration options in this new documentation page on memory management.

This is the bit I think would be useful to clarify: what those previous, generous cache limits were. I think they were 2**16, but it would be good if you could verify that. I'm pretty sure it was using the optimism library's built-in default.

And yes exactly, we're all the way back on 3.6

phryneas commented 1 month ago

Yes, it was the optimism default of 2^16 before.

We had that blog article and an RC out for quite a while and were hoping to get some usage numbers from the community, but unfortunately we didn't get a lot of feedback. So we went out and made measurements ourselves on as many "heavy-usage" pages as we could find, and came up with the current defaults, which already include a big "extra" on top of our measurements.

I'm sorry that this is causing you pain now - we had to make that call between "memory leaks for the majority vs very few outliers to our measurements" 😞

CamJohnson26 commented 1 month ago

Thanks for the context. If it helps, we wrote a quick-and-dirty Python script to estimate cache key usage from the response JSON. Based on it, our largest query was using 900,000 cache keys, well above the default, and it seems like Apollo freezes up if an entire query can't fit in the cache. We have other large queries that consistently use around 35,000 cache keys, so when we upgraded past 3.9 some users started experiencing page lockups. We've set the largest queries to no-cache for now, since they are rarely updated (a sketch of that workaround follows the script below).

Long term, and for anyone else running into this issue, I think we can significantly increase our limit without major downsides, but understandably that's a risky thing to do, since we don't want our end users suddenly using a lot more memory and slowing down their machines. I have a theory that the garbage collector cleans up some cache entries when we navigate to a page that doesn't run the query, which, if true, further de-risks increasing the cache limit.

```python
import json


def count_leaf_objects(data):
    """
    Recursively count the number of leaf objects in the JSON data.

    A "leaf object" is a dict or list that contains no nested dicts or
    lists (empty containers also count as leaves). This roughly
    approximates how many normalized cache entries a response produces.

    :param data: The JSON data (can be a dict, list, etc.).
    :return: The number of leaf objects.
    """
    if isinstance(data, dict):
        if not data:
            return 1
        # Count this dict itself only if none of its values are containers.
        is_leaf = not any(isinstance(value, (dict, list)) for value in data.values())
        return int(is_leaf) + sum(count_leaf_objects(value) for value in data.values())
    elif isinstance(data, list):
        if not data:
            return 1
        # Count this list itself only if none of its items are containers.
        is_leaf = not any(isinstance(item, (dict, list)) for item in data)
        return int(is_leaf) + sum(count_leaf_objects(item) for item in data)
    else:
        # Scalars (strings, numbers, booleans, None) are not counted.
        return 0


def main():
    # Load JSON data from file
    input_file = 'out.json'
    with open(input_file, 'r') as file:
        json_data = json.load(file)

    # Count leaf objects and print the result
    leaf_count = count_leaf_objects(json_data)
    print(f"Number of leaf objects: {leaf_count}")


if __name__ == "__main__":
    main()
```
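
For anyone following along, the "set the largest queries to no-cache" workaround mentioned above might look roughly like the sketch below. GIANT_REPORT_QUERY and the ./apolloClient module are hypothetical names for illustration; the only load-bearing piece is the fetchPolicy: "no-cache" option, which is standard Apollo Client API.

```ts
import { gql } from "@apollo/client";
import { client } from "./apolloClient"; // hypothetical module exporting the ApolloClient instance

// Hypothetical oversized query, for illustration only.
const GIANT_REPORT_QUERY = gql`
  query GiantReport {
    reports {
      id
      rows {
        id
        value
      }
    }
  }
`;

// "no-cache" means the result is neither written to nor read back through
// the normalized cache, so this query no longer exercises the
// executeSelectionSet / executeSubSelectedArray caches, at the cost of
// losing cache reuse for its data.
export async function loadGiantReport() {
  return client.query({
    query: GIANT_REPORT_QUERY,
    fetchPolicy: "no-cache",
  });
}
```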
Narretz commented 2 weeks ago

I just ran into this performance regression (or whatever you want to call it), and even if I crank up every cache config value to the previous limits (as far as I can make them out), I still don't always get back to pre-3.9 performance.

Now, there were other cache changes in 3.9, specifically a lot of WeakCache usage, so maybe that's the problem?

phryneas commented 2 weeks ago

@Narretz could you try to create profiles of the prior and new behaviour? We might be able to spot what is running out of bounds for you from that.

ariel-upstream commented 4 days ago

I'm also experiencing performance slowdowns and page freezes with newer Apollo Client versions (currently on 3.11.8).

phryneas commented 4 days ago

@ariel-upstream have you tried the steps laid out in https://www.apollographql.com/docs/react/caching/memory-management#measuring-cache-usage to measure if you are hitting caching limits?
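
For reference, a minimal sketch of that measurement step, assuming a development build of Apollo Client 3.9+ (where getMemoryInternals() is available) and the devtools global __APOLLO_CLIENT__; the exact shape of the returned report is documented on the linked page.

```ts
// Run this in the browser console (or against your own client instance)
// while the app is under the load that triggers the slowdown.
const client = (globalThis as any).__APOLLO_CLIENT__; // set when the dev tools integration is enabled
console.log(JSON.stringify(client?.getMemoryInternals?.(), null, 2));
// Compare each reported cache size against its configured limit (see the
// linked doc for the key layout) to tell whether large queries are
// blowing past the defaults.
```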

ariel-upstream commented 4 days ago

Not yet, but in my case I'm not sure it's about the cache limit; I think it's something else in the new version.