apollographql / apollo-client

:rocket:  A fully-featured, production-ready caching GraphQL client for every UI framework and GraphQL server.
https://apollographql.com/client
MIT License

Have `cache.modify` functions receive `options.args` #7129

Open LiesbethW opened 4 years ago

LiesbethW commented 4 years ago

It would be very helpful if cache.modify received options.args, similar to what read and merge functions do.

A case we currently come across is that we have large lists of items in our cache, which live under the same field name but are queried with a status argument/variable. When creating a new item, we used to query not only the resulting new item but the lists as well. To increase speed and reduce over-fetching, we changed the mutation to query only the newly created object itself, and we manually add it to several lists using cache.modify. However, the new item should only be appended to the field values that correspond to certain arguments. To do this, we now rely on parsing storeFieldName strings; a much preferred solution would be to work with the arguments directly, but for that they would have to be provided through options.args.
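For illustration, the kind of parsing we rely on today might look like this (a sketch; the helper name and the default `fieldName({...json...})` key format are assumptions, and this is exactly the fragile string handling that options.args would remove):

```javascript
// Sketch of the storeFieldName parsing we currently rely on.
// Assumes the default InMemoryCache key format: 'items({"status":"OPEN"})'.
function argsFromStoreFieldName(storeFieldName, fieldName) {
  const prefix = fieldName + '(';
  if (!storeFieldName.startsWith(prefix) || !storeFieldName.endsWith(')')) {
    return null; // field was stored without arguments
  }
  const json = storeFieldName.slice(prefix.length, -1);
  try {
    return JSON.parse(json);
  } catch {
    return null; // key format differs from the assumed default
  }
}
```

A modifier can then compare the recovered arguments instead of doing raw substring checks on storeFieldName.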

Since @benjamn extensively described some of the practical implications in a previous comment, I've included it below:

@jedwards1211 I would like to see cache.modify functions receive options.args, somewhat like read and merge functions currently do.

For cache.modify specifically, I'm imagining options.args would be the most recent arguments object (possibly null) from the last time the field's value was updated, rather than anything more complicated, like a history of arguments or variables.

Implementing options.args would be somewhat tricky right now (which mostly means we are unlikely to get to it before AC3 is released), because field arguments are not currently saved (or reversibly encoded) in the cache's internal representation. However, I don't see any immediate problems (other than slightly increasing cache size) with the EntityStore saving the original, unfiltered arguments alongside their corresponding field values somehow, so the arguments could be recovered without any parsing of storeFieldName strings.

Historically, storing the original arguments alongside field values has not been necessary for reading and writing queries or fragments, because the given query/fragment plus variables allows computing specific, unambiguous arguments for any field. What you've sketched so far with cache.updateFragment in #6335 seems like it follows this unambiguous path, too, leaving cache.modify and cache.evict as the only two APIs that modify the cache without the guidance of a query/fragment plus variables, potentially acting on multiple field values per field name, which might have been written using different arguments.

Both modify and evict are new in AC3, so I'd like to give them some time to bake, so we can see how people (want to) use them. That said, I'm pretty sure the options.args idea is technically feasible, without a major @apollo/client version bump.

In the meantime, please feel free to share any compelling use cases that options.args (or something similar) could solve.

Originally posted by @benjamn in https://github.com/apollographql/apollo-client/pull/6289#issuecomment-634316625

benjamn commented 4 years ago

Here's my latest thinking (a workaround that works today, and avoids the need to store unused args for every single field in the cache): https://github.com/apollographql/apollo-client/issues/6394#issuecomment-656193666

jedwards1211 commented 4 years ago

Why such an aversion to storing args, though? As far as I can imagine, they won't take up much space compared to the data. That workaround takes extra trouble to store the args ourselves, and feels like a duct-tape solution instead of something systematic and elegant. I'd really rather the cache store args automatically in case we need to use them.

benjamn commented 4 years ago

@jedwards1211 Since I first wrote the comment that you quoted above (in support of options.args), I have changed my mind about the coherence of that proposal, because the arguments that happened to be used the last time the field was written are just that—arguments that describe the last chunk of data merged into the field.

If you're paginating the field, for example, the last arguments (say, args.cursor and args.limit) only describe the most recent page received, which is not actually that useful for cache.modify, since it operates on the entire field value, representing all data received/merged so far. What you really want are options.args.{cursor,limit} that specify the beginning cursor for the whole list, and perhaps a limit reflecting the current total length. Unfortunately, this information is not something the cache can provide based on any of the args objects it has previously seen, so the application developer needs to specify it somehow. A field policy like the one I recommended in https://github.com/apollographql/apollo-client/issues/6394#issuecomment-656193666 is one way to do that, because you get to make all the decisions about how the arguments are stored.

jedwards1211 commented 4 years ago

Actually, I made a mistake in my last comment by suggesting that I would need the page range from args. Relay-style pagination returns the page range in query data, and my Apollo 2 updaters already have to update the page range in cached query data accordingly. It's the search and sort order I would want from args, for handling creates and deletes; when those change, I have to throw away the cached list and start from scratch. But maybe I should echo the search and sort args in the query results instead of only the pageInfo Relay prescribes...

So basically all list fields in my app are either Relay-style paginated or not paginated at all, hence all args besides the Relay pagination parameters (which are stored in the data anyway) remain relevant. So it would be super handy to automatically have those last args.

I guess for people who don't want to use Relay-style pagination... simpler pagination schemes seem like an uphill battle (for example, if the user scrolls too far down and you have to evict the head of the list, you're hosed when it comes to paginating backwards if you only use cursor/limit pagination).

benjamn commented 4 years ago

Yes, many simpler pagination schemes sacrifice the ability to paginate backwards, and might skip elements if the list is modified between page requests. Relay-style pagination attempts to solve these problems with connections and edges and cursors, but it comes at a steep cost: if you're using Relay, every single paginated field must follow their pagination specification, even if you don't need the extra complexity in some cases, or you need additional/different functionality not present in that specification.

In AC3, we've been able to abstract away the details of Relay-style pagination with the relayStylePagination (#6465) field policy helper function, without baking any Relay-specific knowledge into Apollo Client itself, but we've done that by moving the (very real) complexity into the merge and read functions. If you're using cache.modify to access that data manually, you're going to be exposed to some implementation details that leak through the abstraction provided by relayStylePagination.

Whether you're updating data manually or using read and merge functions to maintain it, I recommend storing paginated data in a non-sorted, non-filtered format internally. That way your read function can use whatever options.args it receives to perform the sorting or filtering at read time, and cache.modify will get the complete/unfiltered/unordered data. You shouldn't have to throw any data away (start from scratch) just because your sorting/searching arguments changed.
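That advice might be sketched as a field policy like the following (the `users` field and the `sortBy`/`minAge` arguments are illustrative, not anything from your schema): `keyArgs: false` keeps a single internal list regardless of sort/filter args, and `read` applies the arguments at read time, so `cache.modify` always sees the one complete array.

```javascript
// Sketch of "store unfiltered, sort/filter at read time".
// Hypothetical field policy for a Query.users field.
const usersFieldPolicy = {
  keyArgs: false, // one internal field value, no matter which args are used
  merge(existing = [], incoming) {
    // Accumulate everything we've ever received for this field.
    return [...existing, ...incoming];
  },
  read(existing, { args }) {
    if (!existing) return existing;
    let result = existing;
    if (args && args.minAge != null) {
      result = result.filter(user => user.age >= args.minAge);
    }
    if (args && args.sortBy) {
      result = [...result].sort((a, b) =>
        a[args.sortBy] < b[args.sortBy] ? -1 : 1);
    }
    return result;
  },
};
```

With this shape, changing the sorting/searching arguments only changes what `read` returns, not what is stored.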

jedwards1211 commented 4 years ago

That's cool, I'm looking forward to using the relayStylePagination policy!

You shouldn't have to throw any data away (start from scratch) just because your sorting/searching arguments changed.

Imagine you fetch the first 10 users (out of millions) sorted by name, and those users just happen to have ages of 12, 17, 23, 35, 41, 58, etc. Then you switch to sorting by age. Tons of users go between each of those when sorting by age, and you can't know a priori where the gaps are, so you have to either fetch the first 10 users by age, or pick one of those cached ages as your start cursor. And then you'll end up refetching the old cached users anyway as you scroll.

jedwards1211 commented 4 years ago

I see the point about the last args potentially misleading people depending on the use case... I guess I wish I could define a blanket policy that acts on all fields to stash the last args, instead of having to do it for each list field individually.
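Such a blanket policy could be approximated today with a small factory that generates a merge/read pair for a list of field names (a sketch; the `lastArgs`/`items` wrapper shape and the field names are made up here, not an Apollo Client feature):

```javascript
// Sketch: for each listed field name, generate a policy whose merge()
// stashes the most recent args alongside the data, so cache.modify can
// inspect value.lastArgs instead of parsing storeFieldName.
function withStashedArgs(fieldNames) {
  const fields = {};
  for (const name of fieldNames) {
    fields[name] = {
      merge(existing, incoming, { args }) {
        const items = existing ? existing.items : [];
        return { lastArgs: args, items: [...items, ...incoming] };
      },
      read(existing) {
        return existing && existing.items; // unwrap for normal queries
      },
    };
  }
  return fields;
}

// Usage (hypothetical field names):
// const cache = new InMemoryCache({
//   typePolicies: { Query: { fields: withStashedArgs(['items', 'users']) } },
// });
```

This still carries benjamn's caveat: `lastArgs` only describes the last chunk merged into the field, not the whole accumulated value.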

raysuelzer commented 3 years ago

This is an issue for far more than pagination; it impacts invalidation and modification on ROOT_QUERY fields with keyArgs. The fact that the details passed to Modifier<T> do not include variables, other than what is serialized in storeFieldName, makes modifying queries that share the same field name but have a large number of variables extremely difficult; the same is true of cache.evict. I think exposing the stored keyArgs as part of details in the Modifier<T> function would be very helpful, since they are already stored in storeFieldName as a string. Unfortunately, getting those args into a JS object requires ugly hacks to parse the cache key into a JavaScript object.

For example, imagine a query with 4-5 optional variables that are all used as keyArgs. There may be hundreds of entries or more in the cache (think typeahead), with a huge number of potential combinations. If you add an item via a mutation, there is no easy way to search through the cache to determine which queries you need to modify, because you have NO access to the keyArgs. You literally would have to guess every possible combination and check whether it exists in the cache.modify function.

I apologize for any errors in the code below, as I am typing it directly here.

query GetItems(
  $campaignId: ID
  $type: String
  $keyword: String
  $targetType: String
) {
  getItems(
    campaignId: $campaignId
    type: $type
    keyword: $keyword
    targetType: $targetType
  ) {
    id
    name
    type
  }
}

Assume all of these variables are used as keyArgs: every time a user updates keyword, or changes type, etc., a new cache entry is created. Now assume I add a new item of type "Foo" to campaign id "1". That means I need to figure out which GetItems queries have a campaignId of "1" or a type of "Foo" and modify them.

For example, this update function using cache.modify doesn't really leave me a great way to figure out which of the getItems entries on the root query I really care about.

update: (cache, result) => {
    cache.modify({
        id: 'ROOT_QUERY',
        fields: {
            getItems(value, details) {
                // Called for every stored getItems entry, but no access to keyArgs
                console.log(details.storeFieldName); // getItems:{"type":"Bar","keyword":"f"}
                return value;
            }
        }
    });
}

What would be better:

update: (cache, result) => {
    cache.modify({
        id: 'ROOT_QUERY',
        fields: {
            getItems(value, details) {
                // allow checking the keyArgs
                if (details.keyArgs.type === result.type) {
                    // do update on fields where the type matches the mutation result
                    // the other args are irrelevant (keyword, campaign, etc.)
                }
                // return other values unchanged; they don't need to be modified
                return value;
            }
        }
    });
}

This also extends to cache.evict.

I use this hacky helper function frequently to determine which queries I need to evict. invalidateApolloCacheFor takes two arguments: the first is the Apollo cache; the second is a function that receives the field name and key args of every ROOT_QUERY item in the cache. If the filter returns true, the helper evicts that entry from the cache.

The main "trick" here is that the helper parses the cache key (which has the key args included) into an object containing the field name and key args; the filter function is then invoked with these. This helper could be removed if there were a way to get at the keyArgs for each field in the cache other than parsing the storeFieldName string.

update: (cache, result) => {
    invalidateApolloCacheFor(cache, (field, keyArgs) => {
        return field === 'getItems' && keyArgs?.type === result.type;
    });
}
type FieldAndArgsTest = (fieldName: string, args: Record<string, any> | null) => boolean;

export const invalidateApolloCacheFor = (
    cache: ApolloCache<any>,
    fieldAndArgsTest: FieldAndArgsTest) => {

    // Extracts all keys on the root query
    const rootQueryKeys = Object.keys(cache.extract().ROOT_QUERY);

    const itemsToEvict = rootQueryKeys
        .map(key => extractFieldNameAndArgs(key))
        .filter(({ fieldName, args }) => fieldAndArgsTest(fieldName, args));

    itemsToEvict.forEach(({ fieldName, args }) => {
        cache.evict({
            id: 'ROOT_QUERY',
            fieldName,
            args
        });
    });
};

export const extractFieldNameAndArgs = (key: string) => {
    if (!key.includes(':')) {
        return { fieldName: key, args: null };
    }
    const separatorIndex = key.indexOf(':');
    const fieldName = key.slice(0, separatorIndex);
    const args = convertKeyArgs(key);

    return { fieldName, args };
};

// Convert the keyArgs stored as a string in the query key
// to an object.
const convertKeyArgs = (key: string): Record<string, any> => {
    const separatorIndex = key.indexOf(':');
    const keyArgs = key.slice(separatorIndex + 1);

    // @connection directives wrap the keyArgs in ()
    // TODO: Remove when legacy @connection directives are removed
    const isLegacyArgs = keyArgs?.startsWith('(') && keyArgs.endsWith(')');

    const toParse = isLegacyArgs ? keyArgs.slice(1, keyArgs.length - 1) : keyArgs;

    // We should have a string here that can be parsed to JSON, or null
    // getSafe is an internal helper function that wraps a try catch
    const args = getSafe(() => JSON.parse(toParse), null);
    return args;
};

mgummelt commented 2 years ago

It would be great to have the solution that @raysuelzer outlined above. I have the same problem when updating the cache for parameterized fields. I'm surprised more Apollo users aren't complaining about this, since it should be an obstacle for any GraphQL server with parameterized fields. My current workaround is to use refetchQueries everywhere to maintain cache consistency. Is that what most users do?

ayarmak commented 2 years ago

FYI in case it's helpful: there is actually a way to use cache.modify with parameterized fields today, which simply leverages the following two facts (you can check them both by doing a console.log of the second argument of the modifier function):

  1. The second argument of the modifier function (the function mapped inside your fields parameter in cache.modify) has a storeFieldName key containing the full name of the stored field, including the parameters (e.g., customers({"active":true,"region":"southwest"})).
  2. On every cache.modify call, the modifier function is run for every one of the parameterized values of that field that you have in your cache.

So, pretty much, you simply need to return the original cached value when storeFieldName doesn't include the argument values you're looking to change, and return a modified value otherwise. E.g.:

cache.modify({
  id: cache.identify({ __typename: 'SalesManager', id }),
  fields: {
    customers(cached, { storeFieldName }) {
      if (!storeFieldName.includes('southwest')) return cached;
      return [...cached, newCustomer];
    },
  },
});

Hope this helps.
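A slightly more robust variant of the same trick (a sketch; the helper name is made up, and the default `fieldName({...json...})` key format is assumed): parse the JSON portion of storeFieldName instead of substring matching, so that a value like 'southwest' appearing in an unrelated argument can't cause a false match.

```javascript
// Sketch: decide whether a modifier should touch this storeFieldName by
// parsing the argument JSON rather than using String.includes, which can
// match values of unrelated arguments. Assumed key format:
// 'customers({"active":true,"region":"southwest"})'.
function storeFieldMatches(storeFieldName, predicate) {
  const start = storeFieldName.indexOf('(');
  if (start === -1) return predicate(null); // stored without arguments
  const json = storeFieldName.slice(start + 1, -1);
  try {
    return predicate(JSON.parse(json));
  } catch {
    return false; // key format differs from the assumed default
  }
}

// Inside the modifier above, roughly:
// customers(cached, { storeFieldName }) {
//   if (!storeFieldMatches(storeFieldName,
//       args => args && args.region === 'southwest')) return cached;
//   return [...cached, newCustomer];
// }
```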

akp2020 commented 2 years ago

Thanks in advance; my question is related to cache.modify. Thanks @ayarmak for the storeFieldName solution.

How can I modify two different fields inside one cache.modify? One field is on the root query, which I can access without passing an id, since id defaults to ROOT_QUERY. The other field is outside the root query, and I have to pass its id first to access any fields inside it.

taejs commented 1 year ago

This was resolved after Apollo Client 3.5; use updateQuery.

bignimbus commented 1 year ago

Hi all, I'm doing some housekeeping and am curious as to whether you agree with @taejs's comment RE: version 3.5 using updateQuery. I'm happy to keep this issue open if there's actionable work for the client team. Thanks so much!

mgummelt commented 1 year ago

@taejs @bignimbus

No, updateQuery is not a sufficient replacement for modify. updateQuery calls the merge function in the field policy, whereas modify doesn't. We have to use modify to implement cache operations like deletion that must bypass the merge function.
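For example, the kind of merge-bypassing deletion that forces us onto modify looks roughly like this (a sketch; the `todos` field, the `Todo` typename, and the helper name are illustrative):

```javascript
// Sketch of a merge-bypassing deletion: cache.modify rewrites the stored
// array of refs directly, so an append-style merge() in the field policy
// never gets a chance to re-add the removed item.
function evictTodoFromList(cache, todoId) {
  cache.modify({
    fields: {
      todos(existingRefs = [], { readField }) {
        // Keep every ref whose id is not the one being deleted.
        return existingRefs.filter(ref => readField('id', ref) !== todoId);
      },
    },
  });
}
```

With updateQuery, the filtered list would be written back through the field's merge function, which (for a typical accumulating merge) defeats the deletion.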

bignimbus commented 1 year ago

Thanks for the quick response @mgummelt and thanks all for your patience! As mentioned above, please also see the open feature request: https://github.com/apollographql/apollo-feature-requests/issues/259. For transparency, the maintainers will not be prioritizing this item in the near future but we do want to keep this on our radar!

hect1c commented 1 year ago

Hey, I had the same issue some time ago but implemented a solution that has helped me so far and seems to work well, so I'll post it here in case it helps anyone. Note: the one caveat is that I need to be able to get my input args wherever I use this, but that hasn't been an issue for me.

I just hash the keyArgs in my type policies when I need them on specific fields:

import hash from 'object-hash'

const typePolicies = {
  Query: {
    fields: {
      usersGet: {
        keyArgs: hash,
      },
    },
  },
}

Then, in my cache.modify, whenever I need to use it I just rebuild the key, which allows me to get the data I need or perform operations specific to that dataset and return the structure accordingly:

import hash from 'object-hash'

// ...

// below is the main piece: I hash the input args the same way they were
// hashed in my type policy, which lets me get the exact ref I want to
// manipulate in my cache.modify
const hashKey = `usersGet:${hash({ input: { filter: { userIds: [userId] } } })}`

cache.modify({
  fields: {
    user: (ref) => {
      const usersGetCache = ref[hashKey]

      // do whatever manipulation you need

      return {
        ...ref,
        [hashKey]: {
          // return data
        }
      }
    }
  }
})

We use namespaces (i.e., user), but this should be enough for you to repurpose for your own use cases. Hope this helps; I remember struggling with this for a long time a year or so ago.

marcorm commented 1 year ago

I ended up using @ayarmak's solution (thanks). How could it be done using updateQuery? I tried, but I always get previousData == null.

ksi9302 commented 9 months ago

I'm still waiting for a more versatile and worry-free approach to this. I may be greedy, but it would be a game changer for sure. Currently, for multiple complex parameterized queries, relying on subscriptions/refetching seems to be the better way to ensure maximum consistency of the data with (much) less effort.

The problem with updateQuery, or with filtering by storeFieldName, is that we have to modify the filtering logic or updateQuery variables frequently, depending on the number of parameterized queries.