Infinite loop when using a cache-and-network fetchPolicy

alinpetrusca commented 3 years ago

Intended outcome: I have a component that retrieves some data using useQuery and renders a chart. Until now I used these options:

fetchPolicy: 'cache-and-network',
nextFetchPolicy: 'cache-first'

But now I realized that I have some data that may change and I need it to be up-to-date. I still want to use the cache, so I think that cache-and-network fetchPolicy would be the best in this case.

Actual outcome: When I change the fetchPolicy to cache-and-network or network-only I end up having an infinite loop. This doesn't happen when I use no-cache.

This is how the component looks like:

export const GET_CHART_DATA = gql`
  query getChartData(
    $year: Int!
    $previousYear: Int!
    $currentYearStartDate: Date
    $currentYearEndDate: Date
    $previousYearStartDate: Date
    $previousYearEndDate: Date
    $product: String
    $channel: String
  ) {
    currentYearData: reportings(
      year: $year
      service: "OTHERS"
      date_Gte: $currentYearStartDate
      date_Lte: $currentYearEndDate
      productType_In: $product
      channelType_In: $channel
      dateType: "VISITS"
    ) {
      edges {
        node {
          date
          value
        }
      }
    }

    previousYearData: reportings(
      year: $previousYear
      service: "OTHERS"
      date_Gte: $previousYearStartDate
      date_Lte: $previousYearEndDate
      productType_In: $product
      channelType_In: $channel
      dateType: "VISITS"
    ) {
      edges {
        node {
          date
          value
        }
      }
    }

    budgetData(
      year: $year
      type: "OTHERS"
      date_Gte: $currentYearStartDate
      date_Lte: $currentYearEndDate
    ) {
      edges {
        node {
          date
          value
        }
      }
    }
  }
`;

const ChartContainer = ({ dateRangeVariables, filters }) => {
  const { loading, data, error } = useQuery(GET_CHART_DATA, {
    variables: {
      ...dateRangeVariables,
      product: filters.product.map(({ value }) => value).join(','),
      channel: filters.channel.map(({ value }) => value).join(',')
    },
    fetchPolicy: 'cache-and-network'
  });

  if (loading) {
    return <div>Loading</div>;
  }

  if (error) {
    return <div>Error</div>;
  }

  // structure data for chart...

  return <div>Chart</div>;
};

Versions @apollo/client 3.2.0

I assume this is a cache issue, but I'm not sure if it's my fault or not and I don't know how to fix it. Any ideas?

Thank you for your time!

Edit: I also think that defaultOptions is not working correctly. I have these options set when I create the apollo client instance:

defaultOptions: {
    query: {
      fetchPolicy: 'cache-and-network',
      nextFetchPolicy: 'cache-first'
    }
  }

First time when I render a component that queries some data I see the network request. If I switch to another view and then come back, I see no request (it's retrieved from the cache - cache-first?). Anyway, if I use fetchPolicy: 'cache-and-network' option inside useQuery I always see the request at component mount. Is this the expected behavior ?

benjamn commented 3 years ago

@Alin13 It looks like these data are paginated using a Relay-style connection/edges API. Since you didn't mention anything about your typePolicies or field policies, I'm going to recommend you read our new documentation about pagination before proceeding, because the two subqueries are almost certainly clobbering each other's data in the cache. You can either keep their data totally separate with keyArgs, or you can use a field policy to make sure their data ends up merged together in a logical way. I know that probably doesn't make sense immediately, but you'll find a full explanation in the docs.

A few side notes:

The defaultOptions.query configuration does not apply to useQuery, which uses client.watchQuery behind the scenes. You probably want defaultOptions.watchQuery instead.
It's risky to put nextFetchPolicy in defaultOptions, since it applies to all queries. If you want to do this safely, I strongly recommend using a function for nextFetchPolicy: #6893.

alinpetrusca commented 3 years ago

Well, I don't really need the pagination here, so I assume that if I remove the Relay structure it will work as expected.

I never used any type or field policies and it's a little hard to understand exactly what needs to be done there, but as I understand, the cache keys are generated using the name of the field plus the serialized arguments. In my case I have different year and dates variables, so I expected that there will be no conflict between them.

I'll try to remove the relay structure and I'll read the docs again.

Thank you!

Edit: @benjamn I changed the query so that I only have previousYearData subquery and I also removed some variables / filters:

query getChartData($previousYear: Int!, $previousYearEndDate: Date) {
  previousYearData: nausicaaReportings(year: $previousYear, service: "OTHERS", date_Lte: $previousYearEndDate, dateType: "VISITS") {
    edges {
      node {
        date
        turnover
      }
    }
  }
}

and these are the variables of the query:

{
    previousYear: 2019,
    previousYearEndDate: '2019-12-11'
  }

and I still have an infinite loop. So, I think is not related to the relay structure, but I'll try to remove it also.

Can it be related to the size of the response (because it's pretty big)?

Edit2: yes, I think this is related to the actual size of the response. If I limit the results to 100 entries everything works normally. Any ideas how to fix this?

martinjlowm commented 3 years ago

Can it be related to the size of the response (because it's pretty big)?

Edit2: yes, I think this is related to the actual size of the response. If I limit the results to 100 entries everything works normally. Any ideas how to fix this?

That is an interesting find! The places We see infinite loops are with big responses as well, i.e. metric data. For such types/resolvers we've set keyFields to false and disabled merging in the cache configuration. I don't think it has any influence though.

alinpetrusca commented 3 years ago

I managed to fix this by reducing the size of the response (we had over 30000 results initially, but we made some aggregations on the backend side and currently have ~500 results). But, I think this is still an issue with big responses. From my point of view, it doesn't make sense to have keyArgs or other configurations if by default all the arguments + the name of the field are used to cache the values.

For example, I have those two queries:

export const GET_DATA = gql`
  query getData($year: Int!, $currentYearStartDate: Date, $currentYearEndDate: Date) {
    currYearWebData: reportingsAggregations(
      year: $year
      date_Gte: $currentYearStartDate
      date_Lte: $currentYearEndDate
      channelType: "WEB"
      dateType: "DATE_TYPE_1"
    ) {
      value
    }
  }
`;

and this

export const GET_DATA = gql`
  query getData($year: Int!, $currentYearStartDate: Date, $currentYearEndDate: Date) {
    currYearWebData: reportingsAggregations(
      year: $year
      date_Gte: $currentYearStartDate
      date_Lte: $currentYearEndDate
      channelType: "WEB"
      dateType: "DATE_TYPE_2"
    ) {
      value
    }
  }
`;

The first query uses cache-and-network fetchPolicy, the second uses the default (cache-first). We can clearly see that dateType filter is different in each query. These two queries are triggered in the same time: I get the result from the first one, I get the result from the second one and after this the first one gets retriggered. Why? How to avoid this extra call?

benjamn commented 3 years ago

@Alin13 Do you have any field policies (like keyArgs) configured for the Query.reportingsAggregations field, or are you using the default keyArgs behavior (include all arguments, sorted by key)?

alinpetrusca commented 3 years ago

@benjamn No extra configuration, only the default.

nkahnfr commented 3 years ago

Hi,

I am encountering the same issue with an infinite loop when a query response contains a lot of data. I think that I reproduced the bug using your error template: https://github.com/nkahnfr/react-apollo-error-template/commit/6da14edf134e07e0874383a7399c59efced63001. You can change nbItems to a lower value (30k) and then everything is ok.

Let me know if you need more details. Thanks for the great job and for your help.

brainkim commented 3 years ago

So the exact value for which this infinite loop starts happening is 32767. Suspiciously, this is equal to 2^15 - 1. Interesting, huh? Thanks for the reproduction @nkahnfr! I am investigating.

brainkim commented 3 years ago

One note for people currently struggling with this issue. It is likely fixed in 3.4 thanks to certain optimizations with regard to canonicalization of InMemoryCache result objects (https://github.com/apollographql/apollo-client/pull/7439). I will have a larger post-mortem on this issue by end of day today, but in the meantime, please try the 3.4 beta and see if that helps.

brainkim commented 3 years ago

Right so the TL;DR is that this issue is “fixed” in 3.4 and you should jump on the beta/rc when you get the chance.

So what’s happening: As it turns out, when we read from the cache, we cache individual reads by selectionSet, ref and variables, as a layer of caching on top of the actual cache. We use cache hits to determine whether or not we should send the query to the server a second time, causing the infinite loop. This is done in QueryInfo.setDiff().

Why is this happening for large-ish values?

The optimism library, on which this library depends, has a default cache limit of 2 ^ 16. Exceeding this limit is what kicks off the loop.

Why is this happening for nested queries only?

The nested query stuff is a red herring. The reality is that the optimism caching is based on query, not just normalized entity, so the reason the array size limit was reduced to 2 ^ 15 - 1 is that we cached each item in the array for the parent query, cached each item in the array for the child query, and then there’s an extra cache value for the parent query itself, which gets us back to 2 ^ 16. You can negatively confirm this just by having the child query alone but with 2^16 values in the array. https://codesandbox.io/s/cache-and-network-reproduction-216-y0r17

Why does `cache-and-network` rely on referential equality?

This is the actual tough question, because it gets into the philosophy of fetch policies. My guess of what I think everyone is struggling with, is that we expect network request to only happen once per call, but really the “call” is a useQuery() hook and we’re in the world of React where ceaseless re-execution is the norm. So it’s tough to say on the apollo client side if we’re actually making another call, so for Apollo we fall back on this tenuous referential equality check here. How it actually triggers another request from the link is a more complicated question involving things like “reobservers” that I spent a bit of yesterday looking into but still don’t have the exact picture on.

In any case, this is unactionable because it’s likely fixed in 3.4, but I’m keeping an eye on it.

chillyistkult commented 3 years ago

I tried with 3.4.0-rc.2 and still see this issue. Is it confirmed that it should be fixed there @brainkim?

brainkim commented 3 years ago

@chillyistkult Just checked against 3.4.0-rc.3 and it seems to still be fixed. If you have a reproduction for the behavior you’re seeing I am always happy to take a look!

martinjlowm commented 3 years ago

We upgraded from rc.0 to rc.6 today and noticed this pattern - I have yet to investigate if it was indeed this upgrade that caused it. I reverted our deployment and it started to settle back to normal after some time (once our users refreshed their page).

hwillson commented 3 years ago

@martinjlowm are there any additional details you can provide to help us determine if this is an Apollo Client issue? What does invocations mean in this context? Were you able to investigate further to determine if AC is the cause here?

martinjlowm commented 3 years ago

Hi @hwillson ,

I did have a closer look - I believe this was an effect of a thrown invariant error for some, but not something that applied for all clients. At least, it was easy for me to replicate if an API response didn't include any data (4XX errors). I saw:

  9: {
    file: "@apollo/client/cache/inmemory/writeToStore.js",
    node: new InvariantError("Missing field '" + resultFieldKey + "' in " + JSON.stringify(result, null, 2).substring(0, 100))
  },

and immediately after, an unrelated query started to go crazy.

Oh, and the invocations are API requests over a period of 5 minutes per data point.

I'll try to see if I can gather more information and perhaps even isolate it to a particular RC bump.

martinjlowm commented 3 years ago

rc.0: ✅
rc.1: ✅
rc.2: ❌
...

I think perhaps it's this line: https://github.com/apollographql/apollo-client/compare/v3.4.0-rc.1...v3.4.0-rc.2#diff-5fb8ad16cfbcb51bc035d32b5963734561c440b48417a1c2dbf80a16098be67bR343 - data may be null here.

GitHub eats the pound in the link :<

Right off the bat, I cannot say why the client would attempt to keep refetching a triggered query from this though.

hwillson commented 3 years ago

This is super helpful @martinjlowm - thanks for this!

martinjlowm commented 3 years ago

I managed to replicate some infinite loop on rc.6. I tried setting a couple of breakpoints just to see how the logic behaved. The variables to the query of interest does not change (same reference). Hope it's helpful - this is without any invariant errors.

You can see the red bars switching between two states as if there were two queries that raced one another.

https://www.icloud.com/iclouddrive/0H_M5_zserYxCERARXXiiIMRg#Screen_Recording_2021-06-14_at_22.56

I can confirm downgrading to rc.0 resolved it for us. ~1800 I deployed rc.6 again and a couple hours later, around midnight, I reverted just @apollo/client to rc.0 and that is where the drop of requests is. This one is CloudFront metrics of all resource requests in general.

sgentile commented 2 years ago

We get infinite looping as well and the global overrides are still broken

sgentile commented 2 years ago

this makes me lose confidence : https://tomoima525.medium.com/pitfalls-i-fell-into-during-apollo-client-3-0-migration-4829bfe0a45a

Aliendreamer commented 2 years ago

I can confirm this still happens 3.3.7 we had network-only and still it triggered. Changing it to cache-first fix the problem, I implemented all the workarounds I know making errors null, using observables to refresh on auth error, returning explicitly nothing and still. We haven't tried it with 3.5.7 we have to fix our typescript before switching to it as it throws us an error for void function.

lohenyumnam commented 2 years ago

Any solution to this problem guys ??

UjinT34 commented 2 years ago

Try disabling babel-plugin-graphql-tag

apollographql / apollo-client