Open Cellule opened 4 months ago
Here are my findings so far (no repro still -_-)
I have a query on some page and it is set as "dirty" at some point https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryInfo.ts#L229-L234
Then the query is unmounted, but remains "dirty"
Later we call resetStore
which calls reFetchObservableQueries
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/ApolloClient.ts#L594-L603
In reFetchObservableQueries
we call broadcastQueries
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryManager.ts#L969
In broadcastQueries
we call notify
for ALL queries even inactive ones
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryManager.ts#L1072-L1075
In notify
for my problematic query, because it is "dirty" we call the listener which ends up calling reobserverCacheFirst
on that query causing to send it to the server
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryInfo.ts#L251-L272
So it seems like at some point we should ignore "inactive" queries, aka observableQuery.hasObservers()
is false
I think the right place to do this check would be in QueryInfo.shouldNotify
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryInfo.ts#L289-L292
Otherwise, maybe we should NOT call broadcastQueries
in the first place from within reFecthObservableQueries
?
Alright after going through this whole effort, I managed to make my minimal repro work!!
@Cellule thanks for the bug report!
The reproduction is super helpful! I've cloned your reproduction and started looking into this to understand myself. Its close to the weekend and I'm not sure I'll be able to finish until early next week, but thanks so much for your thorough analysis. This will definitely save me a ton of time. I'll get back to you as soon as I can!
@Cellule looking at this further, your diagnosis seems correct. I had originally assumed there was a bug holding onto the ObservableQuery
in the this.queries
map inside QueryManager
, but that doesn't appear to be the case (hence my assumption in https://github.com/apollographql/apollo-client/issues/11894#issuecomment-2197707409).
I can now see that the only reason you're getting the refetch is because the ObservableQuery
instance from the first page hasn't yet been garbage collected, so the notify
call that you pointed out in your deep dive is triggering the refetch.
I've updated your reproduction to show that once ObservableQuery
is garbage collected, we're able to see only a single network request (try clicking the "reset store" button after you see the log message about it getting garbage collected): https://codesandbox.io/p/devbox/fervent-scooby-c5xwk3. This at least confirms that we aren't holding onto that ObservableQuery
instance somewhere else in the client 🎉.
I've discussed this with the team to make sure adding a check for hasObservers
wouldn't have any other adverse effects, and we couldn't think of any. I think your suggested fix makes sense. I'd like to dig into the proper place for this, but otherwise I think we should move forward with this fix. If you're interested in submitting a PR, I'd be happy to review, otherwise I'll try and get something up within the next couple days.
Thanks again for your thorough deep dive through the code and the time getting a reproduction! This made my life so much easier ❤️.
@jerelmiller Great to hear! I was about to open a PR then figured I should write a test for this. However, I can't figure out how to reach the state where there's an inactive ObservableQuery in the QueryManager. Actually, the more I review the code, the less I understand how it's possible to have an inactive query in the query manager 😅
Haha right?!?! Thats why I was so confused why you were able to reproduce this because technically it shouldn't be possible 😆.
From what I understand, its the fact that the ObservableQuery
instance still exists in memory and hasn't been garbage collected yet, so its able to respond to updates when broadcastQueries
is run. The best way I could think to try and approach a test would be:
If you want to test this at the React level with useQuery
:
useQuery
. useQuery
(a ternary would probably work just fine here). Make sure these are different queries so that query deduplication doesn't kick in giving you a false positive.client.reFetchObservableQueries
If you opt for this approach, we have a handy Profiler
utility that makes it easier to assert render-by-render. Here is an example of its usage with useQuery
: https://github.com/apollographql/apollo-client/blob/1a9c68a4e6bc033498ce620681ae4a3720820ba4/src/react/hooks/__tests__/useQuery.test.tsx#L4177-L4432
If you want to test this with core APIs:
client.watchQuery
client.watchQuery
. Ensure its a different query so that query deduplication doesn't kick in giving you a false positive.client.reFetchObservableQueries
If you opt for this approach, we have a handy ObservableStream
utility that captures each emitted result that you can assert on in a more synchronous way. Here is an example of its usage: https://github.com/apollographql/apollo-client/blob/1a9c68a4e6bc033498ce620681ae4a3720820ba4/src/core/__tests__/ObservableQuery.ts#L2393-L2509
If you're struggling with either of these, let me know and I'd be happy to take a stab at creating a failing test for this to use as a base for implementing the change.
So I'm not 100% sure why, but my current investigation leads me to this bit of code where InternalState.useObservableQuery
will call this.client.watchQuery
twice for the same query (can be seen on page load of my repro).
The first one created seems to be abandoned, but remains in the list of query of the QueryManager.
So in the end, it seems it's not an issue of query being cleaned up left in the QueryManager. It's a separate query that was never properly registered that is somehow in the list. When resetStore is called, that "never alive" query is revived. So maybe a proper fix would be to avoid this duplicate in the first place since I suspect there might be lots of "uninitialized" queries
That sounds very familiar to https://github.com/apollographql/apollo-client/pull/11837 where I noticed something similar. Are you using React's strict mode by chance? I believe the the issue here was that the strict mode caused useState
to run twice, which meant useInternalState
was creating 2 instances of InternalState
, which ran watchQuery
twice: https://github.com/apollographql/apollo-client/blob/116723d88b5600126cb266060ca58ce989d3988e/src/react/hooks/useQuery.ts#L125
Take a look at the PR description in https://github.com/apollographql/apollo-client/pull/11837. Perhaps you might be seeing something similar. Funny enough, the fix for that one was also doing a check for hasObservers
😆
It does look like StrictMode is the offender in the repro above where useRef
in useInternalState
is empty on both renders causing to create an ObservableQuery twice
It would point to bad cleanup in useInternalState
.
However, I'm hitting this issue in production where StrictMode should do nothing. Let me see if I can repro without StrictMode
My biggest fear is that there's a memory leak somehow. I haven't been able to confirm it either in a repro nor in our production application, so I suppose that for now the initial patch should do the trick. I'll try to monitor afterward in case memory leak occurs
As for testing, I managed to write a test that fails without the fix and passes with it, it's not the cleanest approach, but let's discuss this on the PR directly
Thanks for the PR! I'll take a look when I'm back on Monday. Have a great weekend!
@jerelmiller
Someone on my team, @lorenzopicoli , found another related issue.
In one of our mutations, we're using the refetchQueries
option
const [mutation] = useMutation<ICreateAssetMutationResp, ICreateAssetMutationArgs>(
createAssetMutation,
{
ignoreResults: true,
refetchQueries: [
// needed because a sub-asset can be created at any level in the sub-assets page
// called if in /assets/:assetId/subs, to refresh the descendants count in treeView
"AssetParentTreeView",
// called if in /assets/:assetId, to refresh the descendants count in sub-asset tree-view
"AssetDetailsQuery",
],
}
)
I went down through the code and ended up in QueryManager.getObservableQueries
https://github.com/Cellule/apollo-client/blob/823b26e1073ae9c50d7c70078ea99e758975dab6/src/core/QueryManager.ts#L852-L900
It seems if include
is an array of strings, it'll return queries matching the name even if they're not active.
I feel this is a mistake, but I can't 100% confirm. I'll add a proposed fix in my PR for you to take a look
I dug a bit more into it. From what I can see here's what happen (exacerbated by StrictMode, but possible without)
broadcastWatch
after getting results from the "second" query, this causes the first query to become dirty
So clearly StrictMode is being a huge offender of this problem. I can see that the useState initializer function runs twice in StrictMode https://github.com/apollographql/apollo-client/blob/80471816396fa023fa2dc3b768ceb3da5b22edf4/src/react/hooks/useQuery.ts#L179-L205
https://react.dev/reference/react/useState#caveats
Since there's no proper "cleanup" for the observable that is created in useState
, we always end up with 2 observable for every query.
According to this
https://github.com/facebook/react/issues/20090#issuecomment-715926549
Creating a watcher/listener in useState is an anti-pattern and that's why it causes such problems in StrictMode
I'll try to come up with a better repro without StrictMode, but I do believe we should figure out and fix that issue in StrictMode as well to respect the rules of React
@Cellule thanks for that confirmation. I suspected strict mode was the cause of at least one of the cases and agree we should try and do a better job here to avoid he double initialization. Do let me know if you're able to reproduce it without strict mode though. I'd be curious how that is allowed to happen, but my guess is the underlying cause is still very similar (e.g. multiple ObservableQuery
instances are created)
Now I'm starting to wonder if the main issue was fixed (aka dead queries in production)
When I initially encountered this issue we were on apollo-client 3.10.4, we are currently on 3.10.8 and I can't seem to repro
I reviewed the diff between those 2 versions and it does look like something changed around how useQuery
handles its state
v3.10.4...v3.10.8
Or more specifically from this PR https://github.com/apollographql/apollo-client/pull/11851
You probably have more context around that particular change, do you think this solved it ?
If yes, then maybe we just have to focus on fixing the useState(createInternalState)
in StrictMode
@Cellule ok thats good news! That change was made to provide better compatibility with the new React Compiler. We were violating a few rules of React that would have made it difficult to integrate with it. Very possible the move away from the ref and into state helps here.
Great, I updated the title of the issue to better represent the issue.
Issue Description
We recently upgraded to version 3 Right now we use
resetStore
for a particular scenario. All graphql requests sent have an "organizationId" header associated. Depending on that "organizationId" the user will see different content. When the user changes to another organization we navigate to an almost empty page for the transition (to unmount all organization specific queries) Then we callresetStore
to refetch top-level queries that "might" be affected by the change of context. Once everything resolves, we navigate back to the normal app.This used to work fine for the last 5 years, then we updated to apollo client version 3.
After some digging we found out that after calling
resetStore
some queries that are unmounted (confirmed 100%) end up being refetched. I debugged through the resetStore method, but I found no calls to refetch those queries. So right now I have no idea what could cause those unmounted queries to be requested once again.The biggest problem I have right now is that because those queries are called from the wrong context, it causes another mechanism to automatically redirect to the previous organization (so direct links work in any organization) I have a lot of fail safe in place and it's fine 90% of the time, but when the network is slower those fail safe are not enough and it leads to the user no being able to change organization.
Any guidance on what could cause an unmounted query to hit the network once more after calling resetStore would be appreciated Or tips on how to find out
Link to Reproduction
https://codesandbox.io/p/devbox/jolly-wilbur-n7m7jz?layout=%257B%2522sidebarPanel%2522%253A%2522EXPLORER%2522%252C%2522rootPanelGroup%2522%253A%257B%2522direction%2522%253A%2522horizontal%2522%252C%2522contentType%2522%253A%2522UNKNOWN%2522%252C%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522id%2522%253A%2522ROOT_LAYOUT%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522UNKNOWN%2522%252C%2522direction%2522%253A%2522vertical%2522%252C%2522id%2522%253A%2522clwtppgmj00073p6i6i7ie1cc%2522%252C%2522sizes%2522%253A%255B70%252C30%255D%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522EDITOR%2522%252C%2522direction%2522%253A%2522horizontal%2522%252C%2522id%2522%253A%2522EDITOR%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL%2522%252C%2522contentType%2522%253A%2522EDITOR%2522%252C%2522id%2522%253A%2522clwtppgmj00023p6iumaf2w8i%2522%257D%255D%257D%252C%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522SHELLS%2522%252C%2522direction%2522%253A%2522horizontal%2522%252C%2522id%2522%253A%2522SHELLS%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL%2522%252C%2522contentType%2522%253A%2522SHELLS%2522%252C%2522id%2522%253A%2522clwtppgmj00043p6igepxit5p%2522%257D%255D%252C%2522sizes%2522%253A%255B100%255D%257D%255D%257D%252C%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522DEVTOOLS%2522%252C%2522direction%2522%253A%2522vertical%2522%252C%2522id%2522%253A%2522DEVTOOLS%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL%2522%252C%2522contentType%2522%253A%2522DEVTOOLS%2522%252C%2522id%2522%253A%2522clwtppgmj00063p6ixzewo28y%2522%257D%255D%252C%2522sizes%2522%253A%255B100%255D%257D%255D%252C%2522sizes%2522%253A%255B69.49598743432026%252C30.504012565679744%255D%257D%252C%2522tabbedPanels%2522%253A%257B%2522clwtppgmj00023p6iumaf2w8i%2522%253A%257B%2522id%2522%253A%2522clwtppgmj00023p6iumaf2w8i%2522%252C%2522activeTabId%2522%253A%2522clwtpsi76003n3p6i7vuk56vl%2522%252C%2522tabs%2522%253A%255B%257B%2522id%2522%253A%2522clwtppgmj00013p6ipicylgfm%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522FILE%2522%252C%2522filepath%2522%253A%2522%252FREADME.md%2522%252C%2522state%2522%253A%2522IDLE%2522%257D%252C%257B%2522id%2522%253A%2522clwtpsi76003n3p6i7vuk56vl%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522FILE%2522%252C%2522initialSelections%2522%253A%255B%257B%2522startLineNumber%2522%253A25%252C%2522startColumn%2522%253A27%252C%2522endLineNumber%2522%253A25%252C%2522endColumn%2522%253A27%257D%255D%252C%2522filepath%2522%253A%2522%252Fsrc%252Flink.js%2522%252C%2522state%2522%253A%2522IDLE%2522%257D%252C%257B%2522type%2522%253A%2522FILE%2522%252C%2522filepath%2522%253A%2522%252F.codesandbox%252Ftasks.json%2522%252C%2522id%2522%253A%2522clwumzut3002e3p6gxrqqie21%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522state%2522%253A%2522IDLE%2522%257D%255D%257D%252C%2522clwtppgmj00063p6ixzewo28y%2522%253A%257B%2522id%2522%253A%2522clwtppgmj00063p6ixzewo28y%2522%252C%2522activeTabId%2522%253A%2522clxyzoh8p000l3j6jh401n3i3%2522%252C%2522tabs%2522%253A%255B%257B%2522id%2522%253A%2522clwtppgmj00053p6iqkbdkq6f%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522TASK_PORT%2522%252C%2522taskId%2522%253A%2522start%2522%252C%2522port%2522%253A3000%252C%2522path%2522%253A%2522%252F%2522%257D%252C%257B%2522type%2522%253A%2522SETUP_TASKS%2522%252C%2522id%2522%253A%2522clxyzmn6r000o3j6jbsttjmyy%2522%252C%2522mode%2522%253A%2522permanent%2522%257D%252C%257B%2522type%2522%253A%2522UNASSIGNED_PORT%2522%252C%2522port%2522%253A2222%252C%2522id%2522%253A%2522clxyzn4b5001p3j6jw7ckv2w6%2522%252C%2522mode%2522%253A%2522permanent%2522%257D%252C%257B%2522type%2522%253A%2522TASK_PORT%2522%252C%2522port%2522%253A3000%252C%2522taskId%2522%253A%2522start%2522%252C%2522id%2522%253A%2522clxyzoh8p000l3j6jh401n3i3%2522%252C%2522mode%2522%253A%2522permanent%2522%257D%255D%257D%252C%2522clwtppgmj00043p6igepxit5p%2522%253A%257B%2522id%2522%253A%2522clwtppgmj00043p6igepxit5p%2522%252C%2522activeTabId%2522%253A%2522clwtppgmj00033p6iw9cgtvcz%2522%252C%2522tabs%2522%253A%255B%257B%2522id%2522%253A%2522clwtppgmj00033p6iw9cgtvcz%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522TASK_LOG%2522%252C%2522taskId%2522%253A%2522start%2522%257D%252C%257B%2522id%2522%253A%2522clwtqf2io00a13p6m5m1c3s1n%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522TERMINAL%2522%252C%2522shellId%2522%253A%2522clwtqf2lx0074dke2ga51ahaj%2522%257D%255D%257D%257D%252C%2522showDevtools%2522%253Atrue%252C%2522showShells%2522%253Atrue%252C%2522showSidebar%2522%253Atrue%252C%2522sidebarPanelSize%2522%253A15%257D
Reproduction Steps
@apollo/client
version3.10.4