Chrome freezes after query returns a "large" response

bchu1 commented 6 months ago

I'm using ~~fetchQuery()~~ useQueryLoader() and have run into a case where on Chrome (surprisingly Firefox seems fine?) will completely hang and freeze the browser for like 10 seconds after the following request comes back.

Query: https://pastebin.com/tB4M15zA Sample variables: https://pastebin.com/SJt0Mhb1 Sample response: https://pastebin.com/zd04mzTH

This is the flamegraph I got from Firefox's debugger:

Here's the file for the Firefox profile: Firefox 2024-03-14 13.18 profile.json

captbaritone commented 6 months ago

Thanks for the report. Interesting that Chrome and Firefox behave differently here. Some things that would help:

A paste of the normalization artifact for this query (the file that the Relay compiler generates from that query)
The variables used (can be anonymized, but would be curious if there are some large variables in there)
The wall time taken by Firefox so we can establish just how different Chrome and FF are
Do you have any polyfills in your environment that might impact performance here? For example, pollyfills of Map/Set or for _ of?
Which version of Relay are you using?

My goal is to be able to create a minimal repro with this exact query so that we can look more closely at it.

Thanks again for the report

bchu1 commented 6 months ago

Is this what you're referring to? https://pastebin.com/np5p0fcR (the relevant query here is ~~fedSequencingReads~~ fedWorkflowRunsAggregate)
Sure, here's the query: ~~https://pastebin.com/Mu7PkD2m~~ https://pastebin.com/tB4M15zA and here's the variables: ~~https://pastebin.com/X81AkZz1~~ https://pastebin.com/SJt0Mhb1
CPU seems to go to 100% for about 1.5 secs in Firefox (I uploaded the dump in my original comment). Interestingly in Chrome it seems to go to 100% forever until I stop recording the profile! Here's a link to the Chrome dump: https://drive.google.com/file/d/1LgfvwAbHhpHHELN1lc-454pFFTlAgVsw/view?usp=drive_link
Don't think so

Map ƒ Map() { [native code] }

npm list react-relay --depth=0 idseq-web@ /Users/bchu/czid-web-private └── react-relay@16.0.0

bchu1 commented 6 months ago

~~We are using toPromise() after fetchQuery() as well, if that matters.~~

Update: The actual culprit query is using useQueryLoader().

bchu1 commented 6 months ago

And here's the actual file: https://github.com/chanzuckerberg/czid-web/blob/main/app/assets/src/components/views/discovery/DiscoveryViewFC.tsx (I forgot it was open source 😆)

captbaritone commented 6 months ago

Hmm, I'm not able to see the repository (I get a 404) perhaps it's not actually public?

Looking at the Chrome profile it seems that Relay is getting many many responses from your network implementation. Perhaps there's some kind of stream/defer in play, or perhaps something strange in the way your network layer is implemented? Can you share what you're passing as the network config option to your Relay environment?

Is suspect it's somehow returning many many payloads for your query.

bchu1 commented 6 months ago

Oops, I linked the private repo, updated it to be the public one.

We shouldn't be using any @streams or @defers anywhere.

Here's where we're setting network: https://github.com/chanzuckerberg/czid-web/blob/main/app/assets/src/relay/environment.ts#L28

captbaritone commented 6 months ago

Can you double check that it's not simply that your code is making a large number of requests by accident? Maybe put a console.log in that fetch function? Looking at where that query is fetched, it's a little hard to tell how it's getting called.

bchu1 commented 6 months ago

Update! Looks like the culprit is not the original query... I removed another query (fedWorkflowRunsAggregate) and now everything is working fine. Here the details on that query...

Query: https://pastebin.com/tB4M15zA Sample variables: https://pastebin.com/SJt0Mhb1 Sample response: https://pastebin.com/zd04mzTH

The response is actually not any larger than the other queries we're making... but the variables are definitely huge. I wonder if this has anything to do with it? And maybe that could explain why Firefox performs better than Chrome, if there's some difference in hashing?

Anyways, I'll update the title and previous comments to reflect the new info.

bchu1 commented 6 months ago

Yeah, I double checked the Network tab and there's only 3 queries being made, which is expected.

captbaritone commented 6 months ago

From the Chrome flame chart you shared, it looks like this _next function is getting called many hundreds of times:

https://github.com/facebook/relay/blob/586505b068bac9efdcc9b2b44d824d2468bb1cc2/packages/relay-runtime/store/OperationExecutor.js#L232

That should the result of payload response directly from the observable returned by the network layer. Maybe you could log/set a breakpoint there and see what happens on Chrome?

captbaritone commented 6 months ago

Normalizing that response on its own, one time, does not seem to be slow: https://github.com/captbaritone/relay-normalization-repro

bchu1 commented 6 months ago

I may have to get back to you after the weekend, but will try logging response there and see what happens!

bchu1 commented 6 months ago

Some updates:

Weirdly, when I log there I only get 3 logs, 1 for each GQL query sent...
I realized a possibly more likely cause: This query is using useQueryLoader() and there are multiple instances of the component that consumes the query reference via usePreloadedQuery() on the page (it's rendered in each row of a (virtualized) table).
We're going to try switching this query to fetchQuery() and if that fixes the issue, we'll likely move on with that to unblock the team's deadlines.

bchu1 commented 6 months ago

More updates:

Switching to fetchQuery() did not solve the freezing, unfortunately.
I let the browser wait until it stopped freezing before ending the profiler recording this time (using fetchQuery() now). Here's the full profile: https://drive.google.com/file/d/1E8SyB17ylQilYOCLUYbewF4xfmRjWs0z/view?usp=sharing It stayed at 100% for about 2 minutes. I see a lot of _traversSelections(), _normalizeField(), and _normalizeLink() in this one.
Here's the response for the above profile: https://pastebin.com/xJxnmGih

bchu1 commented 6 months ago

We just tried artificially limiting the response on the server to only 20 aggregate items in the array and the freezing is gone. So I guess it does have something to do with the response (size?).

bchu1 commented 6 months ago

We resorted to paginating this query and the performance issues are gone now. I'll go ahead and close this since it doesn't seem like much progress is being made on finding a root cause (feel free to reopen or do whatever you need with this 😸).

FWIW, I profiled again in Firefox before the pagination fix (using fetchQuery()) and still didn't see much CPU usage at all, compared to in Chrome where I saw a giant microtask drain operation that took 2 minutes, mostly doing those normalize calls.

facebook / relay

Chrome freezes after query returns a "large" response #4649