facebook / relay

Relay is a JavaScript framework for building data-driven React applications.
https://relay.dev
MIT License
18.38k stars 1.82k forks source link

Chrome freezes after query returns a "large" response #4649

Closed bchu1 closed 6 months ago

bchu1 commented 6 months ago

I'm using fetchQuery() useQueryLoader() and have run into a case where on Chrome (surprisingly Firefox seems fine?) will completely hang and freeze the browser for like 10 seconds after the following request comes back.

Query: https://pastebin.com/tB4M15zA Sample variables: https://pastebin.com/SJt0Mhb1 Sample response: https://pastebin.com/zd04mzTH

This is the flamegraph I got from Firefox's debugger: image

Here's the file for the Firefox profile: Firefox 2024-03-14 13.18 profile.json

captbaritone commented 6 months ago

Thanks for the report. Interesting that Chrome and Firefox behave differently here. Some things that would help:

  1. A paste of the normalization artifact for this query (the file that the Relay compiler generates from that query)
  2. The variables used (can be anonymized, but would be curious if there are some large variables in there)
  3. The wall time taken by Firefox so we can establish just how different Chrome and FF are
  4. Do you have any polyfills in your environment that might impact performance here? For example, pollyfills of Map/Set or for _ of?
  5. Which version of Relay are you using?

My goal is to be able to create a minimal repro with this exact query so that we can look more closely at it.

Thanks again for the report

bchu1 commented 6 months ago
  1. Is this what you're referring to? https://pastebin.com/np5p0fcR (the relevant query here is fedSequencingReads fedWorkflowRunsAggregate)
  2. Sure, here's the query: https://pastebin.com/Mu7PkD2m https://pastebin.com/tB4M15zA and here's the variables: https://pastebin.com/X81AkZz1 https://pastebin.com/SJt0Mhb1
  3. CPU seems to go to 100% for about 1.5 secs in Firefox (I uploaded the dump in my original comment). Interestingly in Chrome it seems to go to 100% forever until I stop recording the profile! Here's a link to the Chrome dump: https://drive.google.com/file/d/1LgfvwAbHhpHHELN1lc-454pFFTlAgVsw/view?usp=drive_link
  4. Don't think so

Map Æ’ Map() { [native code] }

npm list react-relay --depth=0 idseq-web@ /Users/bchu/czid-web-private └── react-relay@16.0.0

bchu1 commented 6 months ago

We are using toPromise() after fetchQuery() as well, if that matters.

Update: The actual culprit query is using useQueryLoader().

bchu1 commented 6 months ago

And here's the actual file: https://github.com/chanzuckerberg/czid-web/blob/main/app/assets/src/components/views/discovery/DiscoveryViewFC.tsx (I forgot it was open source 😆)

captbaritone commented 6 months ago

Hmm, I'm not able to see the repository (I get a 404) perhaps it's not actually public?

Looking at the Chrome profile it seems that Relay is getting many many responses from your network implementation. Perhaps there's some kind of stream/defer in play, or perhaps something strange in the way your network layer is implemented? Can you share what you're passing as the network config option to your Relay environment?

Is suspect it's somehow returning many many payloads for your query.

bchu1 commented 6 months ago

Oops, I linked the private repo, updated it to be the public one.

We shouldn't be using any @streams or @defers anywhere.

Here's where we're setting network: https://github.com/chanzuckerberg/czid-web/blob/main/app/assets/src/relay/environment.ts#L28

captbaritone commented 6 months ago

Can you double check that it's not simply that your code is making a large number of requests by accident? Maybe put a console.log in that fetch function? Looking at where that query is fetched, it's a little hard to tell how it's getting called.

bchu1 commented 6 months ago

Update! Looks like the culprit is not the original query... I removed another query (fedWorkflowRunsAggregate) and now everything is working fine. Here the details on that query...

Query: https://pastebin.com/tB4M15zA Sample variables: https://pastebin.com/SJt0Mhb1 Sample response: https://pastebin.com/zd04mzTH

The response is actually not any larger than the other queries we're making... but the variables are definitely huge. I wonder if this has anything to do with it? And maybe that could explain why Firefox performs better than Chrome, if there's some difference in hashing?

Anyways, I'll update the title and previous comments to reflect the new info.

bchu1 commented 6 months ago

Yeah, I double checked the Network tab and there's only 3 queries being made, which is expected.

captbaritone commented 6 months ago

From the Chrome flame chart you shared, it looks like this _next function is getting called many hundreds of times:

https://github.com/facebook/relay/blob/586505b068bac9efdcc9b2b44d824d2468bb1cc2/packages/relay-runtime/store/OperationExecutor.js#L232

That should the result of payload response directly from the observable returned by the network layer. Maybe you could log/set a breakpoint there and see what happens on Chrome?

Screenshot 2024-03-15 at 5 37 34 PM
captbaritone commented 6 months ago

Normalizing that response on its own, one time, does not seem to be slow: https://github.com/captbaritone/relay-normalization-repro

bchu1 commented 6 months ago

I may have to get back to you after the weekend, but will try logging response there and see what happens!

bchu1 commented 6 months ago

Some updates:

  1. Weirdly, when I log there I only get 3 logs, 1 for each GQL query sent...
  2. I realized a possibly more likely cause: This query is using useQueryLoader() and there are multiple instances of the component that consumes the query reference via usePreloadedQuery() on the page (it's rendered in each row of a (virtualized) table).
  3. We're going to try switching this query to fetchQuery() and if that fixes the issue, we'll likely move on with that to unblock the team's deadlines.
bchu1 commented 6 months ago

More updates:

bchu1 commented 6 months ago

We just tried artificially limiting the response on the server to only 20 aggregate items in the array and the freezing is gone. So I guess it does have something to do with the response (size?).

bchu1 commented 6 months ago

We resorted to paginating this query and the performance issues are gone now. I'll go ahead and close this since it doesn't seem like much progress is being made on finding a root cause (feel free to reopen or do whatever you need with this 😸).

FWIW, I profiled again in Firefox before the pagination fix (using fetchQuery()) and still didn't see much CPU usage at all, compared to in Chrome where I saw a giant microtask drain operation that took 2 minutes, mostly doing those normalize calls.