Automattic / wp-calypso

The JavaScript and API powered WordPress.com
https://developer.wordpress.com
GNU General Public License v2.0

Domain list page breaks for users with many domains #96354

Open leonardost opened 1 week ago

leonardost commented 1 week ago

Quick summary

Users that have many domains (over a couple of hundred, I believe) can't manage their domains in Calypso because the domain list page (/domains/manage) doesn't load. Some users have thousands of domains due to the Google Domains Takeover initiative (pcYYhz-1ts-p2).

Steps to reproduce

  1. Open the domain list page (/domains/manage) for a user that has many domains

What you expected to happen

The domain list should be loaded correctly and I should be able to manage my domains.

What actually happened

The domain list never finishes loading, and sometimes the page eventually breaks completely.

Example screenshot from a user support session:

[Screenshot: v5YA15N2i8Msd354KDa1OkWMBgxEfh8oZO36rCC7.jpg]

Impact

Some (< 50%)

Available workarounds?

No, and the platform is unusable

If the above answer is "Yes...", outline the workaround.

No response

Platform (Simple and/or Atomic)

Simple

Logs or notes

No response

renancarvalho commented 1 week ago

Hello 👋, do we know of a user with this issue? That would help the investigation, since it is challenging to set up an account with hundreds or thousands of domains.

Robertght commented 6 days ago

I don't have an account with that many domains and didn't run into this problem on my end, but I noticed that the user's whole account loaded slowly. @renancarvalho I'm going to send the details via Slack.

LE: I learned someone else will look into it. Please reach out via Slack once you see this. Thanks!

dsas commented 3 days ago

Details at p1732188303637839-slack-C07GZ2UA3TN

dsas commented 3 days ago

The page makes one API call per wpcom site, so for users with hundreds of domains that means hundreds of API calls. All of them did eventually get a response. I think they were all loaded into the page, since I was able to scroll down to domains beginning with Z.
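One way to soften a burst of hundreds of per-site requests is to cap how many are in flight at once. The TypeScript sketch below is illustrative only: `fetchAllWithLimit` and the per-site `fetcher` are hypothetical names, not existing Calypso APIs. Capping concurrency smooths the request burst; it would not by itself reduce the amount of data that ends up loaded into the page.

```typescript
// Hypothetical per-site fetcher; not an actual Calypso API.
type SiteFetcher<T> = (siteId: number) => Promise<T>;

/**
 * Runs `fetcher` for every site ID, but never has more than `limit`
 * requests in flight at once. Results keep the order of `siteIds`.
 */
async function fetchAllWithLimit<T>(
  siteIds: number[],
  fetcher: SiteFetcher<T>,
  limit: number
): Promise<T[]> {
  if (limit < 1) {
    throw new RangeError("limit must be at least 1");
  }
  const results = new Array<T>(siteIds.length);
  let next = 0; // index of the next site ID that no worker has started yet

  // Each worker keeps claiming the next unstarted index until none remain.
  // JavaScript is single-threaded, so `next++` cannot race between workers.
  async function worker(): Promise<void> {
    while (next < siteIds.length) {
      const index = next++;
      results[index] = await fetcher(siteIds[index]);
    }
  }

  const workerCount = Math.min(limit, siteIds.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `limit` set to, say, 6, a user with 500 sites would still get all 500 responses, but never more than 6 outstanding requests at a time.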

Eventually Chrome crashed with an "Aw, Snap!" page (error code 5). This happens faster if DevTools is open and more slowly if it isn't, as long as you interact with the page, e.g. by scrolling through the list slowly enough for it to start populating.

Following along in the Chrome Task Manager, I could see CPU use almost constantly above 200%, and memory use ranging between 2 and 6 GB and still increasing at the point of the crash. I'm guessing the crash is caused by some kind of memory pressure.

To set expectations: the problem probably won't be easy to find and fix.

dsas commented 2 days ago

It looks like this has been happening for over a year: p1695308169001609-slack-C04H4NY6STW

I looked into this some more with @zaguiini. Some things we noticed:

1. The page loads all of the domains.

Ideally it should only request information for the domains in the viewport, not all of the information for every domain the user has. This viewport-limited loading appears to work under @zaguiini's account, but not for the affected user.

Over the next few weeks, @Automattic/nexus will be replacing the table with a dataview, so this might get resolved as a side effect. It would be better if we could figure out the problem first, though.
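Requesting details only for domains in the viewport usually starts from knowing which rows are visible. The sketch below assumes fixed-height rows (an assumption; the current table and the upcoming dataview may lay rows out differently) and uses a hypothetical `visibleRowRange` helper, not an existing Calypso function.

```typescript
// Hypothetical description of a list with fixed-height rows.
interface ViewportState {
  scrollTop: number; // pixels scrolled from the top of the list
  viewportHeight: number; // visible height of the list container, in pixels
  rowHeight: number; // assumed fixed height of every row, in pixels
  totalRows: number; // total number of domains in the list
  overscan: number; // extra rows to include above and below the viewport
}

/**
 * Returns the half-open range [start, end) of row indices that are visible,
 * widened by `overscan` and clamped to the list bounds. Only the domains in
 * this range would need detail requests.
 */
function visibleRowRange(state: ViewportState): { start: number; end: number } {
  const { scrollTop, viewportHeight, rowHeight, totalRows, overscan } = state;
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisibleExclusive = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  const start = Math.max(0, firstVisible - overscan);
  const end = Math.min(totalRows, lastVisibleExclusive + overscan);
  return { start, end };
}
```

For example, with a 500px viewport scrolled to 1000px, 50px rows, 2,000 domains and an overscan of 5, the range is rows 15 to 34, so about 20 domains would need details instead of all 2,000.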

2. The purchases endpoint times out

/me/purchases returns an HTTP 504 (Gateway Timeout) for this user. This will probably cause them problems on other Calypso pages too.

/me/purchases doesn't currently support pagination (fbhepr%2Skers%2Sjcpbz%2Schoyvp.ncv%2Serfg%2Sjcpbz%2Qwfba%2Qraqcbvagf%2Spynff.jcpbz%2Qfgber%2Qncv%2Qraqcbvagf.cuc%3Se%3Q4174p112%23656-og). It does have some performance instrumentation with statsd.

It sounds like this user's account has repeatedly caused fatal errors in the payments code: p1694033164744309-slack-C096PD42U
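For reference, offset-based pagination on the client side could look like the sketch below. It assumes a hypothetical page shape (`items`, `total`) and a hypothetical `fetchPage(offset, limit)` function; /me/purchases accepts no such parameters today. The point is that each request stays bounded in size, which is what would help avoid the gateway timeout.

```typescript
// Hypothetical page shape; /me/purchases has no pagination today.
interface Page<T> {
  items: T[];
  total: number; // total number of items across all pages
}

// Hypothetical fetcher for one page starting at `offset`.
type PageFetcher<T> = (offset: number, limit: number) => Promise<Page<T>>;

/**
 * Collects all items by requesting fixed-size pages until `total` is reached,
 * so that no single request has to return every purchase at once.
 */
async function fetchAllPages<T>(fetchPage: PageFetcher<T>, pageSize: number): Promise<T[]> {
  if (pageSize < 1) {
    throw new RangeError("pageSize must be at least 1");
  }
  const all: T[] = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage(offset, pageSize);
    all.push(...page.items);
    offset += page.items.length;
    // Stop once everything is collected, or if the server returns an empty page.
    if (page.items.length === 0 || offset >= page.total) {
      break;
    }
  }
  return all;
}
```

A UI would not necessarily need every page up front; it could request the first page and fetch more on demand as the user scrolls.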

3. The sites endpoint times out

/me/sites returns an HTTP 504 (Gateway Timeout) for this user. This will probably cause them problems on other Calypso pages too.

/me/sites doesn't currently support pagination (fbhepr%2Skers%2Sjcpbz%2Schoyvp.ncv%2Serfg%2Sjcpbz%2Qwfba%2Qraqcbvagf%2Spynff.jcpbz%2Qwfba%2Qncv%2Qzr%2Qfvgrf%2Qraqcbvag.cuc%3Se%3Q5o812359%236-og). It does have some performance instrumentation with statsd.
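Since both endpoints already report to statsd, timing individual stages separately could help show where /me/sites spends its time before the gateway gives up. The endpoints themselves are PHP; this TypeScript sketch only illustrates the plain-text statsd timer format (`<bucket>:<value>|ms`). The `timed` helper and the bucket name in the example are hypothetical.

```typescript
// Formats a timing metric in the plain-text statsd line format "<bucket>:<value>|ms".
function formatStatsdTiming(bucket: string, milliseconds: number): string {
  return `${bucket}:${Math.round(milliseconds)}|ms`;
}

/**
 * Times an async operation and hands the resulting statsd line to `send`
 * (for example, a UDP client), whether the operation succeeds or throws.
 * `now` is injectable so the timing can be tested with a fake clock.
 */
async function timed<T>(
  bucket: string,
  operation: () => Promise<T>,
  send: (line: string) => void,
  now: () => number = Date.now
): Promise<T> {
  const start = now();
  try {
    return await operation();
  } finally {
    send(formatStatsdTiming(bucket, now() - start));
  }
}
```

Wrapping each stage (loading sites, loading per-site options, and so on) in its own bucket would make a slow stage stand out in the metrics, rather than only the total request time.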

It seems to me that we have at least three areas to improve to get the dashboard to load.

dsas commented 2 days ago

Opened child issues for purchases and sites.