Yoast / wordpress-seo

Yoast SEO for WordPress
https://yoast.com/wordpress/plugins/seo/
Other
1.77k stars 894 forks source link

Yoast\W\S\A\I\Abstract_Indexing_Action->get_limited_unindexed_count() slow query #18065

Closed thomascharbit closed 6 months ago

thomascharbit commented 2 years ago

Please give us a description of what happened.

On a website with a lot of content the Yoast\W\S\A\I\Abstract_Indexing_Action->get_limited_unindexed_count() function triggers a very slow SQL query?

The following query is very slow:

SELECT P.ID
FROM wp_posts AS P
WHERE P.post_type IN ('post', 'page', 'attachment', 'artiste', 'product')
AND P.post_status NOT IN ('auto-draft')
AND P.ID not in (
SELECT I.object_id from wp_yoast_indexable as I
WHERE I.object_type = 'post'
AND I.version = 2 )
LIMIT 26

which takes 369 seconds.

The wp_yoast_indexable table contains ~20000 rows. The wp_posts table contains ~27000 rows.

All posts are indexed and the above SQL query returns 0 entries.

Can you reproduce?

Used versions

im1981 commented 2 years ago

Hi, same issue for me. The slowness issues are still unsolved since years. @yoast team what i would recommend, create a testing instance with 10.000.000 posts with faker or something else and test your plugin against this! It should run as fast (+ maximum 1 second ) as it runs with 10 posts . If it does so it is ready for release, otherwise its broke and should be used for tiny personal blog only.
Tips:

  1. Don´t do too much actions on request, do it with wp-cron / action scheduler.
  2. Avoid loops where possible (like for each post do select/update/insert) use sql bulk commands.
  3. Use wp caching mechanism wherever possible.
  4. Check wisely where your actions are needed especially in the backend. If i am at post-new.php, do you really need Yoast\WP\SEO\Actions\Indexing\Abstract_Indexing_Action->get_limited_unindexed_count ? Which took 2 seconds in my case.
ifnull commented 2 years ago

I have same issue. While this isn't our production environment, this testing environment is an EC2 t3.2xlarge with an Aurora RDS cluster for the DB. The WP-Admin dashboard consistently loads with 30+ second response times while the plugin is installed. Deactivating the plugin results in consistent < 1 second response times. Screen Shot 2022-02-28 at 5 02 08 PM

CyberCr33p commented 2 years ago

Same issue here.

Djennez commented 2 years ago

If you experience this issue on every admin pageload I would first advise to check if your transients are set up properly. This query should only run every 15 minutes because it always stores a transient for that period of time.

Having said that, I do notice a query time of 4+ seconds on about 25000 posts, but that is nowhere near the 369 seconds described. We're still investigating this for now.

thomascharbit commented 2 years ago

Hi @Djennez, I can confirm this is not happening on every pageload but this is still an issue.

Djennez commented 2 years ago

I've converted this issue to an internal issues over here. However, it will probably not get picked up any time soon.

archon810 commented 1 year ago

This query causes annoyances on our wp-admin and makes the plugins page load for 60s+. Why does it run on this page at all?

If it needs to run periodically, stick it into wp cron, but please, don't make it run synchronously on pages that have nothing to do with Yoast.

archon810 commented 1 year ago

I was able to get very modest gains for the inner query by adding this index:

ALTER TABLE `wp_yoast_indexable` ADD INDEX `object_type_version_object_id`(`object_type`, `version`, `object_id`);

With yoast's own indexes, the inner SELECT runs with a full table scan.

archon810 commented 1 year ago

Any updates please? This is still painfully slow. Why does it need to run on the plugins list page wordpress/wp-admin/plugins.php at all?

archon810 commented 11 months ago

Anyone? @Djennez This behavior continues every time I visit the plugins page after a while.

image

archon810 commented 11 months ago

Here's a proposed change to the slow query. It is returning the same exact results for me, except in 0.3s, which is 2 orders of magnitude faster than the current query.

SELECT
    P.ID 
FROM
    wp_posts AS P
    LEFT JOIN wp_yoast_indexable AS I ON P.ID = I.object_id 
    AND I.object_type = 'post' 
    AND I.version = 2 
WHERE
    P.post_type IN ( 'post', 'page', 'attachment', 'apps_post', 'apk_split', 'app_release' ) 
    AND P.post_status NOT IN ( 'auto-draft' ) 
    AND I.object_id IS NULL 
    LIMIT 26;

Here's the current slow query:

SELECT
    P.ID 
FROM
    wp_posts AS P 
WHERE
    P.post_type IN ( 'post', 'page', 'attachment', 'apps_post', 'apk_split', 'app_release' ) 
    AND P.post_status NOT IN ( 'auto-draft' ) 
    AND P.ID NOT IN ( SELECT I.object_id FROM wp_yoast_indexable AS I WHERE I.object_type = 'post' AND I.version = 2 ) 
    LIMIT 26
archon810 commented 11 months ago

While I'm on the subject of poor performance caused by Yoast, this query, which loads on the Yoast dashboard, is far worse and is taking 45 seconds currently:

SELECT COUNT(P.ID)
FROM wp_posts AS P
WHERE P.post_type IN ('post', 'page', 'attachment', 'apps_post', 'apk_split', 'app_release')
AND P.post_status NOT IN ('auto-draft')
AND P.ID not in (
SELECT I.object_id from wp_yoast_indexable as I
WHERE I.object_type = 'post'
AND I.version = 2 )

It can be rewritten like so:

SELECT COUNT(P.ID)
FROM wp_posts AS P
LEFT JOIN wp_yoast_indexable AS I ON P.ID = I.object_id AND I.object_type = 'post' AND I.version = 2
WHERE P.post_type IN ('post', 'page', 'attachment', 'apps_post', 'apk_split', 'app_release')
AND P.post_status NOT IN ('auto-draft')
AND I.object_id IS NULL;

The rewritten query runs in 7.5s, much faster, albeit still quite slowly.

thijsoo commented 9 months ago

Hello all, Thank you for making suggestions on how we can improve our performance! We have been testing your suggested query improvements and we have not seen the improvements you show here. We have tried both locally on sites with ~10.000 posts and on external hosts with ~100.000 posts.

Is there anything else you could supply us with in your hosting setup that might have an impact on the query performance?

thijsoo commented 6 months ago

Hello all, We are closing this issue due to lack of interaction. If any of you run into this again, or have more information please let us know.

archon810 commented 6 months ago

Hi @thijsoo, I'm still able to reproduce this issue on Yoast 22.5, the latest version. Please reopen the ticket.

What sort of info are you looking for?

Some numbers from our setup:

select count(*) from wp_yoast_indexable
3630909

select count(*) from wp_posts
3556965

Explain on the original slow query currently in Yoast:

explain SELECT COUNT(P.ID)
FROM wp_posts AS P
WHERE P.post_type IN ('post', 'page', 'attachment', 'apps_post', 'apk_split', 'app_release')
AND P.post_status NOT IN ('auto-draft')
AND P.ID not in (
SELECT I.object_id from wp_yoast_indexable as I
WHERE I.object_type = 'post'
AND I.version = 2 )

1   PRIMARY P   range   type_status_date,post_status_type_password_date_modified,type_status_author post_status_type_password_date_modified 82      1742561 Using where; Using index
2   MATERIALIZED    I   ref object_type_and_sub_type,object_id_and_type,subpages,prominent_words,permalink_hash_and_object_type,published_sitemap_index,object_type_version_object_id   object_type_version_object_id   135 const,const 1683117 Using where; Using index

Explain on the proposed much faster query:

explain SELECT COUNT(P.ID)
FROM wp_posts AS P
LEFT JOIN wp_yoast_indexable AS I ON P.ID = I.object_id AND I.object_type = 'post' AND I.version = 2
WHERE P.post_type IN ('post', 'page', 'attachment', 'apps_post', 'apk_split', 'app_release')
AND P.post_status NOT IN ('auto-draft')
AND I.object_id IS NULL;

1   SIMPLE  P   range   type_status_date,post_status_type_password_date_modified,type_status_author post_status_type_password_date_modified 82      1742562 Using where; Using index
1   SIMPLE  I   ref object_type_and_sub_type,object_id_and_type,object_type_version_object_id   object_type_version_object_id   144 const,const,OURDBNAME.P.ID  1   Using where; Using index

The post_status_type_password_date_modified index is as follows:

post_status_type_password_date_modified `post_status`, `post_type`, `post_password`, `post_date`, `post_modified`

But regardless, this query should only run from cron and shouldn't massively slow down the plugins page (why is it running here at all?), the dashboard page, or the Yoast settings page. At best, if cron isn't viable, run it via Ajax, use a cached value and refresh it via Ajax, or avoid using it altogether.