Automattic / msm-sitemap

Comprehensive sitemaps for your WordPress VIP site. Joint collaboration between Metro.co.uk, WordPress VIP, Alley Interactive, Maker Media, 10up, and others.
74 stars 37 forks source link

Add date constraint filter to get_last_modified_posts() query #165

Closed davidsword closed 3 years ago

davidsword commented 3 years ago

Hello

This query has been appearing in slow query logs on busier multisites.

https://github.com/Automattic/msm-sitemap/blob/6d58e2e65d77f140fe1e32fde7ee4ef48a01add4/msm-sitemap.php#L590

Though it's running on cron having dozens or hundreds of sites running this can be overly taxing.

Some organizations may not be editing posts from months past, and may benefit more from the option of using a more efficient query. For this example case, here's an EXPLAIN on that query:

+------+-------------+-------------+-------+------------------+------------------+---------+------+--------+------------------------------------+
| id   | select_type | table       | type  | possible_keys    | key              | key_len | ref  | rows   | Extra                              |
+------+-------------+-------------+-------+------------------+------------------+---------+------+--------+------------------------------------+
|    1 | SIMPLE      | wp_XX_posts | range | type_status_date | type_status_date | 82      | NULL | 359926 | Using index condition; Using where |
+------+-------------+-------------+-------+------------------+------------------+---------+------+--------+------------------------------------+

Adding AND post_status = 'publish' AND post_date > DATE_FORMAT(NOW() - INTERVAL 1 MONTH, '%Y-%m-%d %H:%i:%s') to the query resulted in this case 355,278 less rows scanned, which is a significant performance bump:

+------+-------------+-------------+-------+------------------+------------------+---------+------+------+------------------------------------+
| id   | select_type | table       | type  | possible_keys    | key              | key_len | ref  | rows | Extra                              |
+------+-------------+-------------+-------+------------------+------------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | wp_XX_posts | range | type_status_date | type_status_date | 169     | NULL | 4648 | Using index condition; Using where |
+------+-------------+-------------+-------+------------------+------------------+---------+------+------+------------------------------------+

Noting post_status was added to properly leverage the date in type_status_date. Both of these args would great candidates for filters to help provide some options to improve this query when it performs poorly for some sites.

davidsword commented 3 years ago

Seems https://github.com/Automattic/msm-sitemap/issues/112 would be a much better approach to eliminate this query all together.

Instead of looking for posts based on the last modified date, it might make sense to hook into transition post status instead and track the dates for those posts (in an option) and then use those instead.

Closing this issue out in favour for #112