10up / Engineering-Best-Practices

10up Engineering Best Practices
https://10up.github.io/Engineering-Best-Practices/
MIT License
760 stars 205 forks source link

Add paragraph on the performance of meta_queries using EXISTS/NOT_EXISTS #355

Open moraleida opened 3 years ago

moraleida commented 3 years ago

Is your enhancement related to a problem? Please describe.

It is very common that developers introduce new meta_keys without setting default values for posts that already existed in the database, or that they delete meta_keys when the value is not needed instead of setting it to a default or false value.

This becomes an issue on larger database tables when users need to find some information by filtering not only by the meta_key and value but also sometimes by checking if that meta_key does not exist at all. This use case is covered in WP_Meta_Query, allowing for the use of compare => 'NOT EXISTS'

This solution should, however, be discouraged, as it generates very expensive queries that can really hammer a database performance.

Example:

/**
 * Using NOT_EXISTS in a Meta Query generates a LEFT JOIN clause
 * with an IS NULL check in WHERE
 *
 * Rows searched: 268,358 (all rows in the join)
 */
$query = new \WP_Query(
    array(
        'suppress_filters' => true,
        'meta_query'       => array(
            array(
                'key'     => '_publicize_pending',
                'compare' => 'NOT EXISTS',
            ),
        ),
    ) );

Generates:

MySQL [wordpress]> EXPLAIN SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  
    -> LEFT JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id AND wp_postmeta.meta_key = 'some_key' ) 
    -> WHERE 1=1  
    -> AND ( wp_postmeta.post_id IS NULL ) 
    -> GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 10;
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+
| id | select_type | table       | partitions | type  | possible_keys                                                   | key     | key_len | ref                   | rows   | filtered | Extra                           |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+
|  1 | SIMPLE      | wp_posts    | NULL       | index | PRIMARY,post_name,type_status_date,post_parent,post_author,guid | PRIMARY | 8       | NULL                  | 268339 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | wp_postmeta | NULL       | ref   | post_id,meta_key,meta_key_meta_value                            | post_id | 8       | wordpress.wp_posts.ID |     19 |   100.00 | Using where; Not exists         |
+----+-------------+-------------+------------+-------+-----------------------------------------------------------------+---------+---------+-----------------------+--------+----------+---------------------------------+

In contrast, this query checking for a specific value in the declared meta_key only needs to read 89,970 rows, or roughly 70% fewer rows.

/**
 * Querying only for the meta_key and value generates an INNER JOIN clause
 * with a key/value match in WHERE
 *
 * Rows searched: 89,970
 */
$query = new \WP_Query(
    array(
        'suppress_filters' => true,
        'meta_query'       => array(
            array(
                'key'     => '_publicize_pending',
                'value'     => '1',
                'compare' => '!=',
            ),
        ),
    ) );

Generates:

MySQL [wordpress]> EXPLAIN SELECT wp_posts.ID
    -> FROM wp_posts
    -> INNER JOIN wp_postmeta
    -> ON ( wp_posts.ID = wp_postmeta.post_id )
    -> WHERE 1=1
    -> AND ( ( wp_postmeta.meta_key = '_publicize_pending'
    -> AND wp_postmeta.meta_value != '1' ) );
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
| id | select_type | table       | partitions | type   | possible_keys                        | key                 | key_len | ref                           | rows  | filtered | Extra       |
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
|  1 | SIMPLE      | wp_postmeta | NULL       | range  | post_id,meta_key,meta_key_meta_value | meta_key_meta_value | 1534    | NULL                          | 89970 |   100.00 | Using where |
|  1 | SIMPLE      | wp_posts    | NULL       | eq_ref | PRIMARY                              | PRIMARY             | 8       | wordpress.wp_postmeta.post_id |     1 |   100.00 | Using index |
+----+-------------+-------------+------------+--------+--------------------------------------+---------------------+---------+-------------------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.001 sec)
vishalkakadiya commented 3 years ago

@moraleida Hi 👋, you guys are doing awesome work with Engineering Practices! 🙌 Thank you for making these awesome docs.

I was recently checking this issue and would like to share my thoughts, not sure if this is a good idea for bigger picture but let me share:

$query = new \WP_Query(
    array(
        'suppress_filters' => true,
        'meta_key'         => '_publicize_pending_1',
    )
);

The above query will fetch all the posts which are pending for publicizing, and I guess will be somewhat faster than checking with the value. As we all know, in the normal WP environment index is available on meta_key only, not on the meta_value, however the WordPress VIP platform indexing meta_value in the combination of meta_key but in some limit of characters. So I guess we can think of this way as well for this issue, still correct me if I have interrupted in wrong discussions.

Thanks! 🙂

moraleida commented 3 years ago

@vishalkakadiya I think your solution should work fine. It taps into the same idea that is, always look for something that exists, instead of for something that doesn't. I suggest adding different values for a meta key, but having different meta-keys will most likely work just fine too.

I don't think indexing the meta_value column makes a lot of difference in this case, because the first thing the query will do is join the tables to contain only that meta_key. But then again, if we step into the millions of records within the join, then that might be a good replacement.