facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.
http://myrocks.io

Wrong query plan for tpcc stock_level #1268

Open rockeet opened 1 year ago

rockeet commented 1 year ago

Branch: fb-mysql-8.0.28

Version: gitsha c75bf30d86a214a3a7106e5df0df47a130306c2f

This bug affects both InnoDB and MyRocks; a version from about 6 months ago does not have this issue.

Reproduce

Reproducing it is quick: prepare TPC-C data with warehouses=5, loadWorkers=5, terminals=5, then run:

mysql> explain SELECT ol_i_id, d_next_o_id FROM bmsql_order_line JOIN bmsql_district ON ol_w_id = d_w_id AND ol_d_id = d_id AND ol_o_id between d_next_o_id - 20 AND d_next_o_id WHERE d_w_id = 1 AND d_id = 5;
+----+-------------+------------------+------------+-------+---------------+---------+---------+-------------+--------+----------+-------------+
| id | select_type | table            | partitions | type  | possible_keys | key     | key_len | ref         | rows   | filtered | Extra       |
+----+-------------+------------------+------------+-------+---------------+---------+---------+-------------+--------+----------+-------------+
|  1 | SIMPLE      | bmsql_district   | p0         | const | PRIMARY       | PRIMARY | 8       | const,const |      1 |   100.00 | NULL        |
|  1 | SIMPLE      | bmsql_order_line | p0         | ref   | PRIMARY       | PRIMARY | 8       | const,const | 120517 |    11.11 | Using where |
+----+-------------+------------------+------------+-------+---------------+---------+---------+-------------+--------+----------+-------------+

The rows estimate shows that this query plan is wrong: it scans a range of the index using only the first two key columns (key_len 8) instead of also matching on ol_o_id. If we rewrite the SQL to:

SELECT ol_i_id, d_next_o_id FROM bmsql_order_line JOIN bmsql_district
      ON ol_w_id = d_w_id AND ol_d_id = d_id AND ol_o_id in
        (2980,2981,2982,2983,2984,2985,2986,2987,2988,2990,2991,2992,2993,2994,2995,2996,2997,2998,2999,3000,3001)
      WHERE d_w_id = 1 AND d_id = 5;

The result is correct:

 explain SELECT ol_i_id FROM bmsql_order_line JOIN bmsql_district ON ol_w_id = d_w_id AND ol_d_id = d_id AND ol_o_id in (2980,2981,2982,2983,2984,2985,2986,2987,2988,2990,2991,2992,2993,2994,2995,2996,2997,2998,2999,3000,3001) WHERE d_w_id = 1 AND d_id = 5;
+----+-------------+------------------+------------+-------+---------------+---------+---------+-------------+-------+----------+-------------+
| id | select_type | table            | partitions | type  | possible_keys | key     | key_len | ref         | rows  | filtered | Extra       |
+----+-------------+------------------+------------+-------+---------------+---------+---------+-------------+-------+----------+-------------+
|  1 | SIMPLE      | bmsql_district   | p0         | const | PRIMARY       | PRIMARY | 8       | const,const |     1 |   100.00 | Using index |
|  1 | SIMPLE      | bmsql_order_line | p0         | range | PRIMARY       | PRIMARY | 12      | NULL        | 13482 |   100.00 | Using where |
+----+-------------+------------------+------------+-------+---------------+---------+---------+-------------+-------+----------+-------------+

laurynas-biveinis commented 1 year ago

Both EXPLAIN outputs seem identical, any chance this is the same thing pasted twice?

rockeet commented 1 year ago

> Both EXPLAIN outputs seem identical, any chance this is the same thing pasted twice?

My mistake, I pasted the wrong text for the second one.

I have now re-run the two EXPLAIN statements on a larger data set and updated the results above. The second EXPLAIN result is OK: the query plan uses key_len 12 and the estimated row count is 13482. The first one is wrong: it uses key_len 8, which causes a slow index range scan.
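To make the key_len numbers concrete, here is a small sketch of the arithmetic. It assumes each primary-key column involved is a 4-byte NOT NULL INT, which fits the key_len values of 8 and 12 reported above but is not confirmed by a schema dump in this issue. The helper name `columns_used` is made up for illustration, and any key columns after ol_o_id are ignored.

```python
# Assumption: each listed primary-key column is a 4-byte NOT NULL INT.
# Only the three columns named in this issue are listed here.
PK_COLUMNS = [("ol_w_id", 4), ("ol_d_id", 4), ("ol_o_id", 4)]


def columns_used(key_len, columns=PK_COLUMNS):
    """Return the leading key columns that fit entirely within key_len bytes."""
    used = []
    consumed = 0
    for name, width in columns:
        if consumed + width > key_len:
            break
        used.append(name)
        consumed += width
    return used


print(columns_used(8))   # ['ol_w_id', 'ol_d_id']
print(columns_used(12))  # ['ol_w_id', 'ol_d_id', 'ol_o_id']
```

Under that assumption, key_len 8 means only ol_w_id and ol_d_id narrow the index lookup, so every order line of the district is read and ol_o_id is filtered afterwards. With key_len 12, ol_o_id also bounds the scan.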

I have also reproduced this issue on upstream (Oracle) mysql-8.0.30, so it seems it was introduced in some recent upstream MySQL revision. I worked around it by rewriting the relevant TPC-C subquery to:

explain SELECT ol_i_id FROM bmsql_order_line WHERE (ol_w_id, ol_d_id, ol_o_id) IN
 (WITH RECURSIVE nums AS (SELECT 1 AS value UNION ALL SELECT value + 1 AS value FROM nums WHERE nums.value < 20) 
   SELECT d_w_id, d_id, d_next_o_id - value
    FROM nums cross join bmsql_district where d_w_id = 1 AND d_id = 5);

The new query forces the query plan to use all 3 columns of the index (key_len = 12):

+----+-------------+------------------+------------+-------+---------------+---------+---------+------------------+------+----------+----------------------------+
| id | select_type | table            | partitions | type  | possible_keys | key     | key_len | ref              | rows | filtered | Extra                      |
+----+-------------+------------------+------------+-------+---------------+---------+---------+------------------+------+----------+----------------------------+
|  1 | PRIMARY     | bmsql_district   | p0         | const | PRIMARY       | PRIMARY | 8       | const,const      |    1 |   100.00 | NULL                       |
|  1 | PRIMARY     | <derived3>       | NULL       | ALL   | NULL          | NULL    | NULL    | NULL             |    3 |   100.00 | Start temporary            |
|  1 | PRIMARY     | bmsql_order_line | p0         | ref   | PRIMARY       | PRIMARY | 12      | const,const,func |   10 |   100.00 | Using where; End temporary |
|  3 | DERIVED     | NULL             | NULL       | NULL  | NULL          | NULL    | NULL    | NULL             | NULL |     NULL | No tables used             |
|  4 | UNION       | nums             | NULL       | ALL   | NULL          | NULL    | NULL    | NULL             |    2 |    50.00 | Recursive; Using where     |
+----+-------------+------------------+------------+-------+---------------+---------+---------+------------------+------+----------+----------------------------+
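One caveat about the workaround: the recursive CTE yields the offsets 1 through 20, so it looks up order IDs d_next_o_id - 20 through d_next_o_id - 1. The original `BETWEEN d_next_o_id - 20 AND d_next_o_id` is inclusive on both ends and also covers d_next_o_id itself. The sketch below checks this by running the same CTE in SQLite, used here only as a convenient stand-in to evaluate the CTE; it says nothing about MySQL's query plan. The value 3001 for d_next_o_id and the function names are hypothetical.

```python
import sqlite3

# Hypothetical value for illustration; the real one comes from bmsql_district.
D_NEXT_O_ID = 3001

# Same recursive CTE as in the workaround, with d_next_o_id passed as a parameter.
CTE_SQL = """
WITH RECURSIVE nums AS (
    SELECT 1 AS value
    UNION ALL
    SELECT value + 1 FROM nums WHERE value < 20
)
SELECT ? - value FROM nums ORDER BY 1
"""


def workaround_order_ids(d_next_o_id):
    """Order IDs that the CTE-based workaround looks up."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(CTE_SQL, (d_next_o_id,))]
    finally:
        conn.close()


def between_order_ids(d_next_o_id):
    """Order IDs matched by BETWEEN d_next_o_id - 20 AND d_next_o_id (inclusive)."""
    return list(range(d_next_o_id - 20, d_next_o_id + 1))


ids_cte = workaround_order_ids(D_NEXT_O_ID)
ids_between = between_order_ids(D_NEXT_O_ID)
print(len(ids_cte), ids_cte[0], ids_cte[-1])
print(len(ids_between), ids_between[0], ids_between[-1])
print(sorted(set(ids_between) - set(ids_cte)))
```

If the rewritten query must match the BETWEEN form exactly, the CTE would need to start at 0 instead of 1. If excluding d_next_o_id is intended, the workaround is fine as written.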

Since this is an upstream issue, feel free to close it if the MyRocks team has no interest in tracking it.