Simplify the chunk range condition generate logic, reduce the risk of using wrong execute plan(enhancement)

ruiaylin commented 4 years ago

We meet a performance case when use gh-ost to do ddl

Problem Description

When we do a ddl on a table which has a composed primary key, active session arise quickly and the database hang。 But when we do the same ddl operation using pt-online-schema-change， it worked。 It is very confusing。 So we want to do a deep analysis to get the reason。

The reason We got

when we compare the chunk sql of pt-osc and gh-ost . The Range condition has some different like below： The table's primary key compose with three columns : shard_key, bt_id, object for the same chunk start : (973,50,.trash/9d16-4b95-bb50-xxxxxxxxxx155 ) end : (973,50,.trash/9d16-4b95-bb50-xxxxxxxxxx2ss )

the range condition from gh-ost

WHERE (`shard_key` > _binary '973'
        OR (`shard_key` = _binary '973'
            AND `bt_id` > _binary '50')
        OR (`shard_key` = _binary '973'
            AND `bt_id` = _binary '50'
            AND `object` > _binary '.trash/9d16-4b95-bb50-xxxxxxxxxx155'))
    AND (`shard_key` < _binary '973'
        OR (`shard_key` = _binary '973'
            AND `bt_id` < _binary '50')
        OR (`shard_key` = _binary '973'
            AND `bt_id` = _binary '50'
            AND `object` < _binary '.trash/9d16-4b95-bb50-xxxxxxxxxx155')
        OR (`shard_key` = _binary '973'
            AND `bt_id` = _binary '50'
            AND `object` = _binary '.trash/9d16-4b95-bb50-xxxxxxxxxx2ss'))

The range condition from pt-online-schema-change

WHERE (`shard_key` > '973'
        OR (`shard_key` = '973'
            AND `bt_id` > '50')
        OR (`shard_key` = '973'
            AND `bt_id` = '50'
            AND `object` >= '.trash/9d16-4b95-bb50-xxxxxxxxxx155'))
    AND (`shard_key` < '973'
        OR (`shard_key` = '973'
            AND `bt_id` < '50')
        OR (`shard_key` = '973'
            AND `bt_id` = '50'
            AND `object` <= '.trash/9d16-4b95-bb50-xxxxxxxxxx2ss'))

We Can see that the condition of gh-ost is more complex than pt-osc. when we check the sql execution plan, they are all using primary key . pt-osc'sql scan only 499 rows , but gh-ost scan 20 millions rows and it take 30 seconds to accomplish .

We think the query condition is more complex, the database optimizer has more risk to using a wrong plan .

Optimization

We change the chunk query condition generate logic to make gh-ost generate a chunk sql query like pt-osc does . Making the condition more simple and readable , We also do more test about data consistency, It worked well now . So we want give this pr to gh-ost .

Thank you!

ruiaylin commented 4 years ago

We want make a pr to gh-ost

druud commented 4 years ago

Be aware that the operators like '>=' also support tuples, so you can code (c1,c2,c3) >= (v1,v2,v3).

I generally prefer to use ranges with a closed start and an open end, so I (virtually always) pair '>=' with '<'.

shlomi-noach commented 4 years ago

so you can code (c1,c2,c3) >= (v1,v2,v3).

@druud unfortunately I believe this still applies: http://code.openark.org/blog/mysql/mysql-not-being-able-to-utilize-a-compound-index, and this is the reason for exploding the range comparison query.

druud commented 4 years ago

so you can code (c1,c2,c3) >= (v1,v2,v3).

@druud unfortunately I believe this still applies: http://code.openark.org/blog/mysql/mysql-not-being-able-to-utilize-a-compound-index, and this is the reason for exploding the range comparison query.

Right, so instead of just expanding the syntactic sugar and then let the optimizer work that out as usual, there has been an optimization effort that derailed. Can't say I haven't seen that happen before, and I might have once or twice caused a few cases myself too.

Now I want more than ever a user-configurable very complete server-side SQL-rewrite layer, that can also use table structures and column data types etc. even more than ever before. (I recently hit a case where a JOIN unnecessarily derailed because the data types of the ON-columns weren't exactly the same.)

ruiaylin commented 3 years ago

so you can code (c1,c2,c3) >= (v1,v2,v3).

@druud unfortunately I believe this still applies: http://code.openark.org/blog/mysql/mysql-not-being-able-to-utilize-a-compound-index, and this is the reason for exploding the range comparison query.

Right, so instead of just expanding the syntactic sugar and then let the optimizer work that out as usual, there has been an optimization effort that derailed. Can't say I haven't seen that happen before, and I might have once or twice caused a few cases myself too.

Now I want more than ever a user-configurable very complete server-side SQL-rewrite layer, that can also use table structures and column data types etc. even more than ever before. (I recently hit a case where a JOIN unnecessarily derailed because the data types of the ON-columns weren't exactly the same.)

We tested the case like vector (c1,c2,c3) >= (v1,v2,v3)， It has the wrong plan too.

github / gh-ost