Open aadant opened 7 years ago
@shlomi-noach : I think this should address https://www.percona.com/blog/2017/07/12/gh-ost-benchmark-against-pt-online-schema-change-performance/
@aadant thank you for this suggestion!
A few thoughts:
INSERT
or multiple REPLACE
statement.dml-batch-size > 1
; requires some app logic changesWith regard https://www.percona.com/blog/2017/07/12/gh-ost-benchmark-against-pt-online-schema-change-performance/ it's worth noting that the test was executed all-on-the-same-box ; single master node with gh-ost
running from same box. So network is unlikely to be a big issue.
I'd still be very curious to see the impact of non-prepared statements on this benchmark; but first a few days of testing are in order.
I noticed that gh-ost runtime is sensitive to its physical location. For example, if you use --test-on-replica, the gh-ost executable needs to be located on the slave for best performance. On a WAN, it can make a 1 to 100 difference. It can be the reason gh-ost will never catch up and the migration will last forever.
The reason is simple :
when looking at the general log, there are a lot of repeated statements like this on the same table :
A single event created 3 network round trips. This can obviously be optimized.
If 2 consecutive events are hitting the same table and are of the same kind (insert, update), then they can be added at the end of the replace, up to a certain number in mysqldump fashion. The most efficient would be not to use prepared statement (that could be an option) and just generate a large statement ( 1 network round trip) or execute several statements per prepare if the next event is compatible ...
Using bulk statements or repeated execute would tremendously improve performance if the table is being written to when doing the schema change.