facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.
http://myrocks.io

Write GTID sets, not just last executed GTID into WAL #543

Open yoshinorim opened 7 years ago

yoshinorim commented 7 years ago

This is a subtask of https://github.com/facebook/mysql-5.6/issues/474. The purpose of this feature is to support binlog_order_commits=0 in MyRocks.

Currently, MyRocks writes the last executed binlog filename, position, and GTID into the WAL (eventually persisted to the system CF) at transaction commit. The GTID is only the last executed GTID; it does not include GTID sets. With binlog_order_commits=0, WAL write ordering is not guaranteed to match binlog ordering, so the last executed binlog position/GTID may not cover all executed transactions, and the binlog position/GTID read from the WAL can no longer be trusted.
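To make the problem concrete, here is a minimal Python sketch. It is not MyRocks code: the WAL is modeled as a plain list of (uuid, sequence number) pairs, and the UUID and helper names are made up for illustration. It assumes transactions 1 to 4 were written to the binlog in order, but the WAL write for transaction 2 had not happened yet when the server crashed.

```python
# Hypothetical model: each WAL record carries the GTID of the committing
# transaction as a (uuid, seqno) pair. The UUID is an arbitrary example value.
UUID = "3E11FA47-71CA-11E1-9E33-C80AA9429562"


def last_gtid_from_wal(wal_records):
    """Return the GTID of the final WAL record, i.e. what recovery sees today."""
    return wal_records[-1] if wal_records else None


def gaps_below(wal_records, gtid):
    """List sequence numbers <= gtid's seqno that never reached the WAL."""
    uuid, upto = gtid
    seen = {seq for u, seq in wal_records if u == uuid}
    return [n for n in range(1, upto + 1) if n not in seen]


# Binlog order was 1, 2, 3, 4. With unordered commits the WAL received
# 1, 3, 4 before the crash; transaction 2 was never persisted.
wal = [(UUID, 1), (UUID, 3), (UUID, 4)]

last = last_gtid_from_wal(wal)   # recovery would report seqno 4
missing = gaps_below(wal, last)  # but seqno 2 is absent from the engine
```

In this model, configuring replication to resume after the last recorded GTID would silently skip transaction 2, which is why a single "last executed" value is not enough.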

Some tools, such as myrocks_hotbackup and crash-safe master, rely on this feature. Both need to find the last executed binlog position (or GTID) and then configure replication based on it. So binlog_order_commits=1 is currently required. A big side effect is that it degrades performance significantly, as described in #474.

To make binlog_order_commits=0 work in MyRocks, it is necessary to write GTID sets, not just the last executed GTID, into the WAL at commit. With that in place, crash recovery will be able to identify all executed transactions and configure replication based on them. There are some technical uncertainties in implementing this feature.
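As a rough illustration of what a GTID set buys over a single GTID, the sketch below collapses executed (uuid, sequence number) pairs into intervals and prints them in the `uuid:1-3:5` style that MySQL uses for GTID sets. The function names and the example UUID are assumptions for this sketch; it says nothing about how the set would actually be encoded in the WAL or system CF.

```python
from collections import defaultdict

# Arbitrary example UUID, not taken from any real server.
UUID = "3E11FA47-71CA-11E1-9E33-C80AA9429562"


def build_gtid_set(gtids):
    """Collapse (uuid, seqno) pairs into sorted inclusive intervals per uuid."""
    by_uuid = defaultdict(set)
    for uuid, seq in gtids:
        by_uuid[uuid].add(seq)

    result = {}
    for uuid, seqs in by_uuid.items():
        intervals = []
        for n in sorted(seqs):
            if intervals and n == intervals[-1][1] + 1:
                intervals[-1][1] = n      # extend the current interval
            else:
                intervals.append([n, n])  # start a new interval after a gap
        result[uuid] = [tuple(i) for i in intervals]
    return result


def format_gtid_set(gtid_set):
    """Render intervals as 'uuid:a-b:c', joining multiple uuids with commas."""
    parts = []
    for uuid in sorted(gtid_set):
        ranges = ":".join(
            str(a) if a == b else f"{a}-{b}" for a, b in gtid_set[uuid]
        )
        parts.append(f"{uuid}:{ranges}")
    return ",".join(parts)


# Transactions 1, 3 and 4 reached the engine; 2 did not.
executed = build_gtid_set([(UUID, 3), (UUID, 1), (UUID, 4)])
text = format_gtid_set(executed)
```

Here the gap at sequence number 2 is visible in the set itself (`:1:3-4`), so recovery could tell exactly which transactions were executed regardless of the order in which the WAL writes happened.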

yoshinorim commented 7 years ago

Another important point: with ordered commits, if a transaction takes a long time in commit() (e.g. writing a large amount of data into the MemTable), all other transactions have to wait. This is a consequence of the serialized commits. Debugging code to reproduce the issue is here: https://gist.github.com/yoshinorim/68ba74e3acd5b7fc6629250072f79455. With ordered commits disabled, a long-running commit() does not block other transactions.
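A toy model of the blocking effect, separate from the gist above: it assumes every transaction becomes ready to commit at time 0, that ordered commits run strictly one after another, and that unordered commits run fully in parallel. Real group commit is more involved, and the durations are invented numbers in milliseconds.

```python
from itertools import accumulate


def ordered_finish_times(durations_ms):
    """Serialized commits: each one starts only after the previous one ends."""
    return list(accumulate(durations_ms))


def unordered_finish_times(durations_ms):
    """Independent commits: each one finishes after its own duration."""
    return list(durations_ms)


# One large commit (5000 ms) followed by two small ones (100 ms each).
durations_ms = [5000, 100, 100]

ordered = ordered_finish_times(durations_ms)      # small commits wait behind the large one
unordered = unordered_finish_times(durations_ms)  # small commits finish right away
```

Under these assumptions the two small transactions finish at 5100 ms and 5200 ms with ordered commits, versus 100 ms each without ordering.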