facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.
http://myrocks.io

Support transactional DDL in MyRocks #609

Open yoshinorim opened 7 years ago

yoshinorim commented 7 years ago

This task is for supporting transactional DDL in MyRocks. MySQL 8.0 removes frm files and stores all table metadata in the data dictionary. Currently MyRocks stores only a very limited amount of table metadata in its dictionary (table names and the number of indexes for each table only -- https://github.com/facebook/mysql-5.6/wiki/MyRocks-data-dictionary-format). Storing all table metadata in the MyRocks data dictionary makes transactional DDL possible in MySQL 8.0. It also provides an extra safety check in 5.6. Here is an example case where MySQL can silently cause data corruption in 5.x.

create table t (id1 int, id2 int, type int, value int, primary key (id1, id2, type)) engine=rocksdb;
insert into t values (1, 100, 300, 0);
stop mysqld
replace t.frm with a different t.frm created by different table definition like "create table t (id1 int, id2 int, type int, value int, primary key (type, id1, id2)) engine=rocksdb"
start mysqld
mysql> select * from t;
+-----+-----+------+-------+
| id1 | id2 | type | value |
+-----+-----+------+-------+
| 100 | 300 |    1 |     0 |
+-----+-----+------+-------+
1 row in set (0.00 sec)
--> id1, id2 and type all differ from the values that were originally stored (1, 100, 300).
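
The swapped values follow from how the key bytes are decoded. Here is a minimal sketch, assuming a simplified key format in which each INT key part is stored as 4 big-endian bytes with the sign bit flipped, concatenated in primary key order. The helper names are hypothetical and are not MyRocks functions.

```python
import struct

def pack_int(value):
    # Simplified mem-comparable encoding for a signed 32-bit INT (assumption):
    # adding 2**31 flips the sign bit, then the value is stored big-endian.
    return struct.pack(">I", value + 2**31)

def unpack_int(raw):
    return struct.unpack(">I", raw)[0] - 2**31

def pack_key(row, pk_columns):
    # Key parts are concatenated in primary key order; no column names are stored.
    return b"".join(pack_int(row[c]) for c in pk_columns)

def unpack_key(key, pk_columns):
    # The decoder trusts whatever column order the table definition gives it.
    return {c: unpack_int(key[i * 4:(i + 1) * 4]) for i, c in enumerate(pk_columns)}

row = {"id1": 1, "id2": 100, "type": 300, "value": 0}

# Written with the original definition: primary key (id1, id2, type).
key = pack_key(row, ["id1", "id2", "type"])

# Read back with the replaced frm: primary key (type, id1, id2).
decoded = unpack_key(key, ["type", "id1", "id2"])
print(decoded)
```

The bytes on disk never change. Only the interpretation does, which is why the server reports no error.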

If transactional DDL is implemented, this kind of corruption can be prevented, even with 5.x. At startup, compare the internal data dictionary with the frm definition, and refuse to start if they do not match. Currently we do a basic check (comparing the number of indexes for each table), added in https://github.com/facebook/mysql-5.6/commit/ec717d138bbae6ad987d8f3b85670915a65218a0
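
A rough sketch of the difference between the index-count check and a full definition comparison. The dictionary layout and function names here are assumptions made for illustration, not the MyRocks data dictionary format. In the frm-swap example above, both definitions have exactly one index, so a count-only check passes, while a full comparison flags the table.

```python
def index_counts_match(dict_defs, frm_defs):
    # Count-only check: compares just the number of indexes per table.
    if set(dict_defs) != set(frm_defs):
        return False
    return all(
        len(dict_defs[t]["indexes"]) == len(frm_defs[t]["indexes"])
        for t in dict_defs
    )

def find_mismatches(dict_defs, frm_defs):
    # Full check: any difference in the stored definition is a mismatch.
    tables = sorted(set(dict_defs) | set(frm_defs))
    return [t for t in tables if dict_defs.get(t) != frm_defs.get(t)]

def check_at_startup(dict_defs, frm_defs):
    # Hypothetical startup hook: refuse to start on any mismatch.
    bad = find_mismatches(dict_defs, frm_defs)
    if bad:
        raise RuntimeError("table definition mismatch: " + ", ".join(bad))

columns = ["id1", "id2", "type", "value"]
stored = {"t": {"columns": columns, "indexes": [("PRIMARY", ["id1", "id2", "type"])]}}
from_frm = {"t": {"columns": columns, "indexes": [("PRIMARY", ["type", "id1", "id2"])]}}

count_check_passes = index_counts_match(stored, from_frm)
mismatched = find_mismatches(stored, from_frm)
print(count_check_passes, mismatched)
```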

A full table metadata example can be viewed by installing MySQL 8.0.1 DMR, creating a MyISAM test table, and viewing the SDI file in JSON format. Here is an example linkbench (linktable) definition -- https://gist.github.com/yoshinorim/b1359d6ebe55bf71139911005c1e14c9 (note that the JSON is a view of the dictionary, so it is not normalized, while the dictionary format itself should be normalized)

george-lorch commented 7 years ago

Please be careful with this idea as the metadata (.frm) can and will be different across variants, possibly resulting in the inability to easily switch. TokuDB implemented something similar years ago and this resulted in inability to move TokuDB tables across from MySQL to Percona Server to MariaDB.

Thinking a little further, since MyRocks uses mem-comparable key formatting based on sql/field.cc functionality, variants may already be incompatible from the outset if they do not maintain exactly the same server-side implementations of the Field::make_sort_key methods. So my concern about transportability might be moot from the start.

lth commented 7 years ago

For Field::make_sort_key, a possible fix is to copy those functions from the SQL layer to the SE layer. That way, you can ensure that we are creating the sort key consistently across different servers.
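
As an illustration of what an engine-owned encoder has to guarantee, here is a small sketch for signed 32-bit integers only. The function name and the golden vectors are hypothetical, and the encoding (sign bit flipped, big-endian) is an assumption about the INT case. The real Field::make_sort_key methods cover many more types.

```python
import struct

def se_sort_key_int32(value):
    # Engine-private copy of the encoder (hypothetical name).
    # Flip the sign bit and store big-endian so byte order equals numeric order.
    if not -2**31 <= value < 2**31:
        raise ValueError("value out of range for a signed 32-bit integer")
    return struct.pack(">I", value + 2**31)

# Golden vectors pin the on-disk format: if any of these change,
# existing keys can no longer be compared or decoded correctly.
GOLDEN = {
    -1: bytes.fromhex("7fffffff"),
    0: bytes.fromhex("80000000"),
    1: bytes.fromhex("80000001"),
}

def golden_vectors_hold():
    return all(se_sort_key_int32(v) == raw for v, raw in GOLDEN.items())

values = [300, -5, 0, 100, 1]
sorted_by_bytes = sorted(values, key=se_sort_key_int32)
print(golden_vectors_hold(), sorted_by_bytes)
```

Because the engine owns both the encoder and the vectors, the key format no longer depends on which server variant the engine is built against.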

george-lorch commented 7 years ago

Tangent: I had considered making a patch exactly as you describe and suggesting it. Our worry with make_sort_key is that we unknowingly merge an upstream MySQL change to one of these methods, which then breaks MyRocks key pack/unpack functionality on upgrade. Upstream has made no significant changes to this code in 10+ years, but that does not mean it will never happen. We are considering 'poisoning' this code in Percona Server in such a way that any upstream MySQL change would result in a merge failure and alert us to the potential issue. Keeping this behavior consistent across variants and versions makes it easier for end users to move from one to another without logical dump/loads.
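
One different way to get a similar alert, sketched here as an alternative to merge-time poisoning rather than as what Percona Server does: pin a fingerprint of each audited function's source text and fail a build-time check when the text changes. The source strings below are placeholders, not the real sql/field.cc code.

```python
import hashlib

def fingerprint(source_text):
    # Hash of the function's source text as last audited.
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()

def changed_functions(pinned, current_sources):
    # Names whose current source no longer matches the pinned fingerprint.
    return sorted(
        name for name, text in current_sources.items()
        if pinned.get(name) != fingerprint(text)
    )

# Placeholder source text standing in for the audited methods.
audited = {
    "Field_long::make_sort_key": "/* body as of last audit */",
    "Field_short::make_sort_key": "/* body as of last audit */",
}
pinned = {name: fingerprint(text) for name, text in audited.items()}

# Simulate an upstream merge that touches one of the methods.
after_merge = dict(audited)
after_merge["Field_long::make_sort_key"] = "/* body changed upstream */"

flagged = changed_functions(pinned, after_merge)
print(flagged)
```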

alxyang commented 7 years ago

After discussing the work needed to support transactional DDL in 5.6 with @yoshinorim and @hermanlee, we agreed that, given the scope of the work, we will put this on hold until we are ready to move to 8.0.