danielcheng007 / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator
0 stars 0 forks source link

consistency checking fails with mixed MySQL-MariaDB environment #458

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Create a master (MySQL) and two slaves (MySQL and MariaDB).
2. Insert the following data onto the master:

create database base;
CREATE TABLE user1 (id int(11) NOT NULL, name varchar(255) NOT NULL);
INSERT INTO base.user1 (id, name) VALUES (1, 'test1');
INSERT INTO base.user1 (id, name) VALUES (2, 'test2');
INSERT INTO base.user1 (id, name) VALUES (3, 'test3');
INSERT INTO base.user1 (id, name) VALUES (4, 'test4');

CREATE TABLE user2 (id int(11) NOT NULL, name varchar(255) NOT NULL);
INSERT INTO base.user2 (id, name) VALUES (1, 'test1');
INSERT INTO base.user2 (id, name) VALUES (2, 'test2');
INSERT INTO base.user2 (id, name) VALUES (4, 'test4');
INSERT INTO base.user2 (id, name) VALUES (3, 'test3');

3. On master:
./trepctl check base.user1 - OK
./trepctl check base.user2 - works on MySQL slave, fails on MariaDB slave

What is the expected output?

Successful consistency checking on both slaves if the data are the same.

What do you see instead?

Consistency checking fails on a MariaDB slave.

What version of the product are you using?

Tungsten 2.0.7-256

On what operating system?

Ubuntu / Gentoo

Please provide any additional information below.

Notes:

base.user1: OK
INFO   | jvm 1    | 2013/02/14 17:51:46 | 2013-02-14 17:51:46,753 [fish - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 SET @colcrc := MD5('id,name')
INFO   | jvm 1    | 2013/02/14 17:51:46 | 2013-02-14 17:51:46,755 [fish - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 SELECT COUNT(*) AS this_cnt, 
IFNULL(RIGHT(MAX(@crc := CONCAT(LPAD(@cnt := @cnt + 1, 16, '0'), 
MD5(CONCAT(@crc, MD5(CONCAT_WS(',',id,name)))))), 32), @colcrc) AS this_crc FRO
M (SELECT * FROM base.user1 ORDER BY id,name LOCK IN SHARE MODE) AS 
tungsten_consistency_check_tmp
INFO   | jvm 1    | 2013/02/14 17:51:46 | 2013-02-14 17:51:46,756 [fish - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 
org.drizzle.jdbc.DrizzleResultSet@5867df9
INFO   | jvm 1    | 2013/02/14 17:51:46 | 2013-02-14 17:51:46,756 [fish - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 Warnings: null
INFO   | jvm 1    | 2013/02/14 17:51:46 | 2013-02-14 17:51:46,757 [fish - 
q-to-dbms-0] INFO  applier.JdbcApplier Consistency check succeeded on table 
'base.user1' id: 3, offset: -1, limit: -1, method: 'md5' succeeded:
INFO   | jvm 1    | 2013/02/14 17:51:46 | this_cnt  : 4
INFO   | jvm 1    | 2013/02/14 17:51:46 | master_cnt: 4
INFO   | jvm 1    | 2013/02/14 17:51:46 | this_crc  : 
243eb5fca36c7f6db41134622a3c35d4
INFO   | jvm 1    | 2013/02/14 17:51:46 | master_crc: 
243eb5fca36c7f6db41134622a3c35d4

base.user2 (on MariaDB slave):
INFO   | jvm 1    | 2013/02/14 17:54:06 | 2013-02-14 17:54:06,695 [fish - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 SET @colcrc := MD5('id,name')
INFO   | jvm 1    | 2013/02/14 17:54:06 | 2013-02-14 17:54:06,695 [fish - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 SELECT COUNT(*) AS this_cnt, 
IFNULL(RIGHT(MAX(@crc := CONCAT(LPAD(@cnt := @cnt + 1, 16, '0'), 
MD5(CONCAT(@crc, MD5(CONCAT_WS(',',id,name)))))), 32), @colcrc) AS this_crc FRO
M (SELECT * FROM base.user2 ORDER BY id,name LOCK IN SHARE MODE) AS 
tungsten_consistency_check_tmp
INFO   | jvm 1    | 2013/02/14 17:54:06 | 2013-02-14 17:54:06,696 [fish - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 
org.drizzle.jdbc.DrizzleResultSet@734b84c7
INFO   | jvm 1    | 2013/02/14 17:54:06 | 2013-02-14 17:54:06,696 [fish - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 Warnings: null
INFO   | jvm 1    | 2013/02/14 17:54:06 | 2013-02-14 17:54:06,698 [fish - 
pool-2-thread-1] ERROR management.OpenReplicatorManager Received error 
notification, shutting down services :
INFO   | jvm 1    | 2013/02/14 17:54:06 | Stage task failed: q-to-dbms
INFO   | jvm 1    | 2013/02/14 17:54:06 | 
com.continuent.tungsten.replicator.consistency.ConsistencyException: 
Consistency check failed on table 'base.user2' id: 5, offset: -1, limit: -1, 
method: 'md5' failed:
INFO   | jvm 1    | 2013/02/14 17:54:06 | this_cnt  : 4
INFO   | jvm 1    | 2013/02/14 17:54:06 | master_cnt: 4
INFO   | jvm 1    | 2013/02/14 17:54:06 | this_crc  : 
b7917b1dd5cabf4d867fe37cd5b42de2
INFO   | jvm 1    | 2013/02/14 17:54:06 | master_crc: 
243eb5fca36c7f6db41134622a3c35d4
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.applier.JdbcApplier.consistencyCheck(JdbcAppl
ier.java:342)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:13
61)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.j
ava:101)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.apply(SingleTh
readStageTask.java:639)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(Single
ThreadStageTask.java:468)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThre
adStageTask.java:167)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
java.lang.Thread.run(Thread.java:662)
INFO   | jvm 1    | 2013/02/14 17:54:06 | 2013-02-14 17:54:06,698 [fish - 
q-to-dbms-0] INFO  pipeline.SingleThreadStageTask Unexpected error: Stage task 
failed: q-to-dbms
INFO   | jvm 1    | 2013/02/14 17:54:06 | 
com.continuent.tungsten.replicator.consistency.ConsistencyException: 
Consistency check failed on table 'base.user2' id: 5, offset: -1, limit: -1, 
method: 'md5' failed:
INFO   | jvm 1    | 2013/02/14 17:54:06 | this_cnt  : 4
INFO   | jvm 1    | 2013/02/14 17:54:06 | master_cnt: 4
INFO   | jvm 1    | 2013/02/14 17:54:06 | this_crc  : 
b7917b1dd5cabf4d867fe37cd5b42de2
INFO   | jvm 1    | 2013/02/14 17:54:06 | master_crc: 
243eb5fca36c7f6db41134622a3c35d4
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.applier.JdbcApplier.consistencyCheck(JdbcAppl
ier.java:342)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:13
61)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.j
ava:101)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.apply(SingleTh
readStageTask.java:639)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(Single
ThreadStageTask.java:468)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThre
adStageTask.java:167)
INFO   | jvm 1    | 2013/02/14 17:54:06 |       at 
java.lang.Thread.run(Thread.java:662)

If we take the subquery:
SELECT CONCAT_WS(',',id,name) FROM (SELECT * FROM base.user1 ORDER BY id,name 
LOCK IN SHARE MODE) AS tungsten_consistency_check_tmp;

this gives:
+------------------------+
| CONCAT_WS(',',id,name) |
+------------------------+
| 1,test1                |
| 2,test2                |
| 3,test3                |
| 4,test4                |
+------------------------+
4 rows in set (0.00 sec)

For which the check ends corrently (on all nodes). However:
SELECT CONCAT_WS(',',id,name) FROM (SELECT * FROM base.user2 ORDER BY id,name 
LOCK IN SHARE MODE) AS tungsten_consistency_check_tmp;

On MySQL:
+------------------------+
| CONCAT_WS(',',id,name) |
+------------------------+
| 1,test1                |
| 2,test2                |
| 3,test3                |
| 4,test4                |
+------------------------+
4 rows in set (0.00 sec)

On MariaDB:
+------------------------+
| CONCAT_WS(',',id,name) |
+------------------------+
| 1,test1                |
| 2,test2                |
| 4,test4                |
| 3,test3                |
+------------------------+
4 rows in set (0.00 sec)

Original issue reported on code.google.com by tomas.mo...@gmail.com on 15 Feb 2013 at 8:32

GoogleCodeExporter commented 9 years ago
verified as described.
Either Tungsten has wrong assumption about the results, or there is a bug in 
MariaDB

Original comment by g.maxia on 15 Feb 2013 at 8:37

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r1212.

Original comment by stephane...@continuent.com on 22 Feb 2013 at 12:18

GoogleCodeExporter commented 9 years ago
Tomas,
Thanks for the detailed bug report.
It should now be fixed. Please let me know if that works for you.
Cheers
Stephane

Original comment by stephane...@continuent.com on 22 Feb 2013 at 12:19

GoogleCodeExporter commented 9 years ago
Thanks for answering Stephane. With these simple tables, it works fine.

However, it fails on 2 tables, while both tables have 'timestamp' types (maybe 
the problem lies somewhere else, but this was what I could narrow down to):

tbl1:
+------------+-----------+------+-----+-------------------+----------------+
| Field      | Type      | Null | Key | Default           | Extra          |
+------------+-----------+------+-----+-------------------+----------------+
| c1         | int(11)   | NO   | PRI | NULL              | auto_increment |
| c2         | int(11)   | NO   |     | NULL              |                |
| c3         | int(11)   | YES  |     | NULL              |                |
| c4         | char(1)   | NO   |     | A                 |                |
| c5         | timestamp | NO   |     | CURRENT_TIMESTAMP |                |
| c6         | timestamp | YES  |     | NULL              |                |
| c7         | int(11)   | YES  |     | 0                 |                |
+------------+-----------+------+-----+-------------------+----------------+

tbl2:
+------------+--------------+------+-----+-------------------+----------------+
| Field      | Type         | Null | Key | Default           | Extra          |
+------------+--------------+------+-----+-------------------+----------------+
| c1         | int(11)      | NO   | PRI | NULL              | auto_increment |
| c2         | int(11)      | NO   | MUL | NULL              |                |
| c3         | int(11)      | YES  |     | NULL              |                |
| c4         | char(1)      | NO   |     | A                 |                |
| c5         | varchar(750) | YES  |     | NULL              |                |
| c6         | timestamp    | NO   |     | CURRENT_TIMESTAMP |                |
| c7         | tinyint(1)   | YES  | MUL | NULL              |                |
+------------+--------------+------+-----+-------------------+----------------+

It fails on both, MySQL and MariaDB slaves. I have copied both tables to a 
testing machine, however I was only able to reproduce the problem randomly. 
Ever since I set:
wrapper.java.additional.5=-Dfile.encoding=UTF8
wrapper.java.additional.6=-Duser.timezone=GMT

I cannot reproduce in the test environment. It fails on production all the time.

Original comment by tomas.mo...@gmail.com on 25 Feb 2013 at 3:19

GoogleCodeExporter commented 9 years ago
Tomas,
Can you post the stack trace of the failure, as well as the show create table ?

Cheers
Stephane

Original comment by stephane...@continuent.com on 25 Feb 2013 at 3:39

GoogleCodeExporter commented 9 years ago
Also, do you see a difference in table contents ? Can you post the difference 
you see, if any ?

Original comment by stephane...@continuent.com on 25 Feb 2013 at 3:41

GoogleCodeExporter commented 9 years ago
Hello Stephane,
I was able to reproduce the problem in the test environment. I have a MySQL 
master and a MySQL + MariaDB slave. If all systems use UTC (/etc/localtime), 
the consistency checking runs fine. If however the system time is 
Europe/Bratislava (our case), the consistency check fails on both slaves. I'm 
trying to build up a minimal test case for verification.

Original comment by tomas.mo...@gmail.com on 26 Feb 2013 at 9:48

GoogleCodeExporter commented 9 years ago
Seems the problem I'm describing is related to timezone. I've posted the 
problem in the discussion:

https://groups.google.com/forum/?fromgroups=#!topic/tungsten-replicator-discuss/
-vL3OcvabxY

Original comment by tomas.mo...@gmail.com on 26 Feb 2013 at 3:36

GoogleCodeExporter commented 9 years ago
I have fixed the problem with timezones (by switching to row binlog), but the 
consistency check fails again. The data are the same, the dump shows the same 
data (except that some of the rows are in different order). I have setup 
debugging on the consistency check and tried the query:

SELECT CONCAT_WS(columns) AS this_crc FROM (SELECT * FROM db.table LOCK IN 
SHARE MODE) AS tungsten_consistency_check_tmp ORDER BY `PK`;

It gives the same data on master and the slave.

INFO   | jvm 1    | 2013/03/04 08:47:25 | 2013-03-04 08:47:24,994 [instance - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 
org.drizzle.jdbc.DrizzleResultSet@f964ca1
INFO   | jvm 1    | 2013/03/04 08:47:25 | 2013-03-04 08:47:24,994 [instance - 
q-to-dbms-0] DEBUG consistency.ConsistencyCheckMD5 Warnings: null
INFO   | jvm 1    | 2013/03/04 08:47:25 | 2013-03-04 08:47:24,996 [instance - 
pool-2-thread-1] ERROR management.OpenReplicatorManager Received error 
notification, shutting down services :
INFO   | jvm 1    | 2013/03/04 08:47:25 | Stage task failed: q-to-dbms
INFO   | jvm 1    | 2013/03/04 08:47:25 | 
com.continuent.tungsten.replicator.consistency.ConsistencyException: 
Consistency check failed on table 'db.table' id: 12, offset: -1, limit: -1, 
method: 'md5' failed:
INFO   | jvm 1    | 2013/03/04 08:47:25 | this_cnt  : 287130
INFO   | jvm 1    | 2013/03/04 08:47:25 | master_cnt: 287130
INFO   | jvm 1    | 2013/03/04 08:47:25 | this_crc  : 
2cc639961e0e976603c9cecd79d39f30
INFO   | jvm 1    | 2013/03/04 08:47:25 | master_crc: 
40cf081ed5edbb1aad826379579d6520
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.applier.JdbcApplier.consistencyCheck(JdbcAppl
ier.java:342)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:13
61)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.j
ava:101)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.apply(SingleTh
readStageTask.java:639)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(Single
ThreadStageTask.java:468)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThre
adStageTask.java:167)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
java.lang.Thread.run(Thread.java:679)
INFO   | jvm 1    | 2013/03/04 08:47:25 | 2013-03-04 08:47:24,998 [instance - 
pool-2-thread-1] WARN  management.OpenReplicatorManager Performing emergency 
service shutdown
INFO   | jvm 1    | 2013/03/04 08:47:25 | 2013-03-04 08:47:24,998 [instance - 
pool-2-thread-1] INFO  pipeline.Pipeline Shutting down pipeline: slave
INFO   | jvm 1    | 2013/03/04 08:47:25 | 2013-03-04 08:47:24,998 [instance - 
q-to-dbms-0] INFO  pipeline.SingleThreadStageTask Unexpected error: Stage task 
failed: q-to-dbms
INFO   | jvm 1    | 2013/03/04 08:47:25 | 
com.continuent.tungsten.replicator.consistency.ConsistencyException: 
Consistency check failed on table 'db.table' id: 12, offset: -1, limit: -1, 
method: 'md5' failed:
INFO   | jvm 1    | 2013/03/04 08:47:25 | this_cnt  : 287130
INFO   | jvm 1    | 2013/03/04 08:47:25 | master_cnt: 287130
INFO   | jvm 1    | 2013/03/04 08:47:25 | this_crc  : 
2cc639961e0e976603c9cecd79d39f30
INFO   | jvm 1    | 2013/03/04 08:47:25 | master_crc: 
40cf081ed5edbb1aad826379579d6520
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.applier.JdbcApplier.consistencyCheck(JdbcAppl
ier.java:342)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:13
61)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.j
ava:101)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.apply(SingleTh
readStageTask.java:639)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(Single
ThreadStageTask.java:468)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThre
adStageTask.java:167)
INFO   | jvm 1    | 2013/03/04 08:47:25 |       at 
java.lang.Thread.run(Thread.java:679)

Original comment by tomas.mo...@gmail.com on 4 Mar 2013 at 8:45

GoogleCodeExporter commented 9 years ago
The checking was done on Tungsten Replicator 2.0.8 build 20.

Original comment by tomas.mo...@gmail.com on 4 Mar 2013 at 8:47

GoogleCodeExporter commented 9 years ago
Re opening since it seems it can still fail under other conditions.

Original comment by stephane...@continuent.com on 19 Mar 2013 at 1:39