go-mysql-org / go-mysql

a powerful mysql toolset with Go
MIT License

separate serverID of Mariadb GTID set #852

Closed okJiang closed 6 months ago

okJiang commented 6 months ago

ref https://github.com/pingcap/tiflow/issues/10741

lance6716 commented 6 months ago

Can you add some tests and comments to explain the new behaviour?

okJiang commented 6 months ago

Can you add some tests and comments to explain the new behaviour?

I just fixed some existing test cases: https://github.com/go-mysql-org/go-mysql/pull/852/commits/494cb565c3559461401f3042e6b53634559ccca3

We can compare the behavior before and after using those examples.

Before

  1. 1-2-1 is equal to 1-3-1: they have the same domain ID and sequence number, and differ only in server ID.
  2. 1-2-3 is greater than 1-3-1, because their domain IDs are equal and 1-2-3 has the larger sequence number.
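
The two "before" cases can be sketched in a few lines of Go. This is only an illustration of the comparison rule described above (same domain, order by sequence number, server ID ignored). The `GTID` type, `parseGTID`, and `compareIgnoringServer` are hypothetical names made up for this sketch and are not the library's API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// GTID is a hypothetical stand-in for a MariaDB GTID "domain-server-sequence".
type GTID struct {
	DomainID uint32
	ServerID uint32
	Seq      uint64
}

// parseGTID parses a string such as "1-2-3" into its three components.
func parseGTID(s string) (GTID, error) {
	parts := strings.Split(s, "-")
	if len(parts) != 3 {
		return GTID{}, fmt.Errorf("invalid GTID %q", s)
	}
	domain, err := strconv.ParseUint(parts[0], 10, 32)
	if err != nil {
		return GTID{}, err
	}
	server, err := strconv.ParseUint(parts[1], 10, 32)
	if err != nil {
		return GTID{}, err
	}
	seq, err := strconv.ParseUint(parts[2], 10, 64)
	if err != nil {
		return GTID{}, err
	}
	return GTID{DomainID: uint32(domain), ServerID: uint32(server), Seq: seq}, nil
}

// compareIgnoringServer returns -1, 0, or 1 by sequence number only.
// GTIDs from different domains are not comparable and yield an error.
func compareIgnoringServer(a, b GTID) (int, error) {
	if a.DomainID != b.DomainID {
		return 0, fmt.Errorf("domains %d and %d are not comparable", a.DomainID, b.DomainID)
	}
	switch {
	case a.Seq < b.Seq:
		return -1, nil
	case a.Seq > b.Seq:
		return 1, nil
	}
	return 0, nil
}

func main() {
	for _, pair := range [][2]string{{"1-2-1", "1-3-1"}, {"1-2-3", "1-3-1"}} {
		a, err := parseGTID(pair[0])
		if err != nil {
			panic(err)
		}
		b, err := parseGTID(pair[1])
		if err != nil {
			panic(err)
		}
		c, err := compareIgnoringServer(a, b)
		if err != nil {
			panic(err)
		}
		// Prints 0 for the first pair (equal) and 1 for the second (greater).
		fmt.Println(pair[0], "vs", pair[1], "=>", c)
	}
}
```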

Suppose a user uses the same domain ID on several servers and sets gtid_strict_mode=OFF, then deploys multi-source replication A->C and B->C. Because A and B share a domain ID, the GTIDs coming from A and from B cannot be compared on C. We can only keep them apart by server ID.
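
To make the multi-source case concrete, here is a rough sketch of the difference between tracking one position per domain ID and one position per (domain ID, server ID) pair. All names (`GTID`, `domainSet`, `serverSet`) are hypothetical, and the "keep the highest sequence number" update rule is an assumption of this sketch, not a description of the library's code.

```go
package main

import "fmt"

// GTID is a hypothetical stand-in for a MariaDB GTID (domain-server-sequence).
type GTID struct {
	DomainID uint32
	ServerID uint32
	Seq      uint64
}

// domainSet tracks a single position per domain ID.
type domainSet map[uint32]GTID

// Update keeps the GTID with the highest sequence number in each domain
// (an assumption made for this sketch).
func (s domainSet) Update(g GTID) {
	if cur, ok := s[g.DomainID]; !ok || g.Seq > cur.Seq {
		s[g.DomainID] = g
	}
}

// serverSet tracks a position per (domain ID, server ID) pair.
type serverSet map[uint32]map[uint32]GTID

// Update keeps the highest sequence number per server within each domain.
func (s serverSet) Update(g GTID) {
	if s[g.DomainID] == nil {
		s[g.DomainID] = make(map[uint32]GTID)
	}
	if cur, ok := s[g.DomainID][g.ServerID]; !ok || g.Seq > cur.Seq {
		s[g.DomainID][g.ServerID] = g
	}
}

func main() {
	fromA := GTID{DomainID: 1, ServerID: 2, Seq: 5} // latest event from A
	fromB := GTID{DomainID: 1, ServerID: 3, Seq: 3} // latest event from B

	byDomain := domainSet{}
	byServer := serverSet{}
	for _, g := range []GTID{fromA, fromB} {
		byDomain.Update(g)
		byServer.Update(g)
	}

	// One entry for domain 1: B's position is no longer visible.
	fmt.Println(len(byDomain), byDomain[1])
	// Two entries under domain 1: A and B are tracked independently.
	fmt.Println(len(byServer[1]), byServer[1][2], byServer[1][3])
}
```

With the domain-only key, the position of whichever source has the lower sequence number is lost; keying by server ID as well keeps both streams visible on C.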

lance6716 commented 6 months ago

@okJiang I don't understand why this library needs to care about the server ID. In MariaDB's source code, the server ID is simply ignored when comparing GTIDs: https://github.com/MariaDB/server/blob/9e7afa7782314e9c5a3b3276963110d18287b783/sql/rpl_gtid.h#L54-L64

Before

  1. 1-2-1 is equal to 1-3-1: they have the same domain ID and sequence number, and differ only in server ID.
  2. 1-2-3 is greater than 1-3-1, because their domain IDs are equal and 1-2-3 has the larger sequence number.

And in your example, both before and after the change, 1-2-1 should be treated as a bug if 1-3-1 exists, and 1-2-3 is always greater than 1-3-1.

okJiang commented 6 months ago

And in your example, both before and after the change, 1-2-1 should be treated as a bug if 1-3-1 exists

Is there such a scenario?

Consider a master-slave replication setup: server2 (master) -> server3 (slave).

  1. Server3 executes a SQL statement whose GTID is 1-3-1.
  2. Server2 (master) is not aware of the slave's operation, so it also executes a SQL statement, whose GTID is 1-2-1.
  3. Because gtid_strict_mode is disabled, 1-2-1 is replicated from server2 to server3, so 1-3-1 and 1-2-1 exist on server3 at the same time.

lance6716 commented 6 months ago

I see the reference in your tiflow issue: http://youdidwhatwithtsql.com/behavior-gtidstrictmode-mariadb/2089/ . In that example, 1-1-2152 and 1-2-2152 can both exist.

https://mariadb.com/kb/en/gtid/#gtid_strict_mode says "Global transaction ID is designed to work correctly even when strict mode is not enabled.", so I think we can follow MariaDB's behaviour to decide what replication should do when it meets an out-of-order event.

okJiang commented 6 months ago

Sorry, I'm not familiar enough with MariaDB's source code. It seems to report some warnings and ignore it: https://github.com/MariaDB/server/blob/9e7afa7782314e9c5a3b3276963110d18287b783/sql/rpl_gtid.cc#L3479-L3508

okJiang commented 6 months ago

Sorry, I'm not familiar enough with MariaDB's source code. It seems to report some warnings and ignore it: https://github.com/MariaDB/server/blob/9e7afa7782314e9c5a3b3276963110d18287b783/sql/rpl_gtid.cc#L3479-L3508

If this is the case, it may cause unanticipated behavior in our product, because DM relies heavily on binlog order. Ref https://github.com/pingcap/tiflow/issues/10741#issuecomment-1990948130 .

So I hope to find a moderate way to solve this problem. Of course, I don't want to break MariaDB's rules or compatibility.

okJiang commented 6 months ago

cc @lance6716

okJiang commented 6 months ago

ping @lance6716