github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.43k stars 1.26k forks source link

A data-loss case occurs when chunk-key contains decimal-column #1119

Open wangzhanbing opened 2 years ago

wangzhanbing commented 2 years ago

Hello:

We tried to recycle data-free space by alter table tab engine = innodb from a table. After gh-ost executed, no error coccurs, but data is lost:

gh-ost version is 1.1.2
mysql version is 5.6

gh-ost cmd is:

./gh-ost  \
--host=${IP} --port=${Port} --user=${user} --password=${password} --database=${db}--table=NewInstance \
--alter=engine=innodb \
--switch-to-rbr \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--max-lag-millis=1000 \
--heartbeat-interval-millis=2000 \
--throttle-control-replicas=${IP_list} \
--cut-over=default --exact-rowcount \
--concurrent-rowcount \
--default-retries=120 \
--dml-batch-size=10 \
--nice-ratio=0 \
--serve-socket-file=./tmp/gh-ost.sock \
--throttle-flag-file=./tmp/gh-ost.pause.flag \
--ok-to-drop-table \
--cut-over-exponential-backoff=true \
--exponential-backoff-max-interval=1024 \
--panic-flag-file=./tmp/gh-ost.panic.flag \
--postpone-cut-over-flag-file=flag \
--verbose \
--execute \

the create- table info is:

CREATE TABLE `NewInstance` (
  `InstanceID` decimal(32,0) NOT NULL DEFAULT '0' ,
  `InstanceStatus` tinyint(4) NOT NULL,
  `TriggerTime` datetime NOT NULL,
  `TriggerDate` date NOT NULL,
  `BeginTime` datetime DEFAULT NULL,
  `EndTime` datetime DEFAULT NULL,
  `LastRetryTime` datetime DEFAULT NULL,
  `Retry` int(11) DEFAULT NULL,
  `Parameters` text,
  `RunParameters` text,
  `Status` tinyint(4) NOT NULL,
  `TaskID` bigint(20) NOT NULL,
  `job_id` varchar(100) NOT NULL DEFAULT '0',
  PRIMARY KEY (`InstanceID`),
  KEY `NewInstace_TaskID` (`TaskID`),
  KEY `idx_InstanceStatus_Status` (`InstanceStatus`,`Status`),
  KEY `idx_job_id` (`job_id`),
  KEY `idx_TriggerTime` (`TriggerTime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# before alter
> select count(1) from NewInstance;
+----------+
| count(1) |
+----------+
|  6367309 |
+----------+
1 row in set (0.76 sec)

# after alter
> select count(1) from NewInstance;
+----------+
| count(1) |
+----------+
|  6352445 |
+----------+
1 row in set (0.82 sec)

The cause has been located and fixed !

It's because the unique-key contains decimal-type-column, reference: 链接

the query-sql and insert-sql of row-copy sql is like:

select  /* gh-ost `test_routine`.`NewInstance` iteration:6347 */
        `InstanceID`
    from
         `test_routine`.`NewInstance`
     where ((`InstanceID` > _binary'2708201202204012300000012')) and ((`InstanceID` < _binary'257980120220318000000004000') or ((`InstanceID` = _binary'257980120220318000000004000')))
order by `InstanceID` asc
    limit 1
    offset 999;

insert /* gh-ost `test_routine`.`NewInstance` */ ignore into `test_routine`.`_NewInstance_gho` (`InstanceID`, `InstanceStatus`, `TriggerTim
e`, `TriggerDate`, `BeginTime`, `EndTime`, `LastRetryTime`, `Retry`, `Parameters`, `RunParameters`, `Status`, `TaskID`, `job_id`)
      (select `InstanceID`, `InstanceStatus`, `TriggerTime`, `TriggerDate`, `BeginTime`, `EndTime`, `LastRetryTime`, `Retry`, `Parameters`, `RunParameters`, `Status`, `TaskI
D`, `job_id` from `test_routine`.`NewInstance` force index (`PRIMARY`)
        where (((`InstanceID` > _binary'2708201202204012300000012')) and ((`InstanceID` < _binary'2708201202204062200000015') or ((`InstanceID` = _binary'2708201202204062200
000015')))) lock in share mode
      )

one of lost data records is primary-key InstanceID = _binary'80700020220314193000' It should be contained in row-copy sql like this, BUT not

insert /* gh-ost `test_routine`.`NewInstance` */ ignore into `test_routine`.`_NewInstance1_gho` (`InstanceID`, `InstanceStatus`, `TriggerTime`, `TriggerDate`, `BeginTime`, `EndTime`, `LastRetryTime`, `Retry`, `Parameters`, `RunParameters`, `Status`, `TaskID`, `job_id`)
      (select `InstanceID`, `InstanceStatus`, `TriggerTime`, `TriggerDate`, `BeginTime`, `EndTime`, `LastRetryTime`, `Retry`, `Parameters`, `RunParameters`, `Status`, `TaskID`, `job_id` from `test_routine`.`NewInstance` force index (`PRIMARY`)
        where (((`InstanceID` > _binary'80700020220314190000')) and ((`InstanceID` < _binary'80700020220405100000') or ((`InstanceID` = _binary'80700020220405100000')))) lock in share mode
      )

you can reproduce this case simplely by:

> CREATE TABLE `test_decimal` (
`id` decimal(32,0) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
> insert into test_decimal values('80700020220314193000');

# expect: 1 row, got 0 row
> select count(1) from test_decimal where (((`id` > _binary'80700020220314190000')) and ((`id` < _binary'80700020220405100000') or ((`id` = _binary'80700020220405100000'))));

one of the right solution is adding cast as follows:

# expect: 1 row, got 1 row -- perfect
select count(1) from test_decimal where (((`id` > cast(_binary'80700020220314190000' as decimal(32)))) and ((`id` < cast(_binary'80700020220405100000' as decimal(32,0))) or ((`id` = cast(_binary'80700020220405100000' as decimal(32,0))))));

we have fixed this bug and try to contribute patch gh-ost community.

Thank you! From wangzhanbing Baidu

RainbowDashy commented 2 years ago

Thanks for your detailed report! I tried to reproduce using MySQL 5.6, 5.7, and 8.0 and I'd like to share the results. It seems that this is only a problem in 5.6.

mysql> CREATE TABLE `test_decimal` (
    -> `id` decimal(32,0) NOT NULL DEFAULT '0',
    -> PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.05 sec)

mysql> insert into test_decimal values('80700020220314193000');
Query OK, 1 row affected (0.01 sec)

mysql> select count(1) from test_decimal where (((`id` > _binary'80700020220314190000')) and ((`id` < _binary'80700020220405100000') or ((`id` = _binary'80700020220405100000'))));
+----------+
| count(1) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 5.6.51    |
+-----------+
1 row in set (0.00 sec)
mysql> CREATE TABLE `test_decimal` (
    -> `id` decimal(32,0) NOT NULL DEFAULT '0',
    -> PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.03 sec)

mysql>  insert into test_decimal values('80700020220314193000');
Query OK, 1 row affected (0.00 sec)

mysql> select count(1) from test_decimal where (((`id` > _binary'80700020220314190000')) and ((`id` < _binary'80700020220405100000') or ((`id` = _binary'80700020220405100000'))));
+----------+
| count(1) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 5.7.23    |
+-----------+
1 row in set (0.00 sec)
mysql> CREATE TABLE `test_decimal` (
    -> `id` decimal(32,0) NOT NULL DEFAULT '0',
    -> PRIMARY KEY (`id`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> insert into test_decimal values('80700020220314193000');
Query OK, 1 row affected (0.00 sec)

mysql> select count(1) from test_decimal where (((`id` > _binary'80700020220314190000')) and ((`id` < _binary'80700020220405100000') or ((`id` = _binary'80700020220405100000'))));
+----------+
| count(1) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.27    |
+-----------+
1 row in set (0.00 sec)