Shopify / ghostferry

The swiss army knife of live data migrations
https://shopify.github.io/ghostferry
MIT License
693 stars 65 forks source link

Ghostferry misses data for PK values of <= 0 #305

Open shuhaowu opened 2 years ago

shuhaowu commented 2 years ago

If a table t has a single column id bigint(20) (also the PK) with records -1, 0, 1, 2, 3. Ghostferry will silently not copy data where id <= 0. There are two related causes:

  1. For the negative number case, this is because Ghostferry assumes that the PK is an uint64.
  2. For the 0 case, the DefaultBuildSelect selects rows where PK > 0 in the first iteration, which misses the record for 0.

This only happens if Ghostferry is started and these records already exists in the database as only the DataIterator misses these records. The BinlogStreamer will stream these records if they are inserted. This also means the InlineVerifier will miss these records as well and will not emit an error unless these records are also UPDATEd during the course of the transfer. When the record is updated, since it doesn't exist on the target, the BinlogWriter won't be able to update it (as the previous state on the target database doesn't match what the binlog event suggests). This means the row will be added to a reverify queue. The subsequent fingerprinting attempt will always yield different result from the source and the target. This means that there is a chance for the InlineVerifier to catch this (which is how we found it), but it is not guaranteed.

The first case is hard to fix. Perhaps we should detect such record exists and raise an error during table loading. For the 0 case, we need to review the BuildSelect functions to make sure that 0 can be included.

shuhaowu commented 2 years ago

The zero case is also difficult to fix, especially if zero is the only entry in the table.

The best way to solve this is to use a generic sortable type to represent the PK. Not sure if Golang has easy support for this due to the lack of generics.

dikang123 commented 2 years ago

gh-ost do not have this issue, so I think this can be fixed with golang.

shuhaowu commented 2 years ago

Ghost doesn't assume an uint64 and has a UniqueKey data type. It also calculates the iteration ranges by querying MySQL, thus relying on MySQL to sort the values. This is the best solution, as it would also open the door to allow for non-integer pagination keys. However, it would require significant changes within Ghostferry that is somewhat non-trivial for the moment.

dikang123 commented 2 years ago

Ghost doesn't assume an uint64 and has a UniqueKey data type. It also calculates the iteration ranges by querying MySQL, thus relying on MySQL to sort the values. This is the best solution, as it would also open the door to allow for non-integer pagination keys. However, it would require significant changes within Ghostferry that is somewhat non-trivial for the moment.

agree