github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.43k stars 1.26k forks source link

Delete records not in sync between table and gho table #1142

Open daphid4 opened 2 years ago

daphid4 commented 2 years ago

We recently had an issue where we discovered that a gh-ost migration that was run on a table wasn't syncing DELETEs between the table being migrated from and the new table. I was able to replicate the issue in our staging environment, and it was very clear that the two tables were out of sync while the migration was being run.

Is there anything we're missing on our end? Did we misunderstand that gh-ost should be keeping those tables in sync during a migration?

./gh-ost \
--user="user" \
--password="pass" \
--host=host \
--database="db" \
--table="ourTable" \
--allow-on-master \
--verbose \
--postpone-cut-over-flag-file="./postpone-cut.txt" \
--discard-foreign-keys \
--alter="PARTITION BY HASH(parkId) PARTITIONS 101" \
--initially-drop-ghost-table \
--timestamp-old-table \
--execute

This is the command that we ran. We use RDS with bin_log format = ROW, if that helps

daphid4 commented 2 years ago

Adding a little more context to this - I was able to identify that these cases were happening from a rollback on the main table and that event sticking around in the gho table.

Looking at the cheat sheet, b. I would expect to be able to run against master - but is that an incorrect assumption for rollback events?

timvaillancourt commented 2 years ago

@hunthunt2010 I'm not sure I follow the issue entirely but deleted records should remain in sync and I've never heard of that not being the case 🤔

I was able to identify that these cases were happening from a rollback on the main table

Can you elaborate on what you mean here? Are you perhaps doing a DELETE within a transaction and running ROLLBACK? Only transactions that receive a COMMIT are written to the binary log and thus copied to the _gho table, so maybe that is the problem

daphid4 commented 2 years ago

@timvaillancourt thanks for the reply

that is correct, we are running a DELETE within a transaction, and then a ROLLBACK occurs.

Here is a sample of what the binlog looks like in this case.

image

Since I've filed this issue, I have gotten some further context of the binlog - and how another product utilizes it. Debezium is a tool for change data streaming, and they have a feature implemented that looks forward in the binlog to detect this scenario.

The best we can tell, it comes into play when temp tables are used

Does gh-ost support something similar?

https://debezium.io/documentation/reference/1.9/connectors/mysql.html#mysql-property-binlog-buffer-size image (13)

daphid4 commented 2 years ago

https://forums.mysql.com/read.php?26,661824,661824 - this was never answered it looks like, but pretty much the same issue we see