cashapp / spirit

Online Schema Change Tool for MySQL 8.0+
Apache License 2.0

Support checksum resume and checksum periodic yield #317

Closed. morgo closed this issue 2 months ago

morgo commented 5 months ago

Currently, if a checksum fails or the process is killed during the checksum, a complete restart is required. Since the checksum typically takes about 10% of the copier time, on a long-running migration this could mean losing a day of progress.

A potentially worse issue is that the checksum runs inside a REPEATABLE READ transaction, which could be a day old by the time it finishes. A transaction that old blocks the InnoDB purge process, which can be a real problem in its own right.

I propose that instead of using a single checksum which consists of:

.. we instead repeat this process every N hours. The regrettable downside is that more locks are taken, but on some systems a transaction that runs for 24+ hours can bring everything to a grinding halt.
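
To make the idea concrete, here is a rough sketch of how resume and periodic yield could fit together. It is a Python simulation over in-memory dicts, not spirit's actual Go code, and every name in it (`Checkpoint`, `run_checksum`, `chunks_per_snapshot`, `interrupt_at`) is hypothetical. It assumes integer primary keys, uses a chunk count as a stand-in for the "every N hours" time budget so the example stays deterministic, and ignores how changes arriving between snapshots would be handled.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Resume state; a real tool would persist this somewhere durable."""
    next_key: int = 0    # first primary key that has not been verified yet
    snapshots: int = 0   # how many snapshots (and lock acquisitions) were needed


def chunk_checksum(rows, low, high):
    """XOR together a CRC32 per row for keys in [low, high)."""
    acc = 0
    for key in range(low, high):
        if key in rows:
            acc ^= zlib.crc32(f"{key}:{rows[key]}".encode())
    return acc


def run_checksum(src, dst, max_key, chunk_size, chunks_per_snapshot,
                 checkpoint, interrupt_at=None):
    """Verify src against dst chunk by chunk, updating checkpoint in place."""
    while checkpoint.next_key < max_key:
        # Stand-in for "take a brief lock and open a fresh consistent snapshot".
        src_view, dst_view = dict(src), dict(dst)
        checkpoint.snapshots += 1
        for _ in range(chunks_per_snapshot):
            low = checkpoint.next_key
            if low >= max_key:
                break
            if interrupt_at is not None and low >= interrupt_at:
                raise RuntimeError("simulated kill during checksum")
            high = min(low + chunk_size, max_key)
            if chunk_checksum(src_view, low, high) != chunk_checksum(dst_view, low, high):
                raise ValueError(f"checksum mismatch in keys [{low}, {high})")
            checkpoint.next_key = high
        # Leaving the inner loop is the "yield": the old snapshot is dropped here.
    return checkpoint


if __name__ == "__main__":
    src = {i: f"row{i}" for i in range(100)}
    dst = dict(src)
    cp = Checkpoint()
    try:
        run_checksum(src, dst, 100, 10, 3, cp, interrupt_at=50)
    except RuntimeError:
        pass  # the process "died"; cp still records the verified range
    run_checksum(src, dst, 100, 10, 3, cp)  # resumes at cp.next_key
    print(cp)
```

The trade-off from the proposal shows up directly: a smaller `chunks_per_snapshot` means each snapshot is short-lived, but `snapshots` (and therefore the number of lock acquisitions) goes up.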