What steps will reproduce the problem?
1. Configure master backups using a hot backup method like file system
snapshots or a transactional dump.
2. Run a backup on a master that is under load.
3. Restore the backup on a slave and try to connect to the master.
What is the expected output?
Slave should go online normally and resume replication from the point where the
master was at the time of backup.
What do you see instead?
Unfortunately the master often has a few committed transactions that had not
yet been extracted when the backup was taken, so the position recorded in the
trep_commit_seqno table lags behind the data in the backup. The master is
actually a few transactions ahead, so when the restored slave connects it
repeats those transactions and may fail with duplicate key errors.
We currently end up having to look up the binlog position in the restored
slave manually (for example by reading the InnoDB log output as MySQL comes up)
and then adjust trep_commit_seqno by hand.
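The manual lookup described above can be partly scripted. The following is a
minimal sketch, not part of the replicator: it assumes the MySQL error log
contains the InnoDB recovery message "Last MySQL binlog file position", and the
function name find_binlog_position is hypothetical. Check the message format
against your own server's log before relying on it.

```python
import re

# Assumed format of the InnoDB recovery message; verify it against your log.
_BINLOG_RE = re.compile(
    r"Last MySQL binlog file position\s+(\d+)\s+(\d+),\s*file name\s+(\S+)"
)


def find_binlog_position(log_text):
    """Return (file_name, offset) from the last matching log line, or None.

    Hypothetical helper for illustration only.
    """
    result = None
    for match in _BINLOG_RE.finditer(log_text):
        high, low, name = match.groups()
        # The two integers are treated as the high and low 32-bit halves
        # of the binlog offset (an assumption about the message layout).
        offset = (int(high) << 32) | int(low)
        # Strip any directory prefix such as "./" from the file name.
        result = (name.rsplit("/", 1)[-1], offset)
    return result
```

The returned pair still has to be mapped to a sequence number and written to
trep_commit_seqno by hand; the sketch only removes the log-reading step.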
What is the possible cause?
The root cause is the latency between when the DBMS commits and when we
actually read the log.
What is the proposed solution?
This is an interesting question. It might make sense *not* to rely on the
trep_commit_seqno table on the master: it lags behind the actual commit point,
and there are availability problems if we have to write to it constantly. For
example, we can't live through a master reboot, which seems a little weak for
widespread use in large sites. Instead, we should mark the current native
transaction ID (e.g., the binlog position in MySQL) and use that as the
restart point.
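As a rough illustration of using the native position, the sketch below compares
binlog positions to decide whether a transaction is already contained in the
restored backup. It is an assumption-laden example, not the replicator's
implementation: the helper names are hypothetical, binlog file names are
assumed to end in a numeric suffix (such as mysql-bin.000007), and each
position is assumed to be the offset just after the transaction's commit.

```python
import re


def binlog_key(file_name, offset):
    """Turn (file_name, offset) into a sortable (file_index, offset) tuple."""
    match = re.search(r"\.(\d+)$", file_name)
    if match is None:
        raise ValueError("no numeric suffix in binlog name: %r" % file_name)
    return (int(match.group(1)), offset)


def already_in_backup(event_pos, restore_pos):
    """True if a transaction ending at event_pos is covered by restore_pos.

    Both arguments are (file_name, offset) pairs. Hypothetical helper.
    """
    return binlog_key(*event_pos) <= binlog_key(*restore_pos)
```

A slave restored at ("mysql-bin.000007", 4215) would then skip any transaction
for which already_in_backup returns True and apply the rest.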
Original issue reported on code.google.com by berkeley...@gmail.com on 8 Sep 2011 at 6:07