What steps will reproduce the problem?
1. Set up master slave replication.
2. Add a filter that looks like the following to drop 3/4 of the transactions
generated on the master.
/** Filter filter34.js */
function filter(event)
{
// Drop 3/4 of events.
seqno = event.getSeqno();
logger.info("Processing event: seqno=" + seqno);
if (seqno % 4 > 0)
{
logger.info("Dropping event: seqno=" + seqno);
return null;
}
else
{
return event;
}
}
Add the filter as follows to the master pipeline of the master
static-svc.properties.
...
replicator.stage.q-to-thl.filters=filter34
...
replicator.filter.filter34=com.continuent.tungsten.replicator.filter.JavaScriptF
ilter
replicator.filter.filter34.script=${replicator.home.dir}/samples/extensions/java
script/filter34.js
3. Restart the replicator and issue a few of 'trepctl heartbeat' commands on
the master.
4. Use thl list to list events and find one of the filtered events. Here's an
example:
thl list
...
SEQ# = 25 / FRAG# = 0 (last frag)
- TIME = 2012-11-03 10:41:45.919
- EPOCH# = 0
- EVENTID = mysql-bin.000159:0000000000006555;0
- SOURCEID = logos1
- METADATA = []
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSFilteredEvent
## Filtered events ##
From Seqno# 25 / Fragno# 0
To Seqno# 27 / Fragno# 0
5. Try to find any seqno other than the first in the filtered event, e.g.:
thl list -seqno 26
6. Now take the slave offline. Use SQL to alter the trep_commit_seqno table on
the slave to point to a seqno inside the filtered event, e.g.:
mysql> update trep_commit_seqno set seqno=26;
7. Bring the slave back online.
What is the expected output?
a.) In the case of the thl list in step 6, you would expect to find the seqno.
b.) In the case of the thl list, you would likewise expect to restart properly
even if the transaction is slightly off.
What do you see instead?
a.) The thl utility cannot find the filtered event, e.g.:
thl list -seqno 26
...
2012-11-03 10:42:34,852 [ - main] ERROR thl.THLManagerCtrl Unable to find
sequence number: 26
b.) The replicator fails to go online. The status looks like the following:
trepctl status
...
pendingError : Read failed on transaction log: seqno=27 taskId=0
pendingErrorCode : NONE
pendingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: Log seek failure: expected seqno=27 found seqno=28
...
What is the possible cause?
Log seek logic does not work for filtered events. We do not return the
filtered event and instead may return the next full event.
What is the proposed solution?
Alter log seek logic to return the filtered event. Clients must be prepared
for the filtered event to contain their sequence number rather than have it at
either end.
Additional information
This problem seems obscure but it's not. If you need to reset
trep_commit_seqno in deployments with a lot of filtered events, you can run
into serious problems setting the restart point correctly. Also, under some
circumstances it appears slave replicators will not restart if the logs have
the right combination of filtered and empty events. This came up in yet
another deployment and appears to be due to a separate bug.
Use labels and text to provide additional information.
Original issue reported on code.google.com by robert.h...@continuent.com on 3 Nov 2012 at 6:01
Original issue reported on code.google.com by
robert.h...@continuent.com
on 3 Nov 2012 at 6:01