danielcheng007 / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

replicator with heap size less than 2048 fails to replicate a 64MB record #391

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. install the replicator with the default heap size of 1024
2. run the same test used for Issue#390
3. wait for the slaves to apply the record

What is the expected output?

The record is applied.

What do you see instead?

The replicator fails because of insufficient memory (pendingExceptionMessage: Java heap space):

Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
clusterName            : default
currentEventId         : NONE
currentTimeMillis      : 1350698699185
dataServerHost         : qa.r4.continuent.com
extensions             : 
latestEpochNumber      : -1
masterConnectUri       : thl://qa.r1.continuent.com:2112/
masterListenUri        : thl://qa.r4.continuent.com:2112/
maximumStoredSeqNo     : -1
minimumStoredSeqNo     : -1
offlineRequests        : NONE
pendingError           : Stage task failed: q-to-dbms
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000002:0000000067109955;0
pendingErrorSeqno      : 6
pendingExceptionMessage: Java heap space
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : dragon
serviceType            : unknown
simpleServiceName      : dragon
siteName               : default
sourceId               : qa.r4.continuent.com
state                  : OFFLINE:ERROR
timeInStateSeconds     : 21.995
uptimeSeconds          : 95.129
Finished status command...
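
When checking several slaves, the fields that matter here (state, pendingError, pendingExceptionMessage) can be pulled out of the status output mechanically. The following is a minimal sketch, assuming the output has been captured as text in the `NAME : VALUE` layout shown above. The helper name `status_field` and the variable `STATUS_SAMPLE` are hypothetical and are not part of trepctl.

```shell
# Hypothetical helper: print the value of one field from captured status text.
# $1 = field name; the status text is read from stdin.
status_field() {
    awk -v key="$1" -F':' '
        { name = $1; gsub(/[ \t]+$/, "", name) }      # field name without padding
        name == key { sub(/^[^:]*:[ \t]*/, ""); print; exit }
    '
}

# Three lines copied from the status output in this report.
STATUS_SAMPLE='pendingError           : Stage task failed: q-to-dbms
pendingExceptionMessage: Java heap space
state                  : OFFLINE:ERROR'

printf '%s\n' "$STATUS_SAMPLE" | status_field state
printf '%s\n' "$STATUS_SAMPLE" | status_field pendingExceptionMessage
```

Only the first colon is treated as the separator, so values that contain colons themselves (such as OFFLINE:ERROR) are printed whole.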

What is the possible cause?

N/A

What is the proposed solution?

N/A

Additional information

Use this script to reproduce the issue:

#!/bin/bash
# Paths and hosts for the master/slave installation under test.
TUNGSTEN_BASE=$HOME/installs/master_slave
TREPCTL=$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl
MASTER=example1.continuent.com
SLAVES=(example22.continuent.com)

USERNAME=tungsten
PASSWORD=mypass

# $MYSQL is left unquoted at the call sites so that its options are split into words.
MYSQL="mysql -u $USERNAME -p$PASSWORD"

# Use row-based replication and raise max_allowed_packet on every server.
$MYSQL -h $MASTER -e "set global binlog_format=row"
for SERVER in "${SLAVES[@]}" "$MASTER"
do
    $MYSQL -h $SERVER -e "set global max_allowed_packet=200*1024*1024"
    echo "# $SERVER"
    $MYSQL -h $SERVER -e "select format(@@max_allowed_packet,0) as max_allowed_packet"
done

# Recreate the test tables and insert a single 64MB longtext value.
$MYSQL -h $MASTER -e "create schema if not exists test"
$MYSQL -h $MASTER -e "drop table if exists test.test4g"
$MYSQL -h $MASTER -e "drop table if exists test.innodb_lock_monitor"
$MYSQL -h $MASTER -e "CREATE TABLE test.innodb_lock_monitor(a int) ENGINE=INNODB;"
$MYSQL -h $MASTER -e "create table test.test4g(id int not null auto_increment primary key, t longtext, TS timestamp)"
$MYSQL -h $MASTER -e "insert into test.test4g(t) values (repeat( 'a', 64*1024*1024) )"
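
For reference, the two sizes used in the script can be worked out with plain shell arithmetic. This is only a sketch of the arithmetic, and the variable names are hypothetical.

```shell
# Sizes used by the reproduction script, in bytes.
RECORD_BYTES=$((64 * 1024 * 1024))        # length of repeat('a', 64*1024*1024)
MAX_PACKET_BYTES=$((200 * 1024 * 1024))   # value assigned to max_allowed_packet

echo "record:             $RECORD_BYTES bytes"       # 67108864
echo "max_allowed_packet: $MAX_PACKET_BYTES bytes"   # 209715200
```

The inserted value is therefore well below the packet limit that the script sets on every server.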

Original issue reported on code.google.com by g.maxia on 20 Oct 2012 at 2:10

GoogleCodeExporter commented 9 years ago

Original comment by robert.h...@continuent.com on 5 Nov 2012 at 1:42

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 15 Jan 2013 at 4:41

GoogleCodeExporter commented 9 years ago

Original comment by jeff.m...@continuent.com on 21 Feb 2013 at 7:54

GoogleCodeExporter commented 9 years ago

Original comment by jeff.m...@continuent.com on 21 Feb 2013 at 7:56

GoogleCodeExporter commented 9 years ago

Original comment by jeff.m...@continuent.com on 21 Feb 2013 at 7:56

GoogleCodeExporter commented 9 years ago

Original comment by robert.h...@continuent.com on 18 Mar 2013 at 6:21

GoogleCodeExporter commented 9 years ago
We'll use 2.1.0 instead of 2.0.8, hence moving the issues.

Original comment by linas.vi...@continuent.com on 27 Mar 2013 at 3:13

GoogleCodeExporter commented 9 years ago
This is not an easy fix. We can't fragment an *individual* statement or row 
update. We might need an extension of the record format: if an object is bigger 
than X, we'd store it externally. It would work similarly to LOAD DATA 
INFILE.
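
The "store it externally if bigger than X" idea can be illustrated roughly as follows. This is a sketch of the general technique only, not the replicator's record format: the threshold, the spool directory, the `inline:`/`external:` prefixes and the helper name `encode_value` are all hypothetical.

```shell
# Illustration only: not the replicator's record format.
SPOOL_DIR=$(mktemp -d)     # hypothetical directory for externally stored objects
THRESHOLD=1024             # hypothetical value of "X", in characters

# Hypothetical helper: print "inline:<value>" for small values; for large ones,
# write the value to a file under SPOOL_DIR and print "external:<path>".
encode_value() {
    if [ "${#1}" -gt "$THRESHOLD" ]; then
        blob_file=$(mktemp "$SPOOL_DIR/blob.XXXXXX")
        printf '%s' "$1" > "$blob_file"
        printf 'external:%s\n' "$blob_file"
    else
        printf 'inline:%s\n' "$1"
    fi
}

SMALL_RECORD=$(encode_value "abc")
LARGE_VALUE=$(head -c 4096 /dev/zero | tr '\0' 'a')   # 4096 'a' characters
LARGE_RECORD=$(encode_value "$LARGE_VALUE")
STORED_BYTES=$(wc -c < "${LARGE_RECORD#external:}" | tr -d ' ')

echo "$SMALL_RECORD"
echo "large value stored externally: $STORED_BYTES bytes"
rm -rf "$SPOOL_DIR"
```

With this layout the record itself stays small whatever the size of the object, at the cost of managing the external files alongside the log.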

Original comment by linas.vi...@continuent.com on 4 Jul 2013 at 2:38

GoogleCodeExporter commented 9 years ago
The problem still persists with build 2.1.1-56.
The slave replicator dies while trying to apply the record.
Partial workaround: doubling the replicator heap memory.

Original comment by g.maxia on 4 Jul 2013 at 4:25

GoogleCodeExporter commented 9 years ago
Do you have an estimate of how much more heap it needs for an X MB record? Is it 
linear?

Original comment by linas.vi...@continuent.com on 4 Jul 2013 at 7:15

GoogleCodeExporter commented 9 years ago
If I use a bigger record, the replicator fails in different ways. We need to 
nail down this one (meaning we understand why it is failing) before we make an 
attempt at scaling the issue to bigger values.

Original comment by g.maxia on 4 Jul 2013 at 9:12

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 26 Aug 2013 at 1:54

GoogleCodeExporter commented 9 years ago
There won't be a 2.1.3.

Original comment by linas.vi...@continuent.com on 17 Sep 2013 at 10:13

GoogleCodeExporter commented 9 years ago
This is a big change and needs to be scheduled for a release when we can really 
fix it. One fix is to change the replicator log format to segment large 
transactions into separate messages, similar to the way that LOAD DATA 
statements are handled when pulling data from MySQL.
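
The segmentation idea can be illustrated with standard tools: a large payload is cut into bounded pieces and reassembled in order on the other side. This is a sketch of the general technique, not the replicator log format. The fragment size and file names are hypothetical, and the payload is scaled down to 1 MiB so that it runs quickly.

```shell
# Illustration only: fragment a payload into fixed-size pieces and reassemble it.
WORK_DIR=$(mktemp -d)
FRAGMENT_BYTES=$((256 * 1024))      # hypothetical fragment size

# Build a 1 MiB payload of 'a' characters (scaled down from the 64MB record).
head -c $((1024 * 1024)) /dev/zero | tr '\0' 'a' > "$WORK_DIR/payload"

# Fragment: each piece is at most FRAGMENT_BYTES long (frag.aa, frag.ab, ...).
split -b "$FRAGMENT_BYTES" "$WORK_DIR/payload" "$WORK_DIR/frag."
FRAGMENT_COUNT=$(ls "$WORK_DIR"/frag.* | wc -l | tr -d ' ')

# Reassemble in name order and compare with the original payload.
cat "$WORK_DIR"/frag.* > "$WORK_DIR/reassembled"
if cmp -s "$WORK_DIR/payload" "$WORK_DIR/reassembled"; then
    INTACT=yes
else
    INTACT=no
fi

echo "$FRAGMENT_COUNT fragments, intact=$INTACT"
rm -rf "$WORK_DIR"
```

The point of the bound is that neither side ever needs to hold more than one fragment in memory at a time, whatever the size of the original transaction.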

Original comment by robert.h...@continuent.com on 11 Dec 2013 at 4:35

GoogleCodeExporter commented 9 years ago
How to reproduce using a Tungsten-Sandbox:

MASTER=~/tsb3/db_n1

$MASTER -e "set global binlog_format=row"
~/tsb3/db_use_all "set global max_allowed_packet=200*1024*1024"
~/tsb3/db_use_all "select format(@@max_allowed_packet,0) as max_allowed_packet"

$MASTER -e "create schema if not exists test"
$MASTER -e "drop table if exists test.test4g"
$MASTER -e "drop table if exists test.innodb_lock_monitor"
$MASTER -e "CREATE TABLE test.innodb_lock_monitor(a int) ENGINE=INNODB;"
$MASTER -e "create table test.test4g(id int not null auto_increment primary key, t longtext, TS timestamp)"
$MASTER -e "insert into test.test4g(t) values (repeat( 'a', 64*1024*1024) )"

resulting status:

NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : -1
clusterName            : tsandbox
currentEventId         : NONE
currentTimeMillis      : 1407250499626
dataServerHost         : gmini
extensions             :
host                   : gmini
latestEpochNumber      : -1
masterConnectUri       : thl://gmini:12110/
masterListenUri        : thl://gmini:12120/
maximumStoredSeqNo     : -1
minimumStoredSeqNo     : -1
offlineRequests        : NONE
pendingError           : Event extraction failed
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: Connector handler terminated by THL exception: Unable to deserialize event
pipelineSource         : UNKNOWN
relativeLatency        : -1.0
resourcePrecedence     : 99
rmiPort                : 10120
role                   : slave
seqnoType              : java.lang.Long
serviceName            : tsandbox
serviceType            : unknown
simpleServiceName      : tsandbox
siteName               : default
sourceId               : gmini
state                  : OFFLINE:ERROR
timeInStateSeconds     : 189.316
transitioningTo        :
uptimeSeconds          : 861.175
useSSLConnection       : false
version                : Tungsten Replicator 3.0.0 build 215
Finished status command...

Original comment by g.maxia on 5 Aug 2014 at 2:57

GoogleCodeExporter commented 9 years ago
Stack trace of the previous example:

INFO   | jvm 1    | 2014/08/05 21:51:50 | 2014-08-05 21:51:50,278 [tsandbox - remote-to-thl-0] INFO  pipeline.SingleThreadStageTask Last successfully processed event prior to termination: seqno=5 eventid=mysql-bin.000002:0000000000001047;53
INFO   | jvm 1    | 2014/08/05 21:51:50 | 2014-08-05 21:51:50,278 [tsandbox - remote-to-thl-0] INFO  pipeline.SingleThreadStageTask Task event count: 6
INFO   | jvm 1    | 2014/08/05 21:51:50 | 2014-08-05 21:51:50,278 [tsandbox - pool-2-thread-1] ERROR management.OpenReplicatorManager Received error notification, shutting down services :
INFO   | jvm 1    | 2014/08/05 21:51:50 | Event extraction failed
INFO   | jvm 1    | 2014/08/05 21:51:50 | com.continuent.tungsten.replicator.extractor.ExtractorException: Connector handler terminated by THL exception: Unable to deserialize event
INFO   | jvm 1    | 2014/08/05 21:51:50 |   at com.continuent.tungsten.replicator.thl.RemoteTHLExtractor.extract(RemoteTHLExtractor.java:304)
INFO   | jvm 1    | 2014/08/05 21:51:50 |   at com.continuent.tungsten.replicator.thl.RemoteTHLExtractor.extract(RemoteTHLExtractor.java:60)
INFO   | jvm 1    | 2014/08/05 21:51:50 |   at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(SingleThreadStageTask.java:252)
INFO   | jvm 1    | 2014/08/05 21:51:50 |   at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThreadStageTask.java:179)
INFO   | jvm 1    | 2014/08/05 21:51:50 |   at java.lang.Thread.run(Thread.java:695)
INFO   | jvm 1    | 2014/08/05 21:51:50 | Caused by: com.continuent.tungsten.replicator.thl.THLException: Connector handler terminated by THL exception: Unable to deserialize event
INFO   | jvm 1    | 2014/08/05 21:51:50 |   at com.continuent.tungsten.replicator.thl.Protocol.requestReplEvent(Protocol.java:375)
INFO   | jvm 1    | 2014/08/05 21:51:50 |   at com.continuent.tungsten.replicator.thl.Connector.requestEvent(Connector.java:175)
INFO   | jvm 1    | 2014/08/05 21:51:50 |   at com.continuent.tungsten.replicator.thl.RemoteTHLExtractor.extract(RemoteTHLExtractor.java:237)
INFO   | jvm 1    | 2014/08/05 21:51:50 |   ... 4 more
INFO   | jvm 1    | 2014/08/05 21:51:50 | 2014-08-05 21:51:50,281 [tsandbox - pool-2-thread-1] WARN  management.OpenReplicatorManager Performing emergency service shutdown

Original comment by g.maxia on 5 Aug 2014 at 3:00