Position of the stores:
serialized : false
stopRequested : false
store.0 : THLParallelReadTask task_id=0 thread_name=store-thl-0 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=240 discarded=6785 events=0
store.1 : THLParallelReadTask task_id=1 thread_name=store-thl-1 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.10 : THLParallelReadTask task_id=10 thread_name=store-thl-10 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=238 discarded=6787 events=0
store.11 : THLParallelReadTask task_id=11 thread_name=store-thl-11 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.12 : THLParallelReadTask task_id=12 thread_name=store-thl-12 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.13 : THLParallelReadTask task_id=13 thread_name=store-thl-13 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.14 : THLParallelReadTask task_id=14 thread_name=store-thl-14 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.15 : THLParallelReadTask task_id=15 thread_name=store-thl-15 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.16 : THLParallelReadTask task_id=16 thread_name=store-thl-16 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.17 : THLParallelReadTask task_id=17 thread_name=store-thl-17 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.18 : THLParallelReadTask task_id=18 thread_name=store-thl-18 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.19 : THLParallelReadTask task_id=19 thread_name=store-thl-19 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.2 : THLParallelReadTask task_id=2 thread_name=store-thl-2 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=239 discarded=6786 events=0
store.20 : THLParallelReadTask task_id=20 thread_name=store-thl-20 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.21 : THLParallelReadTask task_id=21 thread_name=store-thl-21 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.22 : THLParallelReadTask task_id=22 thread_name=store-thl-22 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.23 : THLParallelReadTask task_id=23 thread_name=store-thl-23 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.24 : THLParallelReadTask task_id=24 thread_name=store-thl-24 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.25 : THLParallelReadTask task_id=25 thread_name=store-thl-25 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.26 : THLParallelReadTask task_id=26 thread_name=store-thl-26 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.27 : THLParallelReadTask task_id=27 thread_name=store-thl-27 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.28 : THLParallelReadTask task_id=28 thread_name=store-thl-28 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.29 : THLParallelReadTask task_id=29 thread_name=store-thl-29 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=232 discarded=6793 events=0
store.3 : THLParallelReadTask task_id=3 thread_name=store-thl-3 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=239 discarded=6786 events=0
store.4 : THLParallelReadTask task_id=4 thread_name=store-thl-4 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=239 discarded=6786 events=0
store.5 : THLParallelReadTask task_id=5 thread_name=store-thl-5 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=238 discarded=6787 events=0
store.6 : THLParallelReadTask task_id=6 thread_name=store-thl-6 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=238 discarded=6787 events=0
store.7 : THLParallelReadTask task_id=7 thread_name=store-thl-7 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=238 discarded=6787 events=0
store.8 : THLParallelReadTask task_id=8 thread_name=store-thl-8 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=238 discarded=6787 events=0
store.9 : THLParallelReadTask task_id=9 thread_name=store-thl-9 hi_seqno=288000 lo_seqno=280976 read=7026 accepted=238 discarded=6787 events=0
storeClass : com.continuent.tungsten.replicator.thl.THLParallelQueue
syncInterval : 10
Original comment by linas.vi...@continuent.com on 20 May 2013 at 1:55
It seems that the replicator reached the right seqno, but it does not display it.
./cookbook/trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : mysql-bin.000002:0000000076111508;0
appliedLastSeqno : 299999
appliedLatency : 0.159
channels : 10
clusterName : default
currentEventId : NONE
currentTimeMillis : 1369059117597
dataServerHost : host4
extensions :
latestEpochNumber : 0
masterConnectUri : thl://host1:2112/
masterListenUri : thl://host4:2112/
maximumStoredSeqNo : 300000
minimumStoredSeqNo : 0
offlineRequests : Offline at sequence number: 300000
pendingError : NONE
pendingErrorCode : NONE
pendingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: NONE
pipelineSource : thl://host1:2112/
relativeLatency : 399.597
resourcePrecedence : 99
rmiPort : 10000
role : slave
seqnoType : java.lang.Long
serviceName : cookbook
serviceType : local
simpleServiceName : cookbook
siteName : default
sourceId : host4
state : ONLINE
timeInStateSeconds : 816.999
transitioningTo :
uptimeSeconds : 817.76
version : Tungsten Replicator 2.1.0 build 245
Finished status command...
# node: host4 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 299999 | host1     | 0               | load7    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
| 299999 | host1     | 0               | load7    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
| 299999 | host1     | 0               | load7    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
| 299999 | host1     | 0               | load7    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
| 300000 | host1     | 0               | load4    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
| 299999 | host1     | 0               | load7    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
| 299999 | host1     | 0               | load7    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
| 299999 | host1     | 0               | load7    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
| 299999 | host1     | 0               | load7    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
| 299999 | host1     | 0               | load7    | 2013-05-20 16:05:18 | 2013-05-20 16:05:18 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
Original comment by g.maxia on 20 May 2013 at 2:31
Original comment by g.maxia on 19 Jun 2013 at 5:27
I have tested the issue again. What is going on is the following:
1) One of the threads reaches the requested seqno.
2) The replicator stops replicating (and this is good).
3) All the other threads display a seqno that is one less than the one we indicated for going offline (this is not good).
4) When we run 'trepctl status', it displays the minimum seqno, not the maximum that was reached (Issue #513).
5) Because of the previous item, 'trepctl status' reports that it will stop at seqno X, but since it displays seqno X-1, it still shows the state as 'ONLINE' (and this is really bad).
Summing up, the replicator has gone offline, but it does not know it.
So, the fix for this issue is to make sure that the replicator looks at the maximum seqno currently in the channels and uses that one to determine the status.
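To make the proposal concrete, here is a minimal, hypothetical Java sketch. It is not taken from the Tungsten Replicator sources and the class and field names are invented; it only contrasts reporting the minimum applied seqno across channels (what 'trepctl status' shows today) with reporting the maximum (which would let the offline request at seqno 300000 be recognized), using the seqnos from the output above.

import java.util.Arrays;
import java.util.List;

public class OfflineStatusSketch {
    // Applied seqnos per channel, taken from the trepctl output above:
    // nine channels stopped at 299999 and one reached the target 300000.
    static final List<Long> CHANNEL_SEQNOS = Arrays.asList(
            299999L, 299999L, 299999L, 299999L, 300000L,
            299999L, 299999L, 299999L, 299999L, 299999L);
    static final long OFFLINE_AT_SEQNO = 300000L;

    public static void main(String[] args) {
        // Current behaviour described above: status reports the minimum seqno,
        // which never matches the offline target, so the state stays ONLINE.
        long minSeqno = CHANNEL_SEQNOS.stream().mapToLong(Long::longValue).min().orElse(-1L);
        // Proposed behaviour: look at the maximum seqno reached by any channel.
        long maxSeqno = CHANNEL_SEQNOS.stream().mapToLong(Long::longValue).max().orElse(-1L);

        System.out.println("min applied seqno = " + minSeqno);  // 299999 -> state stays ONLINE
        System.out.println("max applied seqno = " + maxSeqno);  // 300000 -> matches the offline request
        System.out.println("offline request satisfied: " + (maxSeqno >= OFFLINE_AT_SEQNO));
    }
}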
Original comment by g.maxia on 20 Jun 2013 at 10:42
But is it safe to go offline when the maximum seqno is reached? Won't we risk other channels not having applied some of the earlier seqnos?
Original comment by linas.vi...@continuent.com on 20 Jun 2013 at 11:37
Something must be happening inside the replicator that makes it stop pulling transactions from the THL.
If you look at 'trepctl status -name shards', you will see that all channels are in the q-to-dbms stage.
But that is what happens when we notice that the replicator is no longer pulling data.
If we run 'trepctl offline' on a busy server, the replicator consolidates all channels to the latest applied seqno and goes offline. It seems that in this case the replicator is doing the same work, but for some reason it does not complete it.
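For reference, here is a small hypothetical sketch (again not from the Tungsten sources; all names are invented) of the consolidation step described above, under the assumption that "consolidating the channels" means waiting until every channel has applied up to the target seqno before the replicator may report itself OFFLINE:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ChannelDrainSketch {
    // Last applied seqno per channel; a stand-in for the per-channel state
    // visible in the q-to-dbms stage.
    private final Map<Integer, Long> appliedByChannel = new ConcurrentHashMap<>();

    void markApplied(int channel, long seqno) {
        appliedByChannel.merge(channel, seqno, Math::max);
    }

    // Under this rule, the service may report OFFLINE only once every
    // channel has applied the target seqno.
    boolean allChannelsReached(long targetSeqno, int channelCount) {
        if (appliedByChannel.size() < channelCount) {
            return false; // some channel has not reported yet
        }
        return appliedByChannel.values().stream().allMatch(s -> s >= targetSeqno);
    }

    public static void main(String[] args) {
        ChannelDrainSketch sketch = new ChannelDrainSketch();
        sketch.markApplied(0, 300000L);        // one channel reached the target
        for (int c = 1; c < 10; c++) {
            sketch.markApplied(c, 299999L);    // the others stopped one seqno short
        }
        // Prints false: with the seqnos observed above this check never passes,
        // which matches the behaviour where the replicator never reports OFFLINE.
        System.out.println("safe to go offline: " + sketch.allChannelsReached(300000L, 10));
    }
}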
Original comment by g.maxia on 20 Jun 2013 at 12:12
Moving to 2.1.2
Original comment by linas.vi...@continuent.com on 21 Aug 2013 at 12:59
Original comment by linas.vi...@continuent.com on 26 Aug 2013 at 1:54
There won't be a 2.1.3.
Original comment by linas.vi...@continuent.com on 17 Sep 2013 at 10:13
Original comment by linas.vi...@continuent.com on 18 Dec 2013 at 1:38
Original comment by linas.vi...@continuent.com on 2 May 2014 at 9:14
I'm unscheduling this issue because it's not painful and we don't have an urgent need. I'm also assigning it to Robert, as the parallel apply author.
Original comment by linas.vi...@continuent.com on 10 Sep 2014 at 1:00
Original issue reported on code.google.com by linas.vi...@continuent.com on 20 May 2013 at 1:51