dsukhoroslov / bagri

XML/Document DB on top of distributed cache
Apache License 2.0

Documents are not shared between node 0 and node 1 on the second profile #12

Closed: smelikh closed this issue 8 years ago

smelikh commented 8 years ago

Preconditions:

  1. Bagri snapshot has been installed.

Steps:

  1. Run «bgcache.cmd second 0» from the appropriate directory. → The first server node is running; logs are without errors.
  2. Check the connections to the admin cluster (port 3331) and the schema cluster (port 10000). → Both connections should succeed.
  3. Run the second profile on node 1: «bgcache.cmd second 1» from the appropriate directory. → The second server node with the second profile is running; logs are without errors.
  4. Check that the schema documents are loaded and indexed.

Expected Result: The totals across both nodes should be 7 documents and 7 indexes, distributed between the nodes.

Actual Result: The documents are not distributed between the two nodes.
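The expected result can be stated as a small check. This is a hypothetical Java sketch, not part of Bagri: it assumes the per-node document (or index) counts have already been collected somehow, for example through the admin cluster, and the names `DistributionCheck` and `isDistributed` are illustrative only. With just 7 documents, hash partitioning can occasionally place all of them on one node, so a single run with one empty node is suggestive rather than conclusive.

```java
import java.util.Map;

public class DistributionCheck {

    /**
     * Hypothetical helper: returns true when there is more than one node,
     * every node holds at least one item, and the per-node counts add up
     * to the expected total.
     */
    static boolean isDistributed(Map<String, Integer> countsByNode, int expectedTotal) {
        int total = 0;
        for (int count : countsByNode.values()) {
            if (count == 0) {
                return false; // one node holds nothing: no sharing happened
            }
            total += count;
        }
        return countsByNode.size() > 1 && total == expectedTotal;
    }

    public static void main(String[] args) {
        // Illustrative counts only; real numbers would come from the running nodes.
        System.out.println(isDistributed(Map.of("node0", 4, "node1", 3), 7)); // true
        System.out.println(isDistributed(Map.of("node0", 7, "node1", 0), 7)); // false
    }
}
```

The same check would be run once for documents and once for indexes.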

dsukhoroslov commented 8 years ago

The issue is caused by a bug in Hazelcast. Data migration to the second node failed with the following message:

2016-04-19 19:09:39.230 [hz.TPoX.generic-operation.thread-10] WARN com.hazelcast.cluster.impl.ClusterJoinManager - [localhost]:10000 [TPoX] [3.6.1] While waiting finalize join calls...
java.util.concurrent.TimeoutException: Call Invocation{serviceName='hz:core:clusterService', op=com.hazelcast.cluster.impl.operations.FinalizeJoinOperation{identityHash=918880958, serviceName='hz:core:clusterService', partitionId=-1, replicaIndex=0, callId=1291, invocationTime=1461082174202 (Tue Apr 19 19:09:34 MSK 2016), waitTimeout=-1, callTimeout=60000, members=MemberInfo{address=Address[localhost]:10000, liteMember=false} MemberInfo{address=Address[localhost]:10001, liteMember=false} , postJoinOp=com.hazelcast.cluster.impl.operations.PostJoinOperation{identityHash=578877427, serviceName='null', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Thu Jan 01 02:59:59 MSK 1970), waitTimeout=-1, callTimeout=9223372036854775807, operations=[com.hazelcast.spi.impl.eventservice.impl.operations.PostJoinRegistrationOperation{identityHash=1572646343, serviceName='null', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Thu Jan 01 02:59:59 MSK 1970), waitTimeout=-1, callTimeout=9223372036854775807}, com.hazelcast.spi.impl.proxyservice.impl.operations.PostJoinProxyOperation{identityHash=504057009, serviceName='hz:core:proxyService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Thu Jan 01 02:59:59 MSK 1970), waitTimeout=-1, callTimeout=9223372036854775807}, com.hazelcast.map.impl.operation.PostJoinMapOperation{identityHash=12938556, serviceName='hz:impl:mapService', partitionId=-1, replicaIndex=0, callId=0, invocationTime=-1 (Thu Jan 01 02:59:59 MSK 1970), waitTimeout=-1, callTimeout=9223372036854775807}]}}, partitionId=-1, replicaIndex=0, tryCount=100, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[localhost]:10001, backupsExpected=0, backupsCompleted=0, connection=Connection [/127.0.0.1:10000 -> /127.0.0.1:10003], endpoint=Address[localhost]:10001, alive=true, type=MEMBER} encountered a timeout
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponse(InvocationFuture.java:367) ~[hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveApplicationResponseOrThrowException(InvocationFuture.java:335) ~[hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.get(InvocationFuture.java:223) ~[hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.util.FutureUtil.executeWithDeadline(FutureUtil.java:294) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.util.FutureUtil.waitWithDeadline(FutureUtil.java:278) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.util.FutureUtil.waitWithDeadline(FutureUtil.java:252) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.cluster.impl.ClusterJoinManager.startJoin(ClusterJoinManager.java:514) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.cluster.impl.ClusterJoinManager.startJoinRequest(ClusterJoinManager.java:314) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.cluster.impl.ClusterJoinManager.executeJoinRequest(ClusterJoinManager.java:231) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.cluster.impl.ClusterJoinManager.handleJoinRequest(ClusterJoinManager.java:150) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.cluster.impl.operations.JoinRequestOperation.run(JoinRequestOperation.java:40) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:172) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:393) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:184) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:137) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:124) [hazelcast-all-3.6.1.jar:3.6.1]
    at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:99) [hazelcast-all-3.6.1.jar:3.6.1]
2016-04-19 19:09:39.465 [hz.TPoX.migration] INFO com.hazelcast.partition.InternalPartitionService - [localhost]:10000 [TPoX] [3.6.1] Re-partitioning cluster data... Migration queue size: 135
2016-04-19 19:09:39.498 [hz.TPoX.event-6] INFO com.bagri.xdm.cache.hazelcast.impl.PopulationManagementImpl - migrationStarted; event: MigrationEvent{partitionId=0, status=STARTED, oldOwner=Member [localhost]:10000 this, newOwner=Member [localhost]:10001}; docs size: {}
2016-04-19 19:09:52.854 [cached1] INFO com.hazelcast.partition.InternalPartitionService - [localhost]:10000 [TPoX] [3.6.1] Remaining migration tasks in queue => 134
2016-04-19 19:10:07.853 [cached1] INFO com.hazelcast.partition.InternalPartitionService - [localhost]:10000 [TPoX] [3.6.1] Remaining migration tasks in queue => 134
2016-04-19 19:10:22.852 [cached16] INFO com.hazelcast.partition.InternalPartitionService - [localhost]:10000 [TPoX] [3.6.1] Remaining migration tasks in queue => 134
2016-04-19 19:10:37.852 [cached25] INFO com.hazelcast.partition.InternalPartitionService - [localhost]:10000 [TPoX] [3.6.1] Remaining migration tasks in queue => 134
2016-04-19 19:10:52.851 [cached1] INFO com.hazelcast.partition.InternalPartitionService - [localhost]:10000 [TPoX] [3.6.1] Remaining migration tasks in queue => 134
2016-04-19 19:11:07.851 [cached12] INFO com.hazelcast.partition.InternalPartitionService - [localhost]:10000 [TPoX] [3.6.1] Remaining migration tasks in queue => 134
2016-04-19 19:11:22.850 [cached12] INFO com.hazelcast.partition.InternalPartitionService - [localhost]:10000 [TPoX] [3.6.1] Remaining migration tasks in queue => 134
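The key symptom in this log is that the migration queue stops draining: after the first migration starts, every progress message reports 134 remaining tasks. A hypothetical Java helper that flags this pattern from log text might look like the sketch below. It is not part of Bagri or Hazelcast (`MigrationStallCheck`, `remainingCounts`, and `isStalled` are illustrative names); it only inspects the "Remaining migration tasks in queue => N" messages and treats several identical non-zero readings in a row as a stall.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MigrationStallCheck {

    // Matches the periodic progress message from InternalPartitionService seen in this issue's log.
    private static final Pattern REMAINING =
            Pattern.compile("Remaining migration tasks in queue => (\\d+)");

    /** Extracts the remaining-task counts, in log order. */
    static List<Integer> remainingCounts(List<String> logLines) {
        List<Integer> counts = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = REMAINING.matcher(line);
            if (m.find()) {
                counts.add(Integer.parseInt(m.group(1)));
            }
        }
        return counts;
    }

    /**
     * Returns true if the last `window` readings (window >= 2) are identical
     * and non-zero, i.e. the migration queue is no longer shrinking.
     */
    static boolean isStalled(List<String> logLines, int window) {
        List<Integer> counts = remainingCounts(logLines);
        if (window < 2 || counts.size() < window) {
            return false;
        }
        List<Integer> tail = counts.subList(counts.size() - window, counts.size());
        int first = tail.get(0);
        if (first == 0) {
            return false; // an empty queue means migration finished
        }
        for (int count : tail) {
            if (count != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "2016-04-19 19:09:39.465 [hz.TPoX.migration] INFO Re-partitioning cluster data... Migration queue size: 135",
                "2016-04-19 19:09:52.854 [cached1] INFO Remaining migration tasks in queue => 134",
                "2016-04-19 19:10:07.853 [cached1] INFO Remaining migration tasks in queue => 134",
                "2016-04-19 19:10:22.852 [cached16] INFO Remaining migration tasks in queue => 134");
        System.out.println(isStalled(log, 3)); // true
    }
}
```

In the log above the progress messages arrive roughly every 15 seconds, so a window of 3 corresponds to about 30 seconds without progress; the threshold is a judgment call, not a Hazelcast setting.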

dsukhoroslov commented 8 years ago

Opened an issue in Hazelcast (HZ): https://github.com/hazelcast/hazelcast/issues/7987

smelikh commented 8 years ago

Fixed. Works as expected.