eclipse-ee4j / glassfish-shoal


Stale ClusterView in Master due to no HealthMessages #95

Closed glassfishrobot closed 14 years ago

glassfishrobot commented 14 years ago

Issue was initially reported to shoal.dev.java.net by the CC'ed email address.
Related email: https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=233
The issue is an edge case occurring during cluster startup.

The problem was reported when starting up 20 instances for a cluster: sometimes an instance was in the Master's view even though the instance did not exist, and because the instance was not in HealthMonitor, the Master never attempted to check whether it existed. The Master continued to propagate the existence of this nonexistent member and never performed heartbeat failure detection to verify whether the instance existed. This is the bug that will be addressed when this issue is resolved: the Master will always perform heartbeat failure detection for all instances in its cluster view. This will be done by synchronizing HealthMonitor's knowledge of each instance with the cluster view.
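The synchronization described above could look roughly like the following. This is a minimal sketch, not the actual Shoal change: HealthTrackerSketch, ClusterViewSketch, ensureTracked and suspects are hypothetical names invented for illustration and are not part of the Shoal HealthMonitor or ClusterViewManager API. The idea is that adding a member to the view also seeds a health entry, so a member that never sends a HealthMessage still times out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the Master's health tracking (not the Shoal API).
class HealthTrackerSketch {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HealthTrackerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a real health message arrives from a member.
    void onHealthMessage(String member, long nowMillis) {
        lastSeenMillis.put(member, nowMillis);
    }

    // Called when a member enters the view: seed an entry if none exists,
    // so the failure-detection clock starts even without a health message.
    void ensureTracked(String member, long nowMillis) {
        lastSeenMillis.putIfAbsent(member, nowMillis);
    }

    boolean isTracked(String member) {
        return lastSeenMillis.containsKey(member);
    }

    // Members whose last sign of life is older than the timeout.
    List<String> suspects(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeenMillis.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}

// Hypothetical stand-in for the Master's cluster view (not the Shoal API).
class ClusterViewSketch {
    private final Set<String> members = ConcurrentHashMap.newKeySet();
    private final HealthTrackerSketch health;

    ClusterViewSketch(HealthTrackerSketch health) {
        this.health = health;
    }

    // Every member added to the view is also registered for failure detection.
    void add(String member, long nowMillis) {
        members.add(member);
        health.ensureTracked(member, nowMillis);
    }
}
```

With this shape, an instance that joins the view and then dies before its first HealthMessage shows up in suspects() once the timeout elapses, instead of lingering in the view indefinitely.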

The hypothesis is that an instance failed during startup after sending out its MasterNodeQuery but before sending its first HealthMessage of STARTING.

Here is a WARNING message observed in the server log when this occurs. This error message is consistent with what one would see when sending to an instance that has just failed. Since Shoal heartbeat failure detection takes some time to detect a FAILURE (about 7-8 seconds with default values), the message alone is not a concern. However, if it is still seen after the roughly 8-second window in which the failure should have been detected, then it is a concern.

WARNING: ClusterManager.send : sending of message net.jxta.endpoint.Message@11882231(2){270} failed. Unable to create an OutputPipe for urn:jxta:uuid-59616261646162614A787461503250335FDDDB9470DA4390A3E692268159961303 route = null
java.io.IOException: Unable to create a messenger to jxta://uuid-59616261646162614A787461503250335FDDDB9470DA4390A3E692268159961303/PipeService/urn:jxta:uuid-63B5938B46F147609C1C998286EA5F3B6E0638B5DF604AEEAC09A3FAE829FBE804
    at net.jxta.impl.pipe.BlockingWireOutputPipe.checkMessenger(BlockingWireOutputPipe.java:238)
    at net.jxta.impl.pipe.BlockingWireOutputPipe.<init>(BlockingWireOutputPipe.java:154)
    at net.jxta.impl.pipe.BlockingWireOutputPipe.<init>(BlockingWireOutputPipe.java:135)
    at net.jxta.impl.pipe.PipeServiceImpl.createOutputPipe(PipeServiceImpl.java:503)
    at net.jxta.impl.pipe.PipeServiceImpl.createOutputPipe(PipeServiceImpl.java:435)
    at net.jxta.impl.pipe.PipeServiceInterface.createOutputPipe(PipeServiceInterface.java:170)
    at com.sun.enterprise.jxtamgmt.ClusterManager.send(ClusterManager.java:505)
    at com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.sendMessage(GroupCommunicationProviderImpl.java:254)
    at com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.sendMessage(DistributedStateCacheImpl.java:500)
    at com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.addToRemoteCache(DistributedStateCacheImpl.java:234)
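As a rough illustration of the "about 7-8 seconds" detection window mentioned above, the sketch below computes a window from a heartbeat timeout, a number of missed heartbeats, and a verification wait, then checks whether a send warning falls outside it. The formula and the example values (2000 ms, 3 tries, 1500 ms) are assumptions chosen to land in that range; they are not taken from this issue, and DetectionWindowSketch is a hypothetical helper, not Shoal code.

```java
// Hypothetical helper for reasoning about the failure-detection window (not Shoal code).
class DetectionWindowSketch {

    // Assumed model: the Master waits for maxMissed heartbeat timeouts,
    // then one verification wait, before declaring FAILURE.
    static long detectionWindowMillis(long heartbeatTimeoutMillis,
                                      int maxMissed,
                                      long verifyTimeoutMillis) {
        return heartbeatTimeoutMillis * maxMissed + verifyTimeoutMillis;
    }

    // A send warning is only a concern if it appears after the window
    // in which the failure should already have been detected.
    static boolean isConcerning(long memberFailedAtMillis,
                                long warningAtMillis,
                                long windowMillis) {
        return warningAtMillis - memberFailedAtMillis > windowMillis;
    }
}
```

Under these assumed values the window is 2000 * 3 + 1500 = 7500 ms, so a warning logged 5 seconds after the member died would be expected, while one logged 9 seconds after would point to the stale-view problem described in this issue.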

Note: this issue does not impact Shoal/GMS in the Glassfish/Sailfin Application Server. If the App Server fails during startup, the app always catches the exception and sends a planned shutdown notification to the cluster.

Environment

Operating System: All
Platform: All

Affected Versions

[current]

glassfishrobot commented 14 years ago

@glassfishrobot Commented Reported by @jfialli

glassfishrobot commented 14 years ago

@glassfishrobot Commented @jfialli said: Fix checked into trunk and transport branch. A FINE-level log message in ClusterViewManager.add(addToView) verified that the fix was working.

glassfishrobot commented 7 years ago

@glassfishrobot Commented This issue was imported from java.net JIRA SHOAL-95

glassfishrobot commented 14 years ago

@glassfishrobot Commented Marked as fixed on Wednesday, June 23rd 2010, 4:11:06 am