Closed glassfishrobot closed 14 years ago
@glassfishrobot Commented Reported by @jfialli
@glassfishrobot Commented @jfialli said: Fix checked into trunk and transport branch. A FINE-level log message in ClusterViewManager.add(addToView) verified that the fix was working.
@glassfishrobot Commented This issue was imported from java.net JIRA SHOAL-95
@glassfishrobot Commented Marked as fixed on Wednesday, June 23rd 2010, 4:11:06 am
The issue was initially reported to shoal.dev.java.net by the CC'ed email address. Related email: https://shoal.dev.java.net/servlets/ReadMsg?list=dev&msgNo=233 The issue is an edge case that occurs during cluster startup.
It was reported that when starting up 20 instances for a cluster, the Master's view sometimes contained an instance that did not exist. Because that instance was also not in HealthMonitor, the Master never attempted to check whether it existed. The Master continued to propagate the existence of this nonexistent member and never performed heartbeat failure detection to verify whether the instance existed. This is the bug that will be addressed when this issue is resolved: the Master will always perform heartbeat failure detection for all instances in its cluster view. This will be achieved by synchronizing HealthMonitor's knowledge with each instance in the cluster view.
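The synchronization idea can be illustrated with a minimal sketch. This is not Shoal's actual HealthMonitor or ClusterViewManager code: the class `HealthTableSketch`, its methods, and the timeout parameter are all hypothetical names chosen for illustration. The point is only that any member present in the view but unknown to the health table gets registered, so that it becomes a failure suspect if no heartbeat ever arrives.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the master's health bookkeeping; NOT Shoal's API. */
final class HealthTableSketch {
    // Member name -> time (ms) of its last heartbeat, or of its registration
    // if no heartbeat has been received yet.
    private final Map<String, Long> lastSeenMillis = new HashMap<>();

    /** Record a heartbeat from a member. */
    void onHeartbeat(String member, long nowMillis) {
        lastSeenMillis.put(member, nowMillis);
    }

    /**
     * Make sure every member of the view is tracked. Returns the members that
     * were in the view but previously unknown to the health table.
     */
    Set<String> reconcileWithView(Set<String> viewMembers, long nowMillis) {
        Set<String> newlyTracked = new TreeSet<>();
        for (String member : viewMembers) {
            // putIfAbsent returns null only when the member was not tracked before.
            if (lastSeenMillis.putIfAbsent(member, nowMillis) == null) {
                newlyTracked.add(member);
            }
        }
        return newlyTracked;
    }

    /** Members whose last heartbeat (or registration) is older than the timeout. */
    Set<String> suspects(long nowMillis, long timeoutMillis) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> entry : lastSeenMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

In this sketch, an instance that was added to the view but never sent a heartbeat is registered at reconciliation time and then times out like any other silent member, instead of staying in the view indefinitely.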
The hypothesis is that an instance failed during startup after sending out its MasterNodeQuery but before sending its first HealthMessage of STARTING.
Here is a WARNING message observed in the server log when this occurs. The message is consistent with what one would see when sending to an instance that has just failed. Since Shoal heartbeat failure detection takes some time to detect a FAILURE (about 7-8 seconds with default values), the message by itself is not a concern. However, if it is still seen beyond the 8-second window that failure detection should take, then it is a concern.
{270}
failed. Unable to create an
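The 8-second rule of thumb described above can be expressed as a small check. This is only a sketch: the class and method names are hypothetical and not part of Shoal, and the window constant is taken from the approximate figure in this report rather than from any Shoal configuration value.

```java
/** Hypothetical helper for triaging repeated send-failure warnings; NOT Shoal's API. */
final class SendFailureWindowSketch {
    // Approximate upper bound on failure detection time quoted in this report.
    static final long DETECTION_WINDOW_MILLIS = 8_000L;

    /**
     * Returns true when warnings about the same instance are still appearing
     * after the window in which failure detection should have removed it.
     */
    static boolean isConcerning(long firstWarningMillis, long latestWarningMillis) {
        return latestWarningMillis - firstWarningMillis > DETECTION_WINDOW_MILLIS;
    }
}
```

For example, warnings spanning 5 seconds for one instance would be treated as expected, while warnings still appearing 10 seconds after the first one would suggest the stale-member problem described in this issue.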
Note: this issue does not impact Shoal/GMS in the Glassfish/Sailfin Application Server. If the App Server fails during startup, the app always catches the exception and sends a planned shutdown notification to the cluster.
Environment
Operating System: All Platform: All
Affected Versions
[current]