eclipse-ee4j / glassfish-shoal


more reliable failure notification #55

Open glassfishrobot opened 16 years ago

glassfishrobot commented 16 years ago

Instance A is either going down or under load, so instance B starts retrying its connection to instance A. Before instance B can deem instance A dead or alive, there needs to be an intermediate state, "in_retry_mode", that GMS clients can act on. For example, CLB can use this state to ping instance A again after a short delay. The in-memory replication code can also use it: on seeing that instance A is in "in_retry_mode", and if a pipecloseevent has occurred, it can create a new pipe once instance A is alive again.
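To make the proposal concrete, here is a minimal sketch of the three states and the transitions between them. It is only an illustration: `MemberState`, `MemberTracker`, and `mayRecreatePipe` are hypothetical names, not existing Shoal or GMS API, and the allowed transitions are an assumption about how the intermediate state would behave.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical member states; IN_RETRY_MODE is the intermediate state proposed in this issue. */
enum MemberState {
    ALIVE, IN_RETRY_MODE, DEAD;

    /** Returns the states that may follow this one in this sketch (an assumption). */
    Set<MemberState> next() {
        switch (this) {
            case ALIVE:
                return EnumSet.of(IN_RETRY_MODE);
            case IN_RETRY_MODE:
                return EnumSet.of(ALIVE, DEAD);
            default:
                // DEAD: a restarted instance is assumed to rejoin as ALIVE.
                return EnumSet.of(ALIVE);
        }
    }
}

/** Tracks one remote instance as seen by a GMS client (illustrative only, not Shoal API). */
final class MemberTracker {
    private MemberState state = MemberState.ALIVE;

    MemberState state() {
        return state;
    }

    /** Moves to the target state, rejecting transitions this sketch does not allow. */
    void transition(MemberState target) {
        if (!state.next().contains(target)) {
            throw new IllegalStateException(state + " -> " + target);
        }
        state = target;
    }

    /** A client recreates a closed pipe only once the member is confirmed alive again. */
    boolean mayRecreatePipe(boolean pipeClosed) {
        return pipeClosed && state == MemberState.ALIVE;
    }
}
```

With this shape, a client such as the replication code would hold off on recreating a pipe while the member is in IN_RETRY_MODE, and would act only after the member transitions back to ALIVE or on to DEAD.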

Environment

Operating System: All
Platform: OpenSolaris

Affected Versions

[current]

glassfishrobot commented 16 years ago

@glassfishrobot Commented Reported by sheetalv

glassfishrobot commented 16 years ago

@glassfishrobot Commented sheetalv said: NA for Sailfin 1.0

glassfishrobot commented 16 years ago

@glassfishrobot Commented @jfialli said: 2 cases to address:

1. False positives occur when 3 heartbeats are missed from an instance that is in the middle of a full GC (a full GC can take 12 to 15 seconds). Other instances in the cluster incorrectly receive FAILURE_NOTIFICATION even though the instance is still running once the full GC completes.

2. The nodeagent detects a failed instance and restarts it before Shoal can detect that the instance has failed and notify the others in the cluster. This happens on faster, newer machines.
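For case 1, one possible approach is to report an instance as suspected first and only declare it failed after a grace window longer than the worst expected full GC pause. The sketch below assumes that approach. `HeartbeatMonitor`, `Verdict`, and all parameters are hypothetical names, not Shoal configuration. The 15 second grace value used in the usage note comes from the GC figure above, while the 2 second heartbeat interval is an assumption. The sketch does not address case 2.

```java
/** Illustrative failure detector; names and thresholds are assumptions, not Shoal's settings. */
final class HeartbeatMonitor {
    enum Verdict { ALIVE, SUSPECTED, FAILED }

    private final long heartbeatIntervalMs;
    private final int maxMissed;
    private final long graceMs;
    private long lastHeartbeatMs;

    /**
     * @param heartbeatIntervalMs expected time between heartbeats
     * @param maxMissed           missed heartbeats tolerated before suspecting the instance
     * @param graceMs             extra silence tolerated before declaring failure
     * @param nowMs               current time, passed in so the logic stays deterministic
     */
    HeartbeatMonitor(long heartbeatIntervalMs, int maxMissed, long graceMs, long nowMs) {
        this.heartbeatIntervalMs = heartbeatIntervalMs;
        this.maxMissed = maxMissed;
        this.graceMs = graceMs;
        this.lastHeartbeatMs = nowMs;
    }

    /** Records a heartbeat received at the given time. */
    void onHeartbeat(long nowMs) {
        lastHeartbeatMs = nowMs;
    }

    /** Classifies the instance from how long it has been silent. */
    Verdict check(long nowMs) {
        long silenceMs = nowMs - lastHeartbeatMs;
        long suspectAfterMs = heartbeatIntervalMs * maxMissed;
        if (silenceMs < suspectAfterMs) {
            return Verdict.ALIVE;
        }
        if (silenceMs < suspectAfterMs + graceMs) {
            return Verdict.SUSPECTED; // could map to the proposed "in_retry_mode"
        }
        return Verdict.FAILED;
    }
}
```

For example, with `new HeartbeatMonitor(2000, 3, 15000, 0)`, an instance that goes silent is suspected after 6 seconds but declared failed only after 21 seconds, so a heartbeat arriving after a 15 second full GC returns it to ALIVE without a failure notification. The trade-off is slower detection of real failures.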

glassfishrobot commented 16 years ago

@glassfishrobot Commented sheetalv said:

glassfishrobot commented 15 years ago

@glassfishrobot Commented sheetalv said: Too big an architecture change for Sailfin 1.5. NA for Sailfin 1.5.

glassfishrobot commented 15 years ago

@glassfishrobot Commented sheetalv said: WatchDog notification implementation has been added to Shoal. This takes care of case 2 (DAS restart) of what Joe has mentioned above.

glassfishrobot commented 16 years ago

@glassfishrobot Commented Issue-Links: blocks SHOAL-58

glassfishrobot commented 16 years ago

@glassfishrobot Commented Was assigned to sheetalv

glassfishrobot commented 7 years ago

@glassfishrobot Commented This issue was imported from java.net JIRA SHOAL-55