apple / swift-cluster-membership

Distributed Membership Protocol implementations in Swift
https://apple.github.io/swift-cluster-membership/
Apache License 2.0

Implement the `useUnreachableState` flag #5

Closed ktoso closed 4 years ago

ktoso commented 4 years ago

We can operate in two modes: one with unreachability, and the classic one in which failure detection moves a member straight to `.dead`.

The unreachable state pattern is not useful for most systems and should be disabled by default.

The implementation today always uses the unreachable state: it emits events about unreachable members and then waits for someone to call confirm dead. We should only do this if `useUnreachableState` is true.

        /// Optional SWIM Protocol Extension: `SWIM.MemberStatus.unreachable`
        ///
        /// This is a custom extension to the standard SWIM statuses which first moves a member into unreachable state,
        /// while still trying to ping it and awaiting a final "mark it `.dead` now" from an external system.
        ///
        /// This allows for collaboration between external and internal monitoring systems before committing a node as `.dead`.
        /// The `.unreachable` state IS gossiped throughout the cluster same as alive/suspect are, while a `.dead` member is not gossiped anymore,
        /// as it is effectively removed from the membership. This allows for additional spreading of the unreachable observation throughout
        /// the cluster, as an observation, but not as an action (of removing given member).
        ///
        /// From a protocol perspective, the `.unreachable` state is therefore equivalent to a `.suspect` member status.
        ///
        /// Unless you _know_ you need unreachability, do not enable this mode, as it requires additional actions to be taken,
        /// to confirm a node as dead, complicating the failure detection and node pruning.
        ///
        /// By default this option is disabled, and the SWIM implementation behaves as documented in the papers,
        /// meaning that when a node remains unresponsive beyond the allowed time it is marked as `.dead` immediately.
        public var useUnreachableState: Bool = false
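
The requested behavior could be sketched roughly as follows. This is a minimal illustration, not the library's actual implementation: `MemberStatus`, `Settings`, and `statusAfterSuspicionTimeout` are hypothetical, simplified stand-ins, and only the `useUnreachableState` flag and its default of `false` come from the snippet above.

```swift
// Hypothetical, simplified stand-ins for illustration only; not the library's real types.
enum MemberStatus {
    case alive
    case suspect
    case unreachable
    case dead
}

struct Settings {
    // Mirrors the flag from the issue; disabled by default.
    var useUnreachableState: Bool = false
}

/// Decides which status a suspect member moves to once its suspicion timeout has expired.
/// (Assumed helper name; the real code path may be structured differently.)
func statusAfterSuspicionTimeout(settings: Settings) -> MemberStatus {
    if settings.useUnreachableState {
        // Extension mode: only report unreachability and wait for an external "confirm dead".
        return .unreachable
    } else {
        // Classic SWIM mode: the member is declared dead right away.
        return .dead
    }
}

let classic = Settings()
var extended = Settings()
extended.useUnreachableState = true

print(statusAfterSuspicionTimeout(settings: classic))
print(statusAfterSuspicionTimeout(settings: extended))
```

Keeping the decision in a single function like this would mean the "emit unreachable and await confirmation" path is only ever entered when the flag is explicitly enabled.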