Closed brianoliver closed 7 years ago
After testing on a variety of platforms, the Predicate operates correctly, including taking cluster deployment topology into account.
There are some additional subtle requirements here, which means this should be re-opened. For example, we should not assume all cluster members have the same services. A service may be defined in one member and not in another. Just because there are multiple cluster members, we should not assume that a given service exists on all of them and can therefore reach a cluster "safe" state.
Instead we need to consider all of the "auto-start" services defined by all cluster members, determine how many of them are required and then individually determine their "safety" based on that number.
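As a rough illustration of the first step, the sketch below counts how many members define each "auto-start" service. It is not Bedrock code: the class name, method name and the map-based input (member name to the set of auto-start service names it defines) are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Bedrock or Coherence.
public final class ServiceMemberCounts {

    private ServiceMemberCounts() {
    }

    /**
     * Counts, for each auto-start service, how many cluster members define it.
     *
     * @param autoStartServicesByMember assumed input: member name -> names of
     *                                  the auto-start services on that member
     * @return service name -> number of members defining that service
     */
    public static Map<String, Integer> countMembersPerService(
            Map<String, Set<String>> autoStartServicesByMember) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> services : autoStartServicesByMember.values()) {
            for (String service : services) {
                // Each member contributes at most one to a service's count.
                counts.merge(service, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

A service that appears on only one member would then be judged on that count alone, regardless of how many members the cluster has in total.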
When using Bedrock to establish and/or orchestrate a Coherence Cluster across a number of different machines (without specifying a custom stability predicate), the predicate that ensures the cluster is "safe" and "stable" will hang, as it's designed to work on a single machine.
e.g. at best the predicate expects "node safe" to be reached (when there are multiple nodes) and at worst "endangered" (when there's only a single node). However, when there are multiple machines the cluster reaches "machine safe", yet the predicate still waits for "node safe", which will never happen.
Instead, the predicate should use the number of machines and/or Java virtual machines to choose the highest achievable level of service safety.
1 node = endangered
n nodes (on one machine) = node safe
n nodes (on multiple machines) = machine safe
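The mapping above could be sketched as follows. This is only an illustration of the decision, not the actual Bedrock predicate: the class, the enum and the method are hypothetical names, and the node and machine counts are assumed to be known to the caller.

```java
// Hypothetical helper, not part of Bedrock or Coherence.
public final class ExpectedSafety {

    /** Hypothetical enum covering only the three levels listed above. */
    public enum Level { ENDANGERED, NODE_SAFE, MACHINE_SAFE }

    private ExpectedSafety() {
    }

    /**
     * Chooses the highest safety level that can be reached for the given
     * number of nodes (JVMs) and the number of machines they run on.
     */
    public static Level expected(int nodeCount, int machineCount) {
        if (nodeCount < 1 || machineCount < 1 || machineCount > nodeCount) {
            throw new IllegalArgumentException(
                "need 1 <= machineCount <= nodeCount, got nodes=" + nodeCount
                    + ", machines=" + machineCount);
        }
        if (nodeCount == 1) {
            return Level.ENDANGERED;   // a single node has nowhere to hold backups
        }
        if (machineCount == 1) {
            return Level.NODE_SAFE;    // several nodes, all on one machine
        }
        return Level.MACHINE_SAFE;     // several nodes spread over several machines
    }
}
```

A stability predicate built on this would wait for the returned level instead of always waiting for "node safe".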