coherence-community / oracle-bedrock

Oracle Bedrock

Default CoherenceCluster.Predicate for stability should take Cluster Topology into account #362

Closed: brianoliver closed this issue 7 years ago

brianoliver commented 8 years ago

When using Bedrock to establish and/or orchestrate a Coherence Cluster across a number of different machines (without specifying a custom stability predicate), the predicate that ensures the cluster is "safe" and "stable" will hang, as it's designed to work on a single machine.

e.g.: At best, the predicate expects "node safe" to be reached (when there are multiple nodes), and at worst "endangered" (when there is only a single node). However, when the members span multiple machines, the services reach "machine safe", so a predicate waiting for "node safe" will never be satisfied.

Instead, the predicate should use the number of machines and/or Java Virtual Machines to choose the highest achievable level of service safety.

1 node = endangered
n nodes (on one machine) = node safe
n nodes (on multiple machines) = machine safe
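A minimal sketch of the mapping above, for illustration only. `TopologySafety`, `StatusHA`, `expectedStatus` and `isStable` are hypothetical names, not Bedrock's or Coherence's API, and the input is assumed to be the machine name of each cluster member (how that is gathered from a running cluster is left out).

```java
import java.util.HashSet;
import java.util.List;

public class TopologySafety {

    // Hypothetical stand-in for the StatusHA levels, ordered from weakest to strongest.
    enum StatusHA { ENDANGERED, NODE_SAFE, MACHINE_SAFE }

    // Each list entry is the machine hosting one cluster member (an assumed input shape).
    static StatusHA expectedStatus(List<String> memberMachines) {
        if (memberMachines.isEmpty()) {
            throw new IllegalArgumentException("a cluster needs at least one member");
        }
        if (memberMachines.size() == 1) {
            return StatusHA.ENDANGERED;   // 1 node
        }
        if (new HashSet<>(memberMachines).size() == 1) {
            return StatusHA.NODE_SAFE;    // n nodes on one machine
        }
        return StatusHA.MACHINE_SAFE;     // n nodes on multiple machines
    }

    // Stable once the observed level reaches (or exceeds) the level the topology allows,
    // rather than waiting for one fixed level such as NODE_SAFE.
    static boolean isStable(StatusHA observed, StatusHA expected) {
        return observed.compareTo(expected) >= 0;
    }
}
```

Comparing with `>=` instead of equality means a multi-machine cluster that reports "machine safe" also satisfies a "node safe" expectation, which is exactly the case where the original predicate hangs.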

brianoliver commented 8 years ago

After testing on a variety of platforms, the predicate operates correctly, including taking the cluster deployment topology into account.

brianoliver commented 7 years ago

There are some additional, subtle requirements here, which means this issue should be re-opened. For example, we should not assume that all cluster members have the same services. A cluster may have a service defined in one member but not in another. We should not assume that, because there are multiple cluster members, a given service must exist on all of them and should therefore be "safe" across the cluster.

Instead, we need to consider all of the "auto-start" services defined by all cluster members, determine how many members each of them requires, and then determine the "safety" of each service individually based on that number.
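A sketch of this per-service approach, under the same assumptions as before plus a hypothetical `Member` type recording each member's machine and the names of the auto-start services it defines. It reads "how many members each service requires" as the number of members that actually define that service; this is an interpretation, and none of these names are Bedrock or Coherence API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ServiceSafety {

    // Hypothetical stand-in for the StatusHA levels, ordered from weakest to strongest.
    enum StatusHA { ENDANGERED, NODE_SAFE, MACHINE_SAFE }

    // Hypothetical view of one cluster member: its machine and the auto-start services it defines.
    static final class Member {
        final String machine;
        final Set<String> autoStartServices;

        Member(String machine, Set<String> autoStartServices) {
            this.machine = machine;
            this.autoStartServices = autoStartServices;
        }
    }

    // Highest level reachable by a service hosted on the given machines (one entry per member).
    static StatusHA expectedStatus(List<String> hostMachines) {
        if (hostMachines.isEmpty()) {
            throw new IllegalArgumentException("a service needs at least one member");
        }
        if (hostMachines.size() == 1) {
            return StatusHA.ENDANGERED;
        }
        if (new HashSet<>(hostMachines).size() == 1) {
            return StatusHA.NODE_SAFE;
        }
        return StatusHA.MACHINE_SAFE;
    }

    // Groups members by the auto-start services they define, then decides each service's
    // expected level from only the members that actually run it.
    static Map<String, StatusHA> expectedPerService(List<Member> members) {
        Map<String, List<String>> machinesByService = new TreeMap<>();
        for (Member member : members) {
            for (String service : member.autoStartServices) {
                machinesByService.computeIfAbsent(service, s -> new ArrayList<>()).add(member.machine);
            }
        }
        Map<String, StatusHA> expected = new TreeMap<>();
        machinesByService.forEach((service, machines) -> expected.put(service, expectedStatus(machines)));
        return expected;
    }

    // The cluster is stable only when every service has reached its own expected level.
    static boolean isStable(Map<String, StatusHA> observed, Map<String, StatusHA> expected) {
        for (Map.Entry<String, StatusHA> entry : expected.entrySet()) {
            StatusHA actual = observed.get(entry.getKey());
            if (actual == null || actual.compareTo(entry.getValue()) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With three members where only one defines a given service, that service is expected to be "endangered" even though the cluster as a whole spans two machines, which is the case described above where assuming cluster-wide safety would hang.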