Closed branimirangelov closed 1 month ago
Minor note: it needs to be tested on at least 3 dev clusters. With only 2 nodes the quorum size is also 2, so losing either node leaves Raft without a quorum and it halts. From HashiCorp:
Lastly, there is the issue of updating the peer set when new servers are joining or existing servers are leaving. As long as a quorum of nodes is available, this is not an issue as Raft provides mechanisms to dynamically update the peer set. If a quorum of nodes is unavailable, then this becomes a very challenging issue. For example, suppose there are only 2 peers, A and B. The quorum size is also 2, meaning both nodes must agree to commit a log entry. If either A or B fails, it is now impossible to reach quorum. This means the cluster is unable to add, or remove a node, or commit any additional log entries. This results in unavailability. At this point, manual intervention would be required to remove either A or B, and to restart the remaining node in bootstrap mode.
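To make the arithmetic in the quote concrete, here is a minimal sketch of the majority-quorum rule, assuming the standard Raft definition (quorum = floor(n/2) + 1). The function names `quorum` and `tolerated` are illustrative only and do not come from any library.

```go
package main

import "fmt"

// quorum returns the minimum number of voting peers that must agree
// to commit a log entry, assuming a simple majority: floor(n/2) + 1.
func quorum(n int) int {
	return n/2 + 1
}

// tolerated returns how many voting peers can fail while a quorum
// of the original peer set is still reachable.
func tolerated(n int) int {
	return n - quorum(n)
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("peers=%d quorum=%d tolerated_failures=%d\n",
			n, quorum(n), tolerated(n))
	}
}
```

With 2 peers the tolerated failure count is 0, which matches the A/B example above; 3 peers is the smallest set that survives the loss of one node.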
In the current PoC, Kubernetes clusters use public IP-based ingress. Given this setup, the libp2p transport is less practical than a simpler HTTP (gRPC) transport. For other reasons as well (e.g. remote attestation at the application layer), the Base Protocol should include a basic (rudimentary) naming system that allows the HTTP-based transport to discover public IPs, as the full DNS discovery will pertain to the autoscaler itself. libp2p will become beneficial in the future, when we introduce Apocryph nodes (Kubernetes clusters) that do not have public IPs for ingress.
The basic naming system will rely on IPFS (its underlying DHT) to provide basic Apocryph Node discovery, which will enable the autoscaler to deploy itself on various nodes and generate a list of its instances. If a hardware provider (or other entity) decides to launch an Autoscaler instance and wants to join it to the Autoscaler cluster, that instance will have to know the name (IP) of at least one existing Autoscaler instance in order to initiate the negotiation process for joining the cluster.
Within the current PoC, implement an "autonomous-like" application based on the go-libp2p-raft library that maintains a KV store. The scope is to deploy it as an end-user app and test it across two separate dev clusters.
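As a rough illustration of what "maintaining a KV store" over a replicated log could look like, here is a sketch of a deterministic state machine that applies committed entries in order. It deliberately uses only the Go standard library and does not use the go-libp2p-raft API; the `Command` type, its JSON field names, and the `KVStore` methods are all assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Command is a hypothetical log entry payload (not a library type).
type Command struct {
	Op    string `json:"op"` // "set" or "delete"
	Key   string `json:"key"`
	Value string `json:"value,omitempty"`
}

// KVStore is the replicated state. It is not safe for concurrent use;
// a real node would guard it with a mutex.
type KVStore struct {
	data map[string]string
}

// NewKVStore returns an empty store.
func NewKVStore() *KVStore {
	return &KVStore{data: make(map[string]string)}
}

// Apply decodes one committed log entry and mutates the store.
// It must be deterministic so that all nodes converge on the same state.
func (s *KVStore) Apply(entry []byte) error {
	var c Command
	if err := json.Unmarshal(entry, &c); err != nil {
		return fmt.Errorf("decode entry: %w", err)
	}
	switch c.Op {
	case "set":
		s.data[c.Key] = c.Value
	case "delete":
		delete(s.data, c.Key)
	default:
		return fmt.Errorf("unknown op %q", c.Op)
	}
	return nil
}

// Get reads a key from the local copy of the state.
func (s *KVStore) Get(key string) (string, bool) {
	v, ok := s.data[key]
	return v, ok
}

func main() {
	// The same committed entries, applied in the same order on two nodes.
	entries := [][]byte{
		[]byte(`{"op":"set","key":"replicas","value":"3"}`),
		[]byte(`{"op":"set","key":"region","value":"eu"}`),
		[]byte(`{"op":"delete","key":"region"}`),
	}
	nodeA, nodeB := NewKVStore(), NewKVStore()
	for _, e := range entries {
		if err := nodeA.Apply(e); err != nil {
			panic(err)
		}
		if err := nodeB.Apply(e); err != nil {
			panic(err)
		}
	}
	va, _ := nodeA.Get("replicas")
	vb, _ := nodeB.Get("replicas")
	fmt.Println("nodes agree:", va == vb)
}
```

In the actual spike, the consensus layer would be responsible for delivering committed entries to each node in the same order; this sketch only covers the apply side.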
Important constraints:
References:
Note: This spike is part of the Autoscaler autonomous application effort.