IntersectMBO / cardano-node

The core component that is used to participate in a Cardano decentralised blockchain.
https://cardano.org
Apache License 2.0

[FR] - Live topology updates without node restart #3038

Closed andrejpodzimek closed 2 years ago

andrejpodzimek commented 3 years ago

External

Area: Other. Any other topic (Delegation, Ranking, ...).

Describe the feature you'd like

While running topologyUpdater.sh hourly, it is recommended to restart the relay node each time to pick up possible topology changes. This is a big problem, because node startup takes more than 10 minutes (from systemctl restart to the point when the socket appears and communicates), even though my SSDs can serve around 5 GB/s. (The CPU is the bottleneck during initialization.)

With one restart per hour, the node would be down about 1/6 of the time. In a stake pool configuration this is bad, because one could miss block minting or the like. It should be safe to just run topologyUpdater.sh hourly (as recommended) and tell cardano-node to reload the topology each time, ideally without downtime (as long as the topology doesn’t change dramatically).
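The 1/6 figure follows directly from the numbers above. A minimal sketch of the arithmetic, assuming each restart costs a fixed number of minutes and restarts happen at a fixed interval (the function name is illustrative, not part of any Cardano tooling):

```python
def downtime_fraction(restart_minutes: float, interval_minutes: float) -> float:
    """Fraction of wall-clock time the node is unavailable when it is
    restarted once per interval and each restart takes restart_minutes."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    # A restart cannot cost more than the whole interval.
    return min(restart_minutes / interval_minutes, 1.0)


# 10-minute startup, restarted once per hour: 10 / 60 = 1/6.
hourly = downtime_fraction(10, 60)
print(f"hourly restarts: {hourly:.1%} downtime")
```

Stretching the restart interval shrinks the fraction proportionally, but only a reload without restart brings it to zero.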

Describe alternatives you've considered

I considered running two relay nodes on the same machine on two different ports for redundancy, so that one is always up while the other is being reloaded. Unfortunately, this would come with storage and computational overhead, and there doesn’t seem to be a guarantee that block minting wouldn’t be missed with one of the relays down. (If there is such a guarantee, then it needs to be documented, i.e., how exactly having multiple hosts and ports could help.)

rdlrt commented 3 years ago

  1. topologyUpdater.sh is not part of the cardano-node repo, but of the guild-operators repo. The recommendation is to run the push (non-interruptive) hourly, to report your node as live to the service and available for other peers to fetch, while you'd want to pull and refresh the topology daily (only a recommendation, as good practice to help each other out until P2P is available, not a necessity).
  2. The restart should not take 10 minutes if you're using SIGINT to kill processes.
  3. As part of the P2P updates, topology refresh will already be available via a signal to the node.

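
The "refresh via a signal" idea in point 3 can be sketched generically as: replace the topology file atomically, then signal the running process instead of restarting it. The thread does not say which signal the node will listen for, so `SIGHUP` below is an assumption (it is only the common Unix convention for "reload configuration"), and both function names are hypothetical helpers.

```python
import json
import os
import signal
import tempfile


def install_topology(path: str, topology: dict) -> None:
    """Write the topology JSON atomically so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(topology, handle, indent=2)
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def request_reload(pid: int, sig: int = signal.SIGHUP) -> None:
    """Send a reload signal to a running process; the process keeps running.

    ASSUMPTION: SIGHUP is a placeholder; check which signal the node handles.
    """
    os.kill(pid, sig)
```

A process that does not handle the chosen signal is typically terminated by it, so this only makes sense against a node version that documents such a handler.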
regel commented 3 years ago

Simple questions:

regel commented 3 years ago

P2P related information found on https://roadmap.cardano.org/.

Timeline:

This week, the team completed the second milestone of the P2P deployment, delivering an engineering testnet, which allows for automatic peer selection in the network. During this stage, the team tested and implemented different user configurations, established interoperability between legacy and P2P nodes, and produced a video that reflects automated peer selection.

The team had a call with SPOs, where they explained P2P project goals, P2P system design, and the concept of hot, cold, and warm peers. They also introduced the goals of the third milestone in P2P deployment (semi-public testnet), explaining that there will be a switch in the node to enable selection of either the new P2P mode or the existing (non-P2P) one. During the semi-public testnet delivery, the team will be inviting a small group of SPOs to help test system functionality.

The team fixed the stateTVar signature, worked on simultaneous TCP connections opened by the handshake protocol, and rebased p2p-master branches.

This is the last update I could find. After gathering this information, I still don't know whether the feature is included in 1.28 or 1.29, on testnet or mainnet, or what the name of the P2P config switch will be, but this is progress. Let's read the code in the p2p-master branch and find out.

Finally, there is also this issue discussed on the forum, about the current topology update via api.clio.one and the problems it causes for Kubernetes.

regel commented 3 years ago

As of today, the p2p-master branch does not seem to have been merged into a stable cardano-node release yet:

This branch is 14 commits ahead, 257 commits behind master.

This README file contains information on the new topology file format we can expect, but I guess it may still change if P2P testing is still ongoing. I did not find a reference to the EnableP2P config switch in the documentation. Is it missing from the docs?

regel commented 3 years ago

Hey, I just released Helm Charts to run cardano node containers 🐳 in Kubernetes.

It handles the peer-to-peer topology update by repeating this process every 24 hours:

  1. reading the on-chain data once per day 🕐 to find registered nodes,
  2. then vetting (is it alive, is the metadata hash valid, and so on) 🆗 these registered nodes,
  3. finally selecting random nodes in this "valid" set of nodes,
  4. publishing the 'new' topology to an internal Redis pub-sub topic,
  5. the new 🆕 topology file is received by an internal Redis client and written to the filesystem mount of the relay container, where it triggers a restart of the pod,
  6. the pod is restarted automatically and uses the new topology 🍾
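
Steps 2 and 3 above (vetting and random selection) can be sketched as follows. This is not the code from the Helm charts: the function names, the TCP-connect liveness check, and the fixed valency are assumptions for illustration, and the output uses the legacy (non-P2P) `Producers` topology layout. Reading on-chain data and the Redis publish/restart steps are left out.

```python
import random
import socket
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Relay = Tuple[str, int]  # (host, port)


def tcp_is_alive(relay: Relay, timeout: float = 3.0) -> bool:
    """Hypothetical vetting check: can a TCP connection be opened?"""
    try:
        with socket.create_connection(relay, timeout=timeout):
            return True
    except OSError:
        return False


def build_topology(
    registered: Iterable[Relay],
    is_alive: Callable[[Relay], bool],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[dict]]:
    """Vet the registered relays, then pick a random subset of the valid ones."""
    rng = rng or random.Random()
    valid = [relay for relay in registered if is_alive(relay)]
    chosen = rng.sample(valid, min(sample_size, len(valid)))
    # Legacy topology layout; valency 1 is an arbitrary illustrative choice.
    return {
        "Producers": [
            {"addr": host, "port": port, "valency": 1} for host, port in chosen
        ]
    }
```

Passing the liveness check in as a parameter keeps the selection logic testable without network access; a real deployment would call it as `build_topology(relays, tcp_is_alive, 20)` or with a stricter check that also validates the metadata hash.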

Everything in this implementation is fully autonomous 🚗, and fully decentralised since it runs locally in the cluster. The topology update process is fair and transparent. Topology is updated and discovered automatically from the blockchain data itself and nothing else!

Link to the code on Github

Link to the peer to peer extension: Github

coot commented 2 years ago

This is available in the (not yet supported) P2P version, and it will not be backported to non-P2P nodes.