Closed mazzy89 closed 1 week ago
The code at https://github.com/gnolang/gno/blob/90aa89c28d3c8ad3b7d7b67d0256426bde6cfbc9/tm2/pkg/p2p/switch.go#L495 suggests that DNS lookup errors are ignored and skipped. The end result, however, is that no peers are added:
2024-06-18T09:29:44.285Z DEBUG Blockpool has no peers {"module": "blockchain"}
A workaround adopted by many similar upstream services that rely on a bootstrap phase to discover other peers is to set publishNotReadyAddresses: true. This solves the problem.
Reopening the issue. It seems that even setting publishNotReadyAddresses: true in the Service does not help. Some nodes come up properly while others fail, so the overall bootstrap mechanism is not deterministic. I wonder whether retrying the DNS lookup would help.
Gave it another try, and it seems that after a few seconds the node retries connecting to the peers, which by then have DNS records available, and the lookup succeeds. We can close this.
Validators cannot discover P2P peers when running as StatefulSet in k8s
Description
In a multi-node scenario, when a validator is started with a list of nodes configured under p2p.persistent_peers and p2p.seeds, the DNS lookup fails. Other similar products that try to reach other nodes/peers during their bootstrap phase, such as RabbitMQ, suffer from the same issue. See https://github.com/kubernetes/kubernetes/issues/92559#issuecomment-1196410671
Your environment
Steps to reproduce
Set p2p.persistent_peers and p2p.seeds in config.toml to the Service address of the validators.
Run gnoland start.
Expected behaviour
The DNS lookup should succeed and the node should connect to another peer.
Actual behaviour
The DNS lookup fails. The node seems to try a second time, but that attempt also fails because the DNS record is not ready yet.
Logs
Proposed solution
The issue should be fixed by retrying the DNS lookup of the P2P peers multiple times. In a k8s environment, where there are many moving parts, retry and backoff are crucial to increase the chance of a successful connection.