Open hickscorp opened 3 years ago
This seems odd to me. Could you post the error message you are getting? Static/trusted nodes are merely suggestions. They should not result in any errors if they are offline. My best guess is that parsing the enode IDs fails, which should be apparent from the error message.
@karalabe thanks a lot for your help.
I know for a fact that the parsing goes well. The message is:
Fatal: /root/geth.toml, line 2: (p2p.Config.StaticNodes) lookup node1-new on 127.0.0.11:53: no such host
After which geth exits.
EDIT: Just FYI I have found a workaround - sleeping for 5 seconds before booting geth. This way, it gives 5s for all the containers to spin up. Geth crashes a few times, the containers are restarted, and when a happy coincidence of all the nodes being up happens, things connect.
But I really think that the "trying to connect to the nodes specified in the TOML file" should not be a `Fatal`, but a `Warning`, and let Geth boot as usual and retry later...
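For anyone else hitting this in the meantime: the fixed 5s sleep could be swapped for a small pre-start check that waits until the hostnames in the config actually resolve. This is only a rough sketch of that workaround, not a fix for the `Fatal` itself. `wait_for_hosts` and its parameters are names I made up for illustration; the only library calls are `socket.gethostbyname` and `time` from the Python standard library.

```python
import socket
import time


def wait_for_hosts(hosts, timeout=60.0, interval=1.0,
                   resolve=socket.gethostbyname,
                   sleep=time.sleep, clock=time.monotonic):
    """Block until every hostname in `hosts` resolves via DNS.

    Raises TimeoutError if some hosts are still unresolved after `timeout`
    seconds. `resolve`, `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    pending = set(hosts)
    while pending:
        # Iterate over a sorted copy so we can shrink the set while looping.
        for host in sorted(pending):
            try:
                resolve(host)
            except OSError:
                # socket.gaierror ("no such host") is a subclass of OSError.
                continue
            pending.discard(host)
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError("still unresolved: " + ", ".join(sorted(pending)))
        sleep(interval)
```

In a container entrypoint this would be called with the hostnames used in `StaticNodes` (e.g. `node1-new` from the error above) right before launching `geth --config /root/geth.toml`. It only checks DNS resolution, not that the peer is actually accepting connections.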
Bump - any chance that connecting to peers specified in `StaticNodes` / `TrustedNodes` could gracefully be a warning and retried rather than a boot error?
Bump?
Anyone please? Is it that I didn't phrase the issue correctly, or that there isn't any interest in addressing it?
Any update on this issue? I'm really interested in using domain names rather than IPs. It's more flexible when the IP can't be accessed.
Background
We are running a set of sealers and nodes in a docker swarm. Upon restarting the cluster, discovery between nodes doesn't work - so we have to add each node and sealer to each other node and sealer.
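To make "add each node and sealer to each other node and sealer" concrete, here is a rough sketch of the full-mesh registration expressed as JSON-RPC requests. It assumes the `admin` API is exposed over HTTP on every node. `admin_addPeer` is geth's RPC method; everything else (`build_enode`, `mesh_requests`, `send`, the node names) is hypothetical and only for illustration.

```python
import itertools
import json
import urllib.request


def build_enode(node_id, host, port=30303):
    """Format an enode URL; 30303 is geth's default p2p port."""
    return f"enode://{node_id}@{host}:{port}"


def mesh_requests(enodes, method="admin_addPeer"):
    """Return (target_name, payload) pairs covering every ordered node pair.

    `enodes` maps a node name to its enode URL. Each payload asks `target`
    to add one of the other nodes as a peer.
    """
    requests = []
    pairs = itertools.permutations(sorted(enodes), 2)
    for request_id, (target, peer) in enumerate(pairs, start=1):
        payload = {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": method,
            "params": [enodes[peer]],
        }
        requests.append((target, payload))
    return requests


def send(rpc_url, payload, timeout=5.0):
    """POST one JSON-RPC payload to a node's HTTP RPC endpoint."""
    request = urllib.request.Request(
        rpc_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For N nodes this produces N*(N-1) calls, which is why it needs redoing every time the swarm reshuffles IP addresses.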
Initially (and when we were running Geth 1.8) we had an extra container, in charge of connecting to the HTTP RPC, querying each node, and registering it with all others using their HTTP RPC as well. The reason for this is that docker swarm doesn't guarantee that an IP address will be the same for the same container - so our extra container had a script to "scrape" IP addresses and build `enode://` URLs to distribute across the nodes.

Problem Statement
More recently, we updated to Geth 1.10 and were pleasantly surprised to discover that `enode://` specifications can now accept DNS names (so we wouldn't need this scraper container). We tried it, and having an `enode://...@dns_name@nodiscover=1` works great. So instead of having a "scraping service" in an extra container, we decided to connect our cluster using `static-nodes.json` (which didn't work, apparently deprecated) and then using a `geth.toml` file. At this point, our TOML file was pretty much looking like this:

Unfortunately, this attempt fell short because Geth refuses to boot if any of the `StaticNodes` / `TrustedNodes` is unreachable - so there's a bit of a catch-22 situation here when restarting the whole cluster.

Suggestion 1
It would be quite nice to have these `StaticNodes` and / or `TrustedNodes` act as a warning rather than a `FATAL` error - this way the node would boot, fail to contact the nodes, and retry later on.

Current State of Things
We didn't stop just here. We made a script such as this one:
We were hopeful that there would somehow be an option to tell `Geth` to run this script. We discovered that we cannot really do that, unless we use `geth console` or `geth attach` - which doesn't work in our case since we would still need to run this manually after the cluster has started.

Suggestion 2
Maybe allow for a script to be executed when `geth` is started without `console` / `attach`, so that `admin.addPeer` / `admin.addTrustedPeer` could be used.
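Until something like that exists, the closest approximation we can think of is a sidecar script that keeps retrying the `admin_addPeer` / `admin_addTrustedPeer` RPC calls until every peer has been accepted. The sketch below is hypothetical: `add_peers_with_retry` and the `call` argument (any function that performs one JSON-RPC call against the local node and returns its result) are assumptions, not part of geth.

```python
import time


def add_peers_with_retry(enodes, call, trusted=False,
                         attempts=10, delay=3.0, sleep=time.sleep):
    """Register each enode URL with the local node, retrying failures.

    `call(method, params)` must perform one JSON-RPC request and return the
    result (geth returns true when the peer is accepted). Returns the list of
    enode URLs that could not be added after `attempts` rounds.
    """
    methods = ["admin_addPeer"]
    if trusted:
        methods.append("admin_addTrustedPeer")

    remaining = list(enodes)
    for attempt in range(attempts):
        failed = []
        for enode in remaining:
            try:
                ok = all(call(method, [enode]) for method in methods)
            except OSError:
                # RPC endpoint not reachable yet (e.g. connection refused).
                ok = False
            if not ok:
                failed.append(enode)
        remaining = failed
        if not remaining:
            break
        if attempt + 1 < attempts:
            sleep(delay)
    return remaining
```

This still has to live outside of geth, which is exactly the extra moving part we were hoping to get rid of.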