Open ebdavison opened 6 years ago
Hm, did you start two separate bedrock processes, one for each node? I only see one process in the log, and you've only given a single command line. Each node is a different process, started independently (typically on different servers, but commonly on the same server for testing).
Yes, these are on different machines with IP connectivity between them.
root@ubusdm16:~# grep bedrock /etc/hosts
10.211.1.7 bedrock2
10.211.1.19 bedrock1
Node 1:
14433 ? Ssl 0:00 /usr/sbin/bedrock -fork -nodeName bedrock1 -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:18888 -nodeHost 0.0.0.0:8889 -priority 201 -pidfile /var/run/bedrock.pid -quorumCheckpoint 100 -readThreads 4 -plugins status,db,jobs,cache,mysql -peerList 10.211.1.7:8889
14443 pts/2 S+ 0:00 grep --color=auto bedrock
root@skwiidelette:~# ping -c 4 bedrock2
PING bedrock2 (10.211.1.7) 56(84) bytes of data.
64 bytes from bedrock2 (10.211.1.7): icmp_seq=1 ttl=64 time=0.546 ms
64 bytes from bedrock2 (10.211.1.7): icmp_seq=2 ttl=64 time=0.609 ms
64 bytes from bedrock2 (10.211.1.7): icmp_seq=3 ttl=64 time=0.901 ms
64 bytes from bedrock2 (10.211.1.7): icmp_seq=4 ttl=64 time=0.488 ms
--- bedrock2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.488/0.636/0.901/0.158 ms
Node 2:
550 ? Ssl 0:00 /usr/sbin/bedrock -fork -nodeName bedrock2 -db /var/lib/bedrock/bedrock.db -serverHost 0.0.0.0:18888 -nodeHost 0.0.0.0:8889 -priority 202 -pidfile /var/run/bedrock.pid -quorumCheckpoint 100 -readThreads 4 -plugins status,db,jobs,cache,mysql -peerList 10.211.1.19:8889
1111 pts/0 S+ 0:00 grep --color=auto bedrock
root@ubusdm16:~# ping -c 4 bedrock1
PING bedrock1 (10.211.1.19) 56(84) bytes of data.
64 bytes from bedrock1 (10.211.1.19): icmp_seq=1 ttl=64 time=0.407 ms
64 bytes from bedrock1 (10.211.1.19): icmp_seq=2 ttl=64 time=0.498 ms
64 bytes from bedrock1 (10.211.1.19): icmp_seq=3 ttl=64 time=0.559 ms
64 bytes from bedrock1 (10.211.1.19): icmp_seq=4 ttl=64 time=0.548 ms
--- bedrock1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.407/0.503/0.559/0.060 ms
Another log entry that seems relevant that I missed in the original posting:
Jan 3 09:11:35 ubusdm16 bedrock: xxxxx (STCPNode.cpp:118) postSelect [write0] [warn] {bedrock2} Incoming connection failed from '10.211.1.19:36990' (unauthenticated node), recv='PING#015#012Timestamp: 1514992299904022#015#012Content-Length: 0#015#012#015#012LOGIN#015#012CommitCount: 4#015#012Hash: 2DFF381F8FD3C91B2495EF5DBC7E8FF62D02C1D8#015#012Priority: 201#015#012State: SEARCHING#015#012Version: 43234709c216b19ae7ce7ea7a5219467bf50e773#015#012Content-Length: 0#015#012#015#012', send=''
Ah, can you try replacing the IP addresses in the -peerList
with the corresponding DNS names in /etc/hosts
, which correspond to the corresponding -nodeName
? Basically, -peerList
is formatted as a comma separated list of nodeName:port
tuples, with the nodeName
portion being both the name that the node advertises and a DNS name that resolves to that node's IP. You can see the code that does the "authentication" here.
I put "authentication" in air quotes because it's just verifying a self-declared name -- I actually thought that it also did IP verification, but apparently not. We're in the midst of a security review internally so I'll make a point to add at least IP verification here, though as the comment in the code suggests, ideally we'd use some kind of public key verification.
Well, that is not quite right. I actually used the IP address directly in the -peerList
as opposed to the nodename (resolving to DNS). I will change that and see if that fixes it for me and will report back.
On a side note, at least for me on Ubuntu 14.04, there is no config file in /etc
so I had to manually "hack" the /etc/init.d/bedrock
startup script to add nodes, change the port, etc. It would be better practice to have a config file that is read in from /etc/bedrock
so startup script hacking is no necessary.
Additionally, the startup script is looking for the variable OTHERNODES
which is not configured in the script anywhere, either with OTHERNODES=""
or with a commented out "default".
(I installed from the package, not source.)
Looks like changing the -peerList
from IP to DNS name fixed the issue:
Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:2202) _onConnect [write0] [info] {bedrock2/SEARCHING} ->{bedrock1} Sending LOGIN
Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (STCPNode.cpp:162) postSelect [write0] [info] {bedrock2} Received PING from peer 'bedrock1'. Sending PONG.
Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:1585) _onMESSAGE [write0] [info] {bedrock2/SEARCHING} ->{bedrock1} Peer logged in at 'SEARCHING', priority #201 commit #4 (2DFF381F8FD3C91B2495EF5DBC7E8FF62D02C1D8)
Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:724) update [write0] [info] {bedrock2/SEARCHING} Signed in to 1 of 1 full peers (1 with permaslaves), timeout in 117154ms
Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:775) update [write0] [info] {bedrock2/SEARCHING} SYNCHRONIZING from freshest peer 'bedrock1 (0ms latency)
Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (STCPNode.cpp:171) postSelect [write0] [info] {bedrock2} Received PONG from peer 'bedrock1' (1ms latency)
Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:1810) _onMESSAGE [write0] [info] {bedrock2/SYNCHRONIZING} ->{bedrock1} Beginning synchronization
Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:1818) _onMESSAGE [write0] [info] {bedrock2/SYNCHRONIZING} Synchronization complete, at commitCount #4 (2DFF381F8FD3C91B2495EF5DBC7E8FF62D02C1D8), WAITING
Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:1613) _onMESSAGE [write0] [info] {bedrock2/WAITING} ->{bedrock1} Peer switched from 'SEARCHING' to 'WAITING' commit #4 (2DFF381F8FD3C91B2495EF5DBC7E8FF62D02C1D8)
I suppose that means some work on the documention is in order to make it clear that the -peerList
MUST be a DNS name and not an IP address. (In my case, these are local test machines with no real DNS, so they are both added to the /etc/hosts
file on each side and that seems to work.)
Yeah, the documentation may need to be a little more clear. They don't necessarily need to be DNS names, the name in -peerList
just needs to match the value of -nodeName
.
For example I setup a cluster with one node with -nodeName 127.0.0.1
and then the other two nodes had a -peerList
like node1:8887,127.0.0.1:8886
if that makes sense.
I see; that makes sense. Docs could state this more clearly, for sure.
So, to summarize: the node that is trying to connect will announce itself as -nodename xxx
and if xxx
matches in -peerList
then we are good to go. Got it. In my case, each -nodename
was a name, not IP but if I had made each one be their IP rather than nodename, it would have worked to put the IP in -peerList
.
Sorry for the delay. @ebdavison, can you submit a PR to fix the docs to be more clear? Thanks!
Looks like changing the
-peerList
from IP to DNS name fixed the issue:Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:2202) _onConnect [write0] [info] {bedrock2/SEARCHING} ->{bedrock1} Sending LOGIN Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (STCPNode.cpp:162) postSelect [write0] [info] {bedrock2} Received PING from peer 'bedrock1'. Sending PONG. Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:1585) _onMESSAGE [write0] [info] {bedrock2/SEARCHING} ->{bedrock1} Peer logged in at 'SEARCHING', priority #201 commit #4 (2DFF381F8FD3C91B2495EF5DBC7E8FF62D02C1D8) Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:724) update [write0] [info] {bedrock2/SEARCHING} Signed in to 1 of 1 full peers (1 with permaslaves), timeout in 117154ms Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:775) update [write0] [info] {bedrock2/SEARCHING} SYNCHRONIZING from freshest peer 'bedrock1 (0ms latency) Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (STCPNode.cpp:171) postSelect [write0] [info] {bedrock2} Received PONG from peer 'bedrock1' (1ms latency) Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:1810) _onMESSAGE [write0] [info] {bedrock2/SYNCHRONIZING} ->{bedrock1} Beginning synchronization Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:1818) _onMESSAGE [write0] [info] {bedrock2/SYNCHRONIZING} Synchronization complete, at commitCount #4 (2DFF381F8FD3C91B2495EF5DBC7E8FF62D02C1D8), WAITING Jan 3 14:43:57 ubusdm16 bedrock: xxxxx (SQLiteNode.cpp:1613) _onMESSAGE [write0] [info] {bedrock2/WAITING} ->{bedrock1} Peer switched from 'SEARCHING' to 'WAITING' commit #4 (2DFF381F8FD3C91B2495EF5DBC7E8FF62D02C1D8)
I suppose that means some work on the documention is in order to make it clear that the
-peerList
MUST be a DNS name and not an IP address. (In my case, these are local test machines with no real DNS, so they are both added to the/etc/hosts
file on each side and that seems to work.)
Three and a half years later, and this resolves the same issue for me, and nothing in the setup documentation mentioned this hostname requirement....
I have a 2-node cluster I am trying to get up and running and I am seeing errors in the logs. The nodes are definitely talking together but will not start. The logs refer to unauthenticated peers. I see nothing in the docs about this.
I added OTHERNODES line to the /etc/init.d/bedrock startup script to point to each other and altered the PRIORITY, changed the port to 18888 of the main process to avoid collissions. Otherwise, no other changes on Ubuntu 14.04 Server.
Here is an excerpt from the logs:
Here is the process execution options: