gluster / glusterd2

[DEPRECATED] Glusterd2 is the distributed management framework to be used for GlusterFS.

Peer add is failing with backend network #1331

Open Akarsha-rai opened 6 years ago

Akarsha-rai commented 6 years ago

Observed behavior

Peer add is failing with backend network

Expected/desired behavior

Peer add should succeed.

Details on how to reproduce (minimal and precise)

  1. Have a 3-node setup with external etcd. One machine (say n1) has 2 NICs/IPs.
  2. Peer add n1 using one IP (i.e. 10.70.35.80) from node n2. Peer add fails.
    
    [root@dhcp35-122 ~]# glustercli peer add 10.70.35.80
    Peer add failed

    Response headers:
    X-Request-Id: 0af89bd5-1d07-49f7-89dd-f86c652d956a
    X-Gluster-Cluster-Id: 10f3fb83-326a-4e2a-97f1-7c6a5c9537f6
    X-Gluster-Peer-Id: 5934c470-a583-42e4-a285-58ca93db53d4

    Response body:
    failed to send join cluster request

  3. Then tried peer add of n1 using the other IP (i.e. 10.70.35.121) from node n2. Peer add was successful.

    [root@dhcp35-122 ~]# glustercli peer add 10.70.35.121
    Peer add successful
    +--------------------------------------+-----------------------------------+--------------------+--------------------+
    |                  ID                  |               NAME                |  CLIENT ADDRESSES  |   PEER ADDRESSES   |
    +--------------------------------------+-----------------------------------+--------------------+--------------------+
    | 7446bf45-00ae-4407-a42f-230330d956ae | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007    | 10.70.35.121:24008 |
    |                                      |                                   | 10.70.35.121:24007 |                    |
    |                                      |                                   | 10.70.35.80:24007  |                    |
    +--------------------------------------+-----------------------------------+--------------------+--------------------+


Information about the environment

    [root@dhcp35-122 ~]# glusterd2 --version
    glusterd version: v6.0-dev.28.git1b19aeb
    git SHA: 1b19aeb
    go version: go1.9.4
    go OS/arch: linux/amd64

    [root@dhcp35-229 ~]# cat /etc/centos-release
    CentOS Linux release 7.5.1804 (Core)

Other useful information

    [root@dhcp35-122 ~]# cat /etc/glusterd2/glusterd2.toml
    localstatedir = "/var/lib/glusterd2"
    logdir = "/var/log/glusterd2"
    logfile = "glusterd2.log"
    loglevel = "INFO"
    rundir = "/var/run/glusterd2"
    defaultpeerport = "24008"
    peeraddress = ":24008"
    clientaddress = ":24007"
    #restauth should be set to false to disable REST authentication in glusterd2
    restauth = false
    etcdendpoints = "http://10.70.35.10:2379"
    noembed = true

Log:

time="2018-11-16 13:30:55.198246" level=info msg="peer disconnected from store" id=3ca95c6f-80fc-4964-832d-5439ee6765dd source="[liveness.go:51:events.(*livenessWatcher).Watch]"
time="2018-11-16 13:34:02.753821" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.70.35.80:24008: getsockopt: connection refused\"" remote="10.70.35.80:24008" rpc=PeerService.Join source="[peer-rpc-clnt.go:47:peers.(*peerSvcClnt).JoinCluster]"
time="2018-11-16 13:34:02.753962" level=error msg="sending Join request failed" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.70.35.80:24008: getsockopt: connection refused\"" peer="10.70.35.80:24008" reqid=1085af62-50f9-4ea8-afed-43ff1d6a570a source="[addpeer.go:82:peers.addPeerHandler]"
time="2018-11-16 13:34:02.754089" level=info msg="127.0.0.1 - - [16/Nov/2018:13:34:02 +0530] \"POST /v1/peers HTTP/1.1\" 500 72" reqid=1085af62-50f9-4ea8-afed-43ff1d6a570a

vpandey-RH commented 6 years ago

@atinmu Can you decide the priority for this issue? Does this need to be taken up now?

Madhu-1 commented 6 years ago

    Error while dialing dial tcp 10.70.35.80:24008: getsockopt: connection refused\"" peer="10.70.35.80:24008"

says that the connection was refused on port 24008.

    defaultpeerport = "24008"
    peeraddress = ":24008"
    clientaddress = ":24007"

From the config, I see that glusterd2 is listening on all the interfaces.

If you want to run glusterd2 on any one of the NICs, you need to change the configuration file:

    peeraddress = "<IP>:24008"
    clientaddress = "<IP>:24007"

@Akarsha-rai can you paste the glustercli peer list output, along with some more info on the scenario you are trying out?
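To narrow down a "connection refused" like the one in the log, it can help to probe the peer port on each of the node's IPs from the machine issuing the peer add. The following is a minimal Python sketch, not part of glusterd2; the helper name `can_connect` is hypothetical, and the port 24008 is simply the `peeraddress` port from the config posted above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical diagnostic helper; it only tests TCP reachability and
    says nothing about whether the gRPC peer service itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers ConnectionRefusedError, timeouts and
        # unreachable-host errors alike.
        return False
```

Run from n2, calling `can_connect("10.70.35.80", 24008)` and `can_connect("10.70.35.121", 24008)` would show whether the peer port is reachable on both of n1's addresses or only on one of them.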

oshankkumar commented 6 years ago

@Akarsha-rai If you want the gRPC server to listen on all IPs, you can set peeraddress = "0.0.0.0:24008" in the config file.

rishubhjain commented 6 years ago

@Akarsha-rai If you want the gRPC server to listen on all IPs, you can set peeraddress = "0.0.0.0:24008" in the config file.

I think we should mention this in the docs so that it does not confuse users. @Akarsha-rai Can you verify this?

Akarsha-rai commented 5 years ago

I tried setting peeraddress = "0.0.0.0:24008" in the config file and was able to peer add with the backend network.

But I faced a few issues:

  1. Suppose node n1 has 2 IPs (a and b). When I add 'b' from node n2, peer add is successful. Later, when I try to add 'a', peer add fails with the error "peer is part of another cluster". Shouldn't it fail with the error "Peer exists with given addresses"?

  2. If node n1 has 2 IPs (a and b), when I tried to add 'b' from node n1 itself, peer add was successful.

    
    [root@dhcp35-121 ~]# glustercli peer add 10.70.35.80
    Peer add successful
    +--------------------------------------+-----------------------------------+--------------------+-------------------+
    |                  ID                  |               NAME                |  CLIENT ADDRESSES  |  PEER ADDRESSES   |
    +--------------------------------------+-----------------------------------+--------------------+-------------------+
    | 327548aa-db90-485e-9439-d9ff117609c1 | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007    | 10.70.35.80:24008 |
    |                                      |                                   | 10.70.35.121:24007 | 0.0.0.0:24008     |
    |                                      |                                   | 10.70.35.80:24007  |                   |
    +--------------------------------------+-----------------------------------+--------------------+-------------------+

    [root@dhcp35-121 ~]# glustercli peer status
    +--------------------------------------+-----------------------------------+--------------------+-------------------+--------+-------+
    |                  ID                  |               NAME                |  CLIENT ADDRESSES  |  PEER ADDRESSES   | ONLINE |  PID  |
    +--------------------------------------+-----------------------------------+--------------------+-------------------+--------+-------+
    | 327548aa-db90-485e-9439-d9ff117609c1 | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007    | 10.70.35.80:24008 | yes    | 10274 |
    |                                      |                                   | 10.70.35.121:24007 | 0.0.0.0:24008     |        |       |
    |                                      |                                   | 10.70.35.80:24007  |                   |        |       |
    | f0eb23bb-5447-48da-bfd8-0b255ecf6f84 | dhcp35-121.lab.eng.blr.redhat.com | 127.0.0.1:24007    | 0.0.0.0:24008     | yes    | 10274 |
    |                                      |                                   | 10.70.35.121:24007 |                   |        |       |
    |                                      |                                   | 10.70.35.80:24007  |                   |        |       |
    +--------------------------------------+-----------------------------------+--------------------+-------------------+--------+-------+

rishubhjain commented 5 years ago

I think checking client addresses as well as peer addresses before adding the peer should solve this problem. @aravindavk any suggestions?
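The check suggested here could look roughly like the Python sketch below. It is an illustration only, not glusterd2's actual code: the function names and the dictionary layout for peers are hypothetical, it assumes IPv4 or hostname `host:port` strings, and it deliberately ignores loopback and wildcard hosts (127.0.0.1, 0.0.0.0) because every node lists those.

```python
import ipaddress
from typing import Dict, List, Optional


def host_of(addr: str) -> str:
    """Return the host part of a 'host:port' string (IPv4 or hostname assumed)."""
    return addr.rsplit(":", 1)[0]


def is_identifying(host: str) -> bool:
    """True if the host can identify one node (not empty, loopback or 0.0.0.0)."""
    if not host:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IP literal, treat it as a hostname
    return not (ip.is_loopback or ip.is_unspecified)


def find_existing_peer(new_addr: str, peers: List[Dict]) -> Optional[str]:
    """Return the ID of a known peer that already owns new_addr's host, else None.

    Both client and peer addresses are compared, as proposed in the comment.
    """
    new_host = host_of(new_addr)
    if not is_identifying(new_host):
        return None
    for peer in peers:
        known = peer["client_addresses"] + peer["peer_addresses"]
        if any(host_of(a) == new_host for a in known):
            return peer["id"]
    return None
```

With the peer from the first successful add (client addresses 127.0.0.1:24007, 10.70.35.121:24007, 10.70.35.80:24007 and peer address 10.70.35.121:24008), a lookup for 10.70.35.80:24008 matches through the client address list, so a second add of the same node could be rejected as an already existing peer.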

atinmu commented 5 years ago

@aravindavk Is this a valid scenario in an opinionated GCS cluster?

aravindavk commented 5 years ago

Not applicable in a GCS setup; both client and peer addresses are the same there.