gluster / glusterd2

[DEPRECATED] Glusterd2 is the distributed management framework to be used for GlusterFS.

Gluster volume status output not consistent on gd2 pods, after delete/reboot of gd2 pod on gcs setup #1309

Open rmadaka opened 6 years ago

rmadaka commented 6 years ago

Observed behavior

After deleting/rebooting any one gd2 pod, log in to any other gd2 pod and check the volume status. The volume status output keeps changing between invocations.

No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
Error getting volume status

Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.51.175:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| d5c60208-b5db-4752-a01b-0f2d5922d478 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick1/brick | false  |     0 |   0 |
| 8c4b581b-46a1-4460-b43d-ba7181689d10 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick2/brick | true   | 45063 |  57 |
| dbe6f89e-d584-4afd-9da4-9e324384d548 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick3/brick | false  |     0 |   0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
Volume : pvc-d3006e55dce511e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 49b4d49e-1c4d-4f26-9977-b1c181b89f55 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick2/brick | true   | 43326 | 511 |
| 1ce6115b-f4c3-4d49-94f0-edb1edc13d58 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick3/brick | true   | 44607 |  65 |
| 3d1c57ee-8072-44a6-912e-8df32bc79ac2 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick | false  |     0 |   0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| dbe6f89e-d584-4afd-9da4-9e324384d548 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick3/brick | false  |     0 |   0 |
| d5c60208-b5db-4752-a01b-0f2d5922d478 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick1/brick | false  |     0 |   0 |
| 8c4b581b-46a1-4460-b43d-ba7181689d10 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-a6375e4adcd711e8/subvol1/brick2/brick | true   | 45063 |  57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
Volume : pvc-d3006e55dce511e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 3d1c57ee-8072-44a6-912e-8df32bc79ac2 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick | false  |     0 |   0 |
| 49b4d49e-1c4d-4f26-9977-b1c181b89f55 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick2/brick | true   | 43326 | 511 |
| 1ce6115b-f4c3-4d49-94f0-edb1edc13d58 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick3/brick | true   | 44607 |  65 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
Error getting volume status

Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.51.175:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
Volume : pvc-a6375e4adcd711e8
Error getting volume status

Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.51.175:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube3-0 /]# glustercli volume status --endpoints=http://10.233.51.175:24007
No volumes found

Expected/desired behavior

The volume status output should be consistent: it should list all volumes, and the status shown for each brick should be accurate.

Details on how to reproduce (minimal and precise)

1) Create a PVC.
2) Delete or reboot any one of the gd2 pods.
3) Log in to another gd2 pod that was not rebooted or deleted.
4) Check the volume status repeatedly (see the sketch below).
5) The volume status output varies between runs: sometimes "No volumes found", sometimes "Error getting volume status", and sometimes the output is correct.
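A minimal reproduction sketch, assuming the gcs namespace, the pod names from the output above, and a single gd2 container per pod (the kubectl flags and --endpoints address may differ on other GCS deployments):

# On a node with kubectl access: delete one gd2 pod and let the StatefulSet recreate it
kubectl -n gcs delete pod gluster-kube1-0

# Open a shell in a surviving gd2 pod
kubectl -n gcs exec -it gluster-kube3-0 -- /bin/bash

# Inside the pod: query volume status repeatedly and watch the output change
for i in $(seq 1 10); do
    glustercli volume status --endpoints=http://10.233.51.175:24007
    sleep 2
done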

Information about the environment:

Other useful information

Useful commands

vpandey-RH commented 6 years ago

@rmadaka Can you check the GD2 logs once and paste them here?
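For reference, a sketch of how the logs could be collected, assuming the gcs namespace and the default log path /var/log/glusterd2/glusterd2.log referenced later in this thread:

# Container logs of a gd2 pod
kubectl -n gcs logs gluster-kube2-0

# Or the glusterd2 log file from inside the pod
kubectl -n gcs exec gluster-kube2-0 -- cat /var/log/glusterd2/glusterd2.log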

rmadaka commented 6 years ago

Sorry for the delay; the old setup went into a bad state. I reproduced the above scenario again and have pasted the logs below.

Logs:

time="2018-11-02 12:40:56.356381" level=info msg="10.233.64.1 - - [02/Nov/2018:12:40:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=a84699cf-6b0a-4ef3-a841-9c818deeff3b time="2018-11-02 12:41:56.355033" level=info msg="10.233.64.1 - - [02/Nov/2018:12:41:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=13c2856a-a710-4263-b682-fe3d526eacc6 time="2018-11-02 12:42:56.354790" level=info msg="10.233.64.1 - - [02/Nov/2018:12:42:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=e0ff5df0-9576-442a-8dc2-2b9e6616b9f7 time="2018-11-02 12:43:56.356649" level=info msg="10.233.64.1 - - [02/Nov/2018:12:43:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=ddd5ac8e-184b-4275-9873-41ef94aebdde time="2018-11-02 12:44:56.354932" level=info msg="10.233.64.1 - - [02/Nov/2018:12:44:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=bbfa8508-44c9-43cc-b4e3-9a0ab04ffa52 time="2018-11-02 12:45:56.357132" level=info msg="10.233.64.1 - - [02/Nov/2018:12:45:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=120d547d-0a0d-487f-8302-c325afa1b2e9 time="2018-11-02 12:46:56.355583" level=info msg="10.233.64.1 - - [02/Nov/2018:12:46:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=496eee66-5fad-417f-ae4b-fe72a443a7d1 time="2018-11-02 12:47:56.354818" level=info msg="10.233.64.1 - - [02/Nov/2018:12:47:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=8b67b7cc-c962-41e9-9c23-5b85bef50915 time="2018-11-02 12:48:22.841664" level=info msg="peer connected to store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:48:events.(livenessWatcher).Watch]" time="2018-11-02 12:48:53.233603" level=info msg="peer disconnected from store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:51:events.(livenessWatcher).Watch]" time="2018-11-02 12:48:56.356002" level=info msg="10.233.64.1 - - [02/Nov/2018:12:48:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=0ff96806-ffde-4fd9-a60e-bb679dca7244 time="2018-11-02 12:49:56.356170" level=info msg="10.233.64.1 - - [02/Nov/2018:12:49:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=8e9afeb4-4a86-435d-982b-bd1b1a4a7a72 time="2018-11-02 12:50:09.888664" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:09 +0000] \"GET /v1/volumes HTTP/1.1\" 200 1387" reqid=55c21afd-5db6-45cb-b0ae-eb28453dbc54 time="2018-11-02 12:50:09.908669" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused\"" remotepeer="d640f3ab-cd90-4670-8d85-5871316475be(gluster-kube1-0)" reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 rpc=TxnSvc.RunStep source="[rpc-client.go:72:transaction.runStepOn]" txnid=67d79ffb-13f9-40c1-b55e-33f52a966bd3 time="2018-11-02 12:50:09.908882" level=error msg="Step failed on node." 
error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused\"" node=d640f3ab-cd90-4670-8d85-5871316475be reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 source="[step.go:119:transaction.runStepFuncOnNodes]" step=bricks-status.Check txnid=67d79ffb-13f9-40c1-b55e-33f52a966bd3 time="2018-11-02 12:50:09.935845" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:09 +0000] \"GET /v1/volumes/pvc-30cc1447de8611e8/bricks HTTP/1.1\" 200 2084" reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 time="2018-11-02 12:50:23.146792" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:23 +0000] \"GET /v1/volumes HTTP/1.1\" 200 1387" reqid=35a3c2d3-d305-4787-9187-cc7906a4857e time="2018-11-02 12:50:23.167456" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused\"" remotepeer="d640f3ab-cd90-4670-8d85-5871316475be(gluster-kube1-0)" reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 rpc=TxnSvc.RunStep source="[rpc-client.go:72:transaction.runStepOn]" txnid=facdbb90-46cd-4949-b39c-a19f9eb453bc time="2018-11-02 12:50:23.167626" level=error msg="Step failed on node." error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused\"" node=d640f3ab-cd90-4670-8d85-5871316475be reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 source="[step.go:119:transaction.runStepFuncOnNodes]" step=bricks-status.Check txnid=facdbb90-46cd-4949-b39c-a19f9eb453bc time="2018-11-02 12:50:23.195260" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:23 +0000] \"GET /v1/volumes/pvc-30cc1447de8611e8/bricks HTTP/1.1\" 200 2084" reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 time="2018-11-02 12:50:56.355403" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=c49009f7-acdf-4478-b03e-7153d83aa578 time="2018-11-02 12:51:17.533562" level=info msg="peer connected to store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:48:events.(livenessWatcher).Watch]" time="2018-11-02 12:51:17.789189" level=info msg="client connected" address="10.233.66.74:1023" server=sunrpc source="[server.go:155:sunrpc.(SunRPC).acceptLoop]" transport=tcp time="2018-11-02 12:51:17.793524" level=info msg="client disconnected" address="10.233.66.74:1023" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"

rmadaka commented 6 years ago

Providing output one more time:

[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true   | 43980 |  56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false  |     0 |   0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true   | 40826 |  57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true   | 43980 |  56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false  |     0 |   0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true   | 40826 |  57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
Error getting volume status

Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.9.177:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube2-0 /]# vi /var/log/glusterd2/glusterd2.log 
[root@gluster-kube2-0 /]# vi /var/log/glusterd2/glusterd2.log 
[root@gluster-kube2-0 /]# 
[root@gluster-kube2-0 /]# 
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true   | 43980 |  56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false  |     0 |   0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true   | 40826 |  57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true   | 40826 |  57 |
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true   | 43980 |  56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false  |     0 |   0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
Madhu-1 commented 6 years ago

@vpandey-RH any update on this one?

atinmu commented 6 years ago

Could this be related to #1054 ?

atinmu commented 6 years ago

@vpandey-RH Have we made any progress on this issue?

vpandey-RH commented 6 years ago

@atinmu Not yet. I have not been able to give time to this issue. Will work on it.

atinmu commented 5 years ago

@rmadaka is this still valid with latest master?