rmadaka opened this issue 6 years ago
@rmadaka Can you check the GD2 logs once and paste them here?
Sorry for the late reply; the old setup went into a bad state. I reproduced the above scenario again and pasted the logs below.
Logs:
time="2018-11-02 12:40:56.356381" level=info msg="10.233.64.1 - - [02/Nov/2018:12:40:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=a84699cf-6b0a-4ef3-a841-9c818deeff3b time="2018-11-02 12:41:56.355033" level=info msg="10.233.64.1 - - [02/Nov/2018:12:41:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=13c2856a-a710-4263-b682-fe3d526eacc6 time="2018-11-02 12:42:56.354790" level=info msg="10.233.64.1 - - [02/Nov/2018:12:42:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=e0ff5df0-9576-442a-8dc2-2b9e6616b9f7 time="2018-11-02 12:43:56.356649" level=info msg="10.233.64.1 - - [02/Nov/2018:12:43:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=ddd5ac8e-184b-4275-9873-41ef94aebdde time="2018-11-02 12:44:56.354932" level=info msg="10.233.64.1 - - [02/Nov/2018:12:44:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=bbfa8508-44c9-43cc-b4e3-9a0ab04ffa52 time="2018-11-02 12:45:56.357132" level=info msg="10.233.64.1 - - [02/Nov/2018:12:45:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=120d547d-0a0d-487f-8302-c325afa1b2e9 time="2018-11-02 12:46:56.355583" level=info msg="10.233.64.1 - - [02/Nov/2018:12:46:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=496eee66-5fad-417f-ae4b-fe72a443a7d1 time="2018-11-02 12:47:56.354818" level=info msg="10.233.64.1 - - [02/Nov/2018:12:47:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=8b67b7cc-c962-41e9-9c23-5b85bef50915 time="2018-11-02 12:48:22.841664" level=info msg="peer connected to store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:48:events.(livenessWatcher).Watch]" time="2018-11-02 12:48:53.233603" level=info msg="peer disconnected from store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:51:events.(livenessWatcher).Watch]" time="2018-11-02 12:48:56.356002" level=info msg="10.233.64.1 - - [02/Nov/2018:12:48:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=0ff96806-ffde-4fd9-a60e-bb679dca7244 time="2018-11-02 12:49:56.356170" level=info msg="10.233.64.1 - - [02/Nov/2018:12:49:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=8e9afeb4-4a86-435d-982b-bd1b1a4a7a72 time="2018-11-02 12:50:09.888664" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:09 +0000] \"GET /v1/volumes HTTP/1.1\" 200 1387" reqid=55c21afd-5db6-45cb-b0ae-eb28453dbc54 time="2018-11-02 12:50:09.908669" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused\"" remotepeer="d640f3ab-cd90-4670-8d85-5871316475be(gluster-kube1-0)" reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 rpc=TxnSvc.RunStep source="[rpc-client.go:72:transaction.runStepOn]" txnid=67d79ffb-13f9-40c1-b55e-33f52a966bd3 time="2018-11-02 12:50:09.908882" level=error msg="Step failed on node." 
error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused\"" node=d640f3ab-cd90-4670-8d85-5871316475be reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 source="[step.go:119:transaction.runStepFuncOnNodes]" step=bricks-status.Check txnid=67d79ffb-13f9-40c1-b55e-33f52a966bd3 time="2018-11-02 12:50:09.935845" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:09 +0000] \"GET /v1/volumes/pvc-30cc1447de8611e8/bricks HTTP/1.1\" 200 2084" reqid=daca60a2-564c-4798-9fce-87dd157fc4f6 time="2018-11-02 12:50:23.146792" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:23 +0000] \"GET /v1/volumes HTTP/1.1\" 200 1387" reqid=35a3c2d3-d305-4787-9187-cc7906a4857e time="2018-11-02 12:50:23.167456" level=error msg="failed RPC call" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused\"" remotepeer="d640f3ab-cd90-4670-8d85-5871316475be(gluster-kube1-0)" reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 rpc=TxnSvc.RunStep source="[rpc-client.go:72:transaction.runStepOn]" txnid=facdbb90-46cd-4949-b39c-a19f9eb453bc time="2018-11-02 12:50:23.167626" level=error msg="Step failed on node." error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.233.66.74:24008: getsockopt: connection refused\"" node=d640f3ab-cd90-4670-8d85-5871316475be reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 source="[step.go:119:transaction.runStepFuncOnNodes]" step=bricks-status.Check txnid=facdbb90-46cd-4949-b39c-a19f9eb453bc time="2018-11-02 12:50:23.195260" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:23 +0000] \"GET /v1/volumes/pvc-30cc1447de8611e8/bricks HTTP/1.1\" 200 2084" reqid=ab80f3ae-d773-4e2e-ab79-7572f20792b9 time="2018-11-02 12:50:56.355403" level=info msg="10.233.64.1 - - [02/Nov/2018:12:50:56 +0000] \"GET /ping HTTP/1.1\" 200 0" reqid=c49009f7-acdf-4478-b03e-7153d83aa578 time="2018-11-02 12:51:17.533562" level=info msg="peer connected to store" id=d640f3ab-cd90-4670-8d85-5871316475be source="[liveness.go:48:events.(livenessWatcher).Watch]" time="2018-11-02 12:51:17.789189" level=info msg="client connected" address="10.233.66.74:1023" server=sunrpc source="[server.go:155:sunrpc.(SunRPC).acceptLoop]" transport=tcp time="2018-11-02 12:51:17.793524" level=info msg="client disconnected" address="10.233.66.74:1023" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
Providing the output one more time:
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true | 43980 | 56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false | 0 | 0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true | 40826 | 57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true | 43980 | 56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false | 0 | 0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true | 40826 | 57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
Error getting volume status
Failed to connect to glusterd. Please check if
- Glusterd is running(http://10.233.9.177:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube2-0 /]# vi /var/log/glusterd2/glusterd2.log
[root@gluster-kube2-0 /]# vi /var/log/glusterd2/glusterd2.log
[root@gluster-kube2-0 /]#
[root@gluster-kube2-0 /]#
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true | 43980 | 56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false | 0 | 0 |
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true | 40826 | 57 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
No volumes found
[root@gluster-kube2-0 /]# glustercli volume status --endpoints=http://10.233.9.177:24007
Volume : pvc-30cc1447de8611e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 7774a140-c1e3-4337-aca3-ab6af9d60304 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick3/brick | true | 40826 | 57 |
| b1f2feef-9ca0-4471-9362-d290fbcef778 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick1/brick | true | 43980 | 56 |
| 073b8cf1-6f0f-4636-816f-c18dc5a19b87 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-30cc1447de8611e8/subvol1/brick2/brick | false | 0 | 0 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
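To quantify the flapping instead of eyeballing repeated runs, a small loop like the one below can tally the distinct first lines of output ("No volumes found", "Volume : ...", the connection error, etc.). This is only a sketch and assumes glustercli is available inside the pod with the same --endpoints value used above:

    # Sketch: run the status command 20 times and count how often each kind
    # of first line appears.
    for i in $(seq 1 20); do
        glustercli volume status --endpoints=http://10.233.9.177:24007 2>&1 | head -n 1
        sleep 3
    done | sort | uniq -c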
@vpandey-RH any update on this one?
Could this be related to #1054?
@vpandey-RH Have we made any progress on this issue?
@atinmu Not yet. I have not been able to spend time on this issue, but I will work on it.
@rmadaka is this still valid with latest master?
Observed behavior
After deleting/rebooting any one GD2 pod, log in to any other GD2 pod and check the volume status. The volume status output keeps changing between runs.
Expected/desired behavior
The volume status output should be consistent: it should show the status of all volumes, and the status of every brick should be accurate.
Details on how to reproduce (minimal and precise)
1) Create a PVC
2) Delete/reboot any one of the GD2 pods
3) Log in to another GD2 pod that was not rebooted or deleted
4) Check the volume status
5) The volume status output varies between runs: sometimes "No volumes found", sometimes "Error getting volume status", and sometimes the correct output (a kubectl sketch of steps 2-4 follows this list)
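A hedged sketch of steps 2-4, assuming the pods run in a Kubernetes namespace named gcs (the namespace is an assumption; the pod names are taken from the output above):

    # Step 2: delete one GD2 pod (it is expected to be recreated automatically).
    kubectl -n gcs delete pod gluster-kube1-0
    # Steps 3-4: from a pod that was not restarted, check the volume status.
    kubectl -n gcs exec -it gluster-kube2-0 -- \
        glustercli volume status --endpoints=http://10.233.9.177:24007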
Information about the environment:
Other useful information
- uuid.toml from all nodes (default /var/lib/glusterd2/uuid.toml)
- statedump from any one of the nodes
Useful commands