gluster / glusterd2

[DEPRECATED] Glusterd2 is the distributed management framework to be used for GlusterFS.
GNU General Public License v2.0

Bricks are not running and volume stop failed #1427

Open Shrivaibavi opened 5 years ago

Shrivaibavi commented 5 years ago

Observed behavior

[root@dhcp35-30 ~]# glustercli volume list
+--------------------------------------+------+-----------------------+---------+-----------+--------+
|                  ID                  | NAME |         TYPE          |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
| e42c6604-6cf7-4aac-bd5e-dfe0cd674d4f | abc  | Replicate             | Stopped | tcp       | 3      |
| dd797f61-0c20-4894-8447-9b734e21f63b | dif  | Replicate             | Stopped | tcp       | 3      |
| 7c0b0f49-9c61-4641-b688-c0a478924ba9 | xy   | Distributed-Replicate | Started | tcp       | 6      |
| e4f4ba5b-89c9-4b53-a2d3-5e7e6935804a | xyz  | Distributed-Replicate | Started | tcp       | 6      |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume start abc
Volume abc started successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume start dif
Volume dif started successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume stop dif
Volume dif stopped successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume stop abc
Volume abc stopped successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume delete abc
Are you sure you want to delete volume abc [yes/no]? yes
Volume abc deleted successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume delete dif
Are you sure you want to delete volume dif [yes/no]? yes
Volume dif deleted successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume stop xy
Volume xy stopped successfully
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume stop xyz
Volume stop failed

Failed to connect to glusterd. Please check if
- Glusterd is running(http://127.0.0.1:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume list
+--------------------------------------+------+-----------------------+---------+-----------+--------+
|                  ID                  | NAME |         TYPE          |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
| 7c0b0f49-9c61-4641-b688-c0a478924ba9 | xy   | Distributed-Replicate | Stopped | tcp       | 6      |
| e4f4ba5b-89c9-4b53-a2d3-5e7e6935804a | xyz  | Distributed-Replicate | Started | tcp       | 6      |
+--------------------------------------+------+-----------------------+---------+-----------+--------+
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# systemctl status glusterd2
● glusterd2.service - GlusterD2, the management service for GlusterFS (pre-release)
   Loaded: loaded (/usr/lib/systemd/system/glusterd2.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-12-19 16:50:26 IST; 16min ago
 Main PID: 4022 (glusterd2)
   CGroup: /system.slice/glusterd2.service
           ├─ 4022 /usr/sbin/glusterd2 --config=/etc/glusterd2/glusterd2.toml
           ├─15182 /usr/sbin/glusterfs -s localhost --volfile-server-port 24007 --volfile-id gluster/...
           └─20932 /usr/sbin/glusterfsd --volfile-server 127.0.0.1 --volfile-server-port 24007 --volf...

Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: dlfcn 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: libpthread 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: llistxattr 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: setfsid 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: spinlock 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: epoll.h 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: xattr.h 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: st_atim.tv_nsec 1
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: package-string: glusterfs...
Dec 19 16:50:32 dhcp35-30.lab.eng.blr.redhat.com bricks-brick0-abc1[26843]: ---------
Hint: Some lines were ellipsized, use -l to show in full.
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# glustercli volume status xyz
Volume : xyz
+--------------------------------------+--------------+---------------------+--------+------+-----+
|               BRICK ID               |     HOST     |        PATH         | ONLINE | PORT | PID |
+--------------------------------------+--------------+---------------------+--------+------+-----+
| cc435430-91cb-4967-9743-ee11a0c28597 | 10.70.35.240 | /bricks/brick1/xyz2 | false  |    0 |   0 |
| c588e555-8c45-4d91-937c-d468cfd84a94 | 10.70.35.30  | /bricks/brick1/xyz3 | false  |    0 |   0 |
| 7454a53e-2fad-48b4-9933-137ff0845df8 | 10.70.35.106 | /bricks/brick1/xyz4 | false  |    0 |   0 |
| d472b3ae-5ddb-49d3-b435-826415a54dd4 | 10.70.35.240 | /bricks/brick1/xyz5 | false  |    0 |   0 |
| 572dfadc-6f79-4460-b356-549677d035ca | 10.70.35.30  | /bricks/brick1/xyz0 | false  |    0 |   0 |
| c7012149-5658-4e97-aebe-1ecdd3b093e7 | 10.70.35.106 | /bricks/brick1/xyz1 | false  |    0 |   0 |
+--------------------------------------+--------------+---------------------+--------+------+-----+
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# 
[root@dhcp35-30 ~]# curl -i -XPOST http://localhost:24007/v1/volumes/xyz/stop
HTTP/1.1 500 Internal Server Error
Content-Type: application/json; charset=UTF-8
X-Gluster-Cluster-Id: 8aa6daa1-3d77-4df3-a938-115f5797fd2a
X-Gluster-Peer-Id: 2115479a-c493-4b44-9119-aa78b0dfcd5e
X-Request-Id: 17a34195-7c07-4f6d-bb71-52e699918a4a
Date: Wed, 19 Dec 2018 11:40:33 GMT
Content-Length: 686

{"errors":[{"code":2,"message":"a txn step failed","fields":{"error":"dial unix /var/run/glusterd2/62d8873b75178964.socket: connect: connection refused","peer-id":"2115479a-c493-4b44-9119-aa78b0dfcd5e","step":"vol-stop.StopBricks"}},{"code":2,"message":"a txn step failed","fields":{"error":"dial unix /var/run/glusterd2/2733bc25947ab39f.socket: connect: connection refused","peer-id":"3aa60137-4a3b-4c88-8ff7-4b4bf5fa87d4","step":"vol-stop.StopBricks"}},{"code":2,"message":"a txn step failed","fields":{"error":"dial unix /var/run/glusterd2/e51b66a88b708c4d.socket: connect: no such file or directory","peer-id":"2423abfc-db94-4974-96cc-3d3af3e36753","step":"vol-stop.StopBricks"}}]}
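The error body above can be decoded to list which peer failed at which transaction step. This is a minimal sketch, not glusterd2 or glustercli code: the type and function names (`apiError`, `errorResponse`, `summarize`) are hypothetical, and it assumes every value under `fields` is a string, as in this particular response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors one entry of the "errors" array in the response above.
// The struct name is made up for this sketch.
type apiError struct {
	Code    int               `json:"code"`
	Message string            `json:"message"`
	Fields  map[string]string `json:"fields"` // assumption: all field values are strings
}

// errorResponse is the top-level shape of the 500 response body.
type errorResponse struct {
	Errors []apiError `json:"errors"`
}

// summarize returns one line per failed transaction step.
func summarize(body []byte) ([]string, error) {
	var resp errorResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(resp.Errors))
	for _, e := range resp.Errors {
		lines = append(lines, fmt.Sprintf("peer %s: step %s: %s",
			e.Fields["peer-id"], e.Fields["step"], e.Fields["error"]))
	}
	return lines, nil
}

func main() {
	// One entry copied from the response above.
	body := []byte(`{"errors":[{"code":2,"message":"a txn step failed","fields":{"error":"dial unix /var/run/glusterd2/e51b66a88b708c4d.socket: connect: no such file or directory","peer-id":"2423abfc-db94-4974-96cc-3d3af3e36753","step":"vol-stop.StopBricks"}}]}`)
	lines, err := summarize(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Read this way, the response shows that all three peers failed in the same step (vol-stop.StopBricks) while dialing a brick socket, so the REST endpoint itself was reachable.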

Expected/desired behavior

The volume stop should be successful

Details on how to reproduce (minimal and precise)

Information about the environment:

Other useful information

[root@dhcp35-30 ~]# cat /etc/glusterd2/glusterd2.toml

localstatedir = "/var/lib/glusterd2"
logdir = "/var/log/glusterd2"
logfile = "glusterd2.log"
loglevel = "INFO"
rundir = "/var/run/glusterd2"
defaultpeerport = "24008"
peeraddress = ":24008"
clientaddress = ":24007"
restauth = false
etcdendpoints = "http://10.70.35.173:2379"
noembed = true

sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.553819" level=info msg="client connected" address="10.70.35.106:799" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.555652" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick2/dif error="SearchByBrickPath: port for brick /bricks/brick2/dif not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.556304" level=info msg="client disconnected" address="10.70.35.106:799" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.912431" level=info msg="client connected" address="10.70.35.30:996" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.915921" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick1/xy2 error="SearchByBrickPath: port for brick /bricks/brick1/xy2 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.916735" level=info msg="client disconnected" address="10.70.35.30:996" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.930666" level=info msg="client connected" address="10.70.35.30:980" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.932044" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick1/xy5 error="SearchByBrickPath: port for brick /bricks/brick1/xy5 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.932528" level=info msg="client disconnected" address="10.70.35.30:980" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.958360" level=info msg="client connected" address="10.70.35.30:942" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.960856" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/abc3 error="SearchByBrickPath: port for brick /bricks/brick0/abc3 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.961836" level=info msg="client disconnected" address="10.70.35.30:942" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.962808" level=info msg="client connected" address="10.70.35.30:941" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp
time="2018-12-19 12:41:58.963972" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/bricks/brick0/abc3 error="SearchByBrickPath: port for brick /bricks/brick0/abc3 not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-12-19 12:41:58.964371" level=info msg="client disconnected" address="10.70.35.30:941" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
time="2018-12-19 12:41:58.967270" level=info msg="client connected" address="10.70.35.30:940" server=sunrpc source="[server.go:148:sunrpc.(*SunRPC).acceptLoop]" transport=tcp

brick logs

[2018-12-19 11:20:37.730143] I [glusterfsd-mgmt.c:926:glusterfs_handle_attach] 0-glusterfs: got attach for /var/lib/glusterd2/volfiles/xy.3aa60137-4a3b-4c88-8ff7-4b4bf5fa87d4.bricks-brick1-xy5.vol
[2018-12-19 11:20:37.747910] I [socket.c:902:__socket_server_bind] 1-socket.xy-changelog: closing (AF_UNIX) reuse check socket 14
[2018-12-19 11:20:37.749565] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-12-19 11:20:38.436434] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy2: allowed = "*", received addr = "10.70.35.30"
[2018-12-19 11:20:38.436537] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:156bf9bf-bdab-4f36-8ed1-45e74d3b73fe-GRAPH_ID:6-PID:15182-HOST:dhcp35-30.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy2
[2018-12-19 11:20:38.444116] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy5: allowed = "*", received addr = "10.70.35.30"
[2018-12-19 11:20:38.444200] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:156bf9bf-bdab-4f36-8ed1-45e74d3b73fe-GRAPH_ID:6-PID:15182-HOST:dhcp35-30.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-1-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy5
[2018-12-19 11:20:39.643828] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy2: allowed = "*", received addr = "10.70.35.106"
[2018-12-19 11:20:39.643885] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:30bf137c-b074-483e-b287-d77d86765bab-GRAPH_ID:5-PID:2603-HOST:dhcp35-106.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy2
[2018-12-19 11:20:39.667300] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy5: allowed = "*", received addr = "10.70.35.106"
[2018-12-19 11:20:39.667343] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:30bf137c-b074-483e-b287-d77d86765bab-GRAPH_ID:5-PID:2603-HOST:dhcp35-106.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-1-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy5
[2018-12-19 11:20:43.427703] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy2: allowed = "*", received addr = "10.70.35.240"
[2018-12-19 11:20:43.427749] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:83af0436-c22e-4ee9-b596-364a3583c7b9-GRAPH_ID:6-PID:18906-HOST:dhcp35-240.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy2
[2018-12-19 11:20:43.463992] I [addr.c:54:compare_addr_and_update] 0-/bricks/brick1/xy5: allowed = "*", received addr = "10.70.35.240"
[2018-12-19 11:20:43.464031] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-xy-server: accepted client from CTX_ID:83af0436-c22e-4ee9-b596-364a3583c7b9-GRAPH_ID:6-PID:18906-HOST:dhcp35-240.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-1-client-2-RECON_NO:-0 (version: 6dev) with subvol /bricks/brick1/xy5
[2018-12-19 11:36:31.979004] I [glusterfsd-mgmt.c:279:glusterfs_handle_terminate] 0-glusterfs: detaching not-only child /bricks/brick1/xy2
[2018-12-19 11:36:31.979293] I [server.c:1640:server_notify] 0-xy-server: disconnecting 10.70.35.30:959
[2018-12-19 11:36:31.979383] I [socket.c:811:__socket_shutdown] 0-tcp.xy-server: intentional socket shutdown(15)
[2018-12-19 11:36:31.979442] I [server.c:1640:server_notify] 0-xy-server: disconnecting 10.70.35.106:952
[2018-12-19 11:36:31.979494] I [socket.c:811:__socket_shutdown] 0-tcp.xy-server: intentional socket shutdown(17)
[2018-12-19 11:36:31.979511] I [server.c:1640:server_notify] 0-xy-server: disconnecting 10.70.35.240:955
[2018-12-19 11:36:31.979566] I [socket.c:811:__socket_shutdown] 0-tcp.xy-server: intentional socket shutdown(19)
[2018-12-19 11:36:31.979595] I [MSGID: 115036] [server.c:494:server_rpc_notify] 0-xy-server: disconnecting connection from CTX_ID:156bf9bf-bdab-4f36-8ed1-45e74d3b73fe-GRAPH_ID:6-PID:15182-HOST:dhcp35-30.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.979929] I [MSGID: 115036] [server.c:494:server_rpc_notify] 0-xy-server: disconnecting connection from CTX_ID:30bf137c-b074-483e-b287-d77d86765bab-GRAPH_ID:5-PID:2603-HOST:dhcp35-106.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.980088] I [MSGID: 115036] [server.c:494:server_rpc_notify] 0-xy-server: disconnecting connection from CTX_ID:83af0436-c22e-4ee9-b596-364a3583c7b9-GRAPH_ID:6-PID:18906-HOST:dhcp35-240.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.980613] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-xy-server: Shutting down connection CTX_ID:83af0436-c22e-4ee9-b596-364a3583c7b9-GRAPH_ID:6-PID:18906-HOST:dhcp35-240.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.980922] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-xy-server: Shutting down connection CTX_ID:30bf137c-b074-483e-b287-d77d86765bab-GRAPH_ID:5-PID:2603-HOST:dhcp35-106.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.980924] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2018-12-19 11:36:31.980997] I [server.c:408:server_call_xlator_mem_cleanup] 0-xy-server: Create graph janitor thread for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.980613] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-xy-server: Shutting down connection CTX_ID:156bf9bf-bdab-4f36-8ed1-45e74d3b73fe-GRAPH_ID:6-PID:15182-HOST:dhcp35-30.lab.eng.blr.redhat.com-PC_NAME:xy-replicate-0-client-2-RECON_NO:-0
[2018-12-19 11:36:31.981155] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2018-12-19 11:36:31.981306] E [xlator.c:1432:glusterfs_delete_volfile_checksum] 0-xy-server: failed to get volfile checksum for volfile id xy.3aa60137-4a3b-4c88-8ff7-4b4bf5fa87d4.bricks-brick1-xy2.
[2018-12-19 11:36:31.981357] I [index.c:2604:notify] 0-xy-index: Notify GF_EVENT_PARENT_DOWN for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.981471] I [io-threads.c:1312:notify] 0-xy-io-threads: Notify GF_EVENT_PARENT_DOWN for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.981526] I [changelog.c:2022:notify] 0-xy-changelog: cleanup changelog rpc connection of brick /bricks/brick1/xy2
[2018-12-19 11:36:31.983198] I [socket.c:811:__socket_shutdown] 0-socket.xy-changelog: intentional socket shutdown(12)
[2018-12-19 11:36:31.983257] I [posix-common.c:158:posix_notify] 0-xy-posix: Sending CHILD_DOWN for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.983389] I [server.c:1586:server_notify] 0-xy-server: Getting CHILD_DOWN event for brick /bricks/brick1/xy2
[2018-12-19 11:36:31.983427] I [server.c:617:server_graph_janitor_threads] 0-xy-server: Start call fini for brick /bricks/brick1/xy2 stack
[2018-12-19 11:36:31.983504] E [rpcsvc.c:1825:rpcsvc_get_listener] 0-rpc-service: invalid port for listener socket.xy-changelog
[2018-12-19 11:36:31.987658] I [barrier.c:665:fini] 0-xy-barrier: Disabling barriering and dequeuing all the queued fops
[2018-12-19 11:36:31.988311] I [io-stats.c:4023:fini] 0-/bricks/brick1/xy2: io-stats translator unloaded
[2018-12-19 11:36:32.183876] I [glusterfsd-mgmt.c:260:glusterfs_handle_terminate] 0-glusterfs: terminating after loss of last child /bricks/brick1/xy5
[2018-12-19 11:36:31.981428] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2018-12-19 11:36:32.184281] W [glusterfsd.c:1543:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fe27358ddd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x56159e6ac105] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6d) [0x56159e6abf5d] ) 0-: received signum (15), shutting down

Useful commands

rishubhjain commented 5 years ago

Can you paste the brick logs from the time when you started volume "xyz"?

Shrivaibavi commented 5 years ago

@rishubhjain The logs are quite big. You can use my machine instead if you want to debug: 10.70.35.30

aravindavk commented 5 years ago

Two fixes are required.

1) glustercli: it blindly catches the "Connection Refused" error and reports that glusterd2 is probably down. https://github.com/gluster/glusterd2/blob/master/glustercli/cmd/common.go#L43
2) In all brick ops, handle the connection refused error and act accordingly. For example, while stopping a brick, glusterd2 tries to connect and send a brickop to stop it; ignore the error if the connection is refused.

atinmu commented 5 years ago

@harigowtham What's the update on this?

harigowtham commented 5 years ago

Unable to reproduce the issue; I will try a few more times. There is one odd error message: one of the nodes reports "no such file or directory" instead of "connection refused". I need to look deeper into why this one is different.
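For context on how the two messages differ at the socket level: connecting to a unix socket path that does not exist fails with ENOENT ("no such file or directory"), while connecting to a socket file that is still on disk but has no listener fails with ECONNREFUSED. The sketch below reproduces both cases in a temporary directory on Linux. It says nothing about why one node in this report had no socket file, and the names `probe` and `demo` are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"syscall"
)

// probe classifies a unix socket path by trying to connect to it.
func probe(path string) string {
	conn, err := net.Dial("unix", path)
	switch {
	case err == nil:
		conn.Close()
		return "live"
	case errors.Is(err, syscall.ENOENT):
		return "missing" // no socket file at this path
	case errors.Is(err, syscall.ECONNREFUSED):
		return "stale" // socket file exists, nobody is listening
	default:
		return "other: " + err.Error()
	}
}

// demo creates one missing and one stale socket path and probes both.
func demo() (missing, stale string, err error) {
	dir, err := os.MkdirTemp("", "brick")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	missing = probe(filepath.Join(dir, "missing.socket"))

	stalePath := filepath.Join(dir, "stale.socket")
	l, err := net.Listen("unix", stalePath)
	if err != nil {
		return "", "", err
	}
	// Keep the socket file on disk after Close, as a killed process would.
	l.(*net.UnixListener).SetUnlinkOnClose(false)
	l.Close()

	stale = probe(stalePath)
	return missing, stale, nil
}

func main() {
	missing, stale, err := demo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("nonexistent path:", missing)
	fmt.Println("closed listener: ", stale)
}
```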

aravindavk commented 5 years ago

@harigowtham Try stopping the volume immediately after a glusterd2 restart (before the bricks connect back to glusterd2).