Open git2212 opened 6 months ago
In our case we are seeing this issue as well. We have one volume in a 3-brick, 3-replica setup across 3 nodes. On one node we restarted all gluster services, and after that glusterd was continuously crashing with: 0-management: Initialization of volume 'management' failed, review your volfile again.
Logs from GlusterD:
[2023-10-23 04:35:15.477485] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f50547c96db] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xfd) [0x55de29541b4d] -->/usr/sbin/glusterd(cleanup_and_exit+0x54) [0x55de29541994] ) 0-: received signum (15), shutting down
[2023-10-23 04:38:56.022520] I [MSGID: 100030] [glusterfsd.c:2847:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 6.10 (args: /usr/sbin/glusterd -N -p /var/run/glusterd.pid)
[2023-10-23 04:38:56.022658] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 88
[2023-10-23 04:38:56.071590] I [MSGID: 106478] [glusterd.c:1422:init] 0-management: Maximum allowed open file descriptors set to 65536
[2023-10-23 04:38:56.071663] I [MSGID: 106479] [glusterd.c:1478:init] 0-management: Using /var/lib/glusterd/ as working directory
[2023-10-23 04:38:56.071704] I [MSGID: 106479] [glusterd.c:1484:init] 0-management: Using /var/run/gluster as pid file working directory
[2023-10-23 04:38:56.073867] I [socket.c:1022:socket_server_bind] 0-socket.management: process started listening on port (24007)
[2023-10-23 04:38:56.097616] I [socket.c:965:socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 11
[2023-10-23 04:38:56.097971] I [MSGID: 106059] [glusterd.c:1860:init] 0-management: base-port override: 49152
[2023-10-23 04:38:56.097982] I [MSGID: 106059] [glusterd.c:1865:init] 0-management: max-port override: 49152
[2023-10-23 04:39:04.956934] I [MSGID: 106513] [glusterd-store.c:2394:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 60000
[2023-10-23 04:39:34.308779] I [MSGID: 106544] [glusterd.c:152:glusterd_uuid_init] 0-management: retrieved UUID: ddc1d9f0-a6e5-4751-8e35-0f91d326fa41
[2023-10-23 04:39:34.912425] I [MSGID: 106498] [glusterd-handler.c:3687:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2023-10-23 04:39:34.912618] I [MSGID: 106498] [glusterd-handler.c:3687:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2023-10-23 04:39:34.912669] W [MSGID: 106061] [glusterd-handler.c:3490:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2023-10-23 04:39:34.912720] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2023-10-23 04:39:34.916260] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.listen-backlog 1024
8: option max-port 49152
9: option base-port 49152
10: option transport.address-family inet
11: option transport.socket.listen-port 24007
12: option event-threads 1
13: option ping-timeout 0
14: option transport.socket.read-fail-log off
15: option transport.socket.keepalive-interval 2
16: option transport.socket.keepalive-time 10
17: option glusterd-sockfile /etc/glusterd_socket/gluster.sock
18: option transport-type socket
19: option working-directory /var/lib/glusterd/
20: end-volume
21:
+------------------------------------------------------------------------------+
[2023-10-23 04:39:34.916252] W [MSGID: 106061] [glusterd-handler.c:3490:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2023-10-23 04:39:35.015736] I [MSGID: 101190] [event-epoll.c:688:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2023-10-23 04:39:35.017093] I [MSGID: 106487] [glusterd-handler.c:1516:glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2023-10-23 04:39:35.017342] I [MSGID: 106487] [glusterd-handler.c:1516:glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2023-10-23 04:39:35.021484] I [MSGID: 106163] [glusterd-handshake.c:1389:glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 60000
[2023-10-23 04:39:35.022776] I [MSGID: 106493] [glusterd-rpc-ops.c:468:glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 0c333528-b176-4229-8df5-230844b7ee6f, host: <
FYI: We are running GlusterFS-6.10
Some more info: when we looked directly at the GlusterFS configuration, we saw that some files were missing. /var/lib/glusterfs/vols/ndp_vol/ndp_vol.<<brick-hostnames*>>.mnt-bricks-ndp_brick.vol went missing on the node where glusterd crashed.
@git2212 - Just a suggestion: could you also check whether these config files are missing for you?
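A quick local sketch of that check (hypothetical hostnames; on a real node the files live under /var/lib/glusterd/vols/<volname>/). glusterd keeps one entry per brick under bricks/ and a generated <vol>.<host>.<brick-path>.vol next to it; the script builds a throwaway copy of that layout in a temp dir and flags any brick whose volfile is absent, mimicking the symptom above:

```shell
# Build a fake 3-brick replica layout in a temp dir (placeholder names,
# not the real /var/lib/glusterd), with node3's volfile deliberately missing.
WORKDIR=$(mktemp -d)
VOLDIR="$WORKDIR/vols/ndp_vol"
mkdir -p "$VOLDIR/bricks"

for h in node1 node2 node3; do
  touch "$VOLDIR/bricks/$h:-mnt-bricks-ndp_brick"
done
touch "$VOLDIR/ndp_vol.node1.mnt-bricks-ndp_brick.vol"
touch "$VOLDIR/ndp_vol.node2.mnt-bricks-ndp_brick.vol"

# For every brick entry, check that a matching generated volfile exists.
for brick in "$VOLDIR"/bricks/*; do
  host=$(basename "$brick" | cut -d: -f1)
  if ! ls "$VOLDIR"/ndp_vol."$host".*.vol >/dev/null 2>&1; then
    echo "missing volfile for brick on $host"
  fi
done
```

Pointing the same loop at the real vols/ directory on each peer should show which node lost its generated volfiles.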
Description of problem: glusterd service fails to start after a normal reboot
The exact command to reproduce the issue: I tried to debug the start sequence using:
/usr/sbin/glusterd --debug
The full output of the command that failed:
[2023-11-20 17:14:21.135785] I [MSGID: 100030] [glusterfsd.c:2865:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 7.2 (args: /usr/sbin/glusterd --debug)
[2023-11-20 17:14:21.135909] I [glusterfsd.c:2593:daemonize] 0-glusterfs: Pid of current running process is 2256
[2023-11-20 17:14:21.135931] D [logging.c:1717:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2023-11-20 17:14:21.136832] D [MSGID: 0] [glusterfsd.c:810:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol
input in flex scanner failed
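"input in flex scanner failed" generally means the parser was handed unreadable input, e.g. a volfile truncated or zero-filled by an unclean shutdown (a common failure mode on SD-card storage like this Raspberry Pi). A sketch of a quick integrity check, run here against a throwaway file rather than the real /etc/glusterfs/glusterd.vol:

```shell
# Simulate a zeroed-out volfile in a temp file (assumption: the real check
# would point VOLFILE at /etc/glusterfs/glusterd.vol).
VOLFILE=$(mktemp)
printf '\000\000\000\000' > "$VOLFILE"

if [ ! -s "$VOLFILE" ]; then
  echo "volfile is empty"
# count bytes that are neither printable nor whitespace
elif [ "$(LC_ALL=C tr -d '[:print:][:space:]' < "$VOLFILE" | wc -c)" -gt 0 ]; then
  echo "volfile contains non-text bytes; likely corrupted"
else
  echo "volfile looks like plain text"
fi
```

A healthy glusterd.vol is plain text, so either of the first two branches indicates the file needs to be restored before glusterd can start.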
Expected results:
- The operating system / glusterfs version:
Linux pi4-m4 5.4.0-1097-raspi #109-Ubuntu SMP PREEMPT Wed Oct 11 16:15:36 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
glusterfs 7.2
Repository revision: git://git.gluster.org/glusterfs.git
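If /etc/glusterfs/glusterd.vol itself turns out to be corrupted, note that it is not node-specific state: it can be reinstalled from the glusterfs package or copied from a healthy peer. For reference, a stock glusterd.vol for releases of this era looks roughly like the following (a sketch from memory; prefer your distro's packaged copy, and adjust options such as working-directory to match your deployment):

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
end-volume
```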