Closed idefxH closed 2 years ago
[2022-01-13 15:44:13.822036 +0000] T [MSGID: 0] [common-utils.c:504:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: localhost
[2022-01-13 15:44:13.826480 +0000] D [MSGID: 0] [common-utils.c:542:gf_resolve_ip6] 0-resolver: returning ip-::1 (port-24007) for hostname: localhost and port: 24007
When glusterfs tries to resolve localhost, it gets an IPv6 address.
Could you edit /etc/hosts to remove the association to ::1 and try it? i.e.
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Thank you very much, this makes so much sense!
I have removed all references to IPv6 addresses from /etc/hosts on all nodes and rebooted everything.
Apparently, it is still trying to resolve an IPv6 address, but now it gets no answer. Here is the trace I got (once again, I ran the test, retrieved the command line, added the -L TRACE flag, ran it again, and here is the output):
[2022-01-14 09:15:50.174237 +0000] I [MSGID: 100030] [glusterfsd.c:2681:main] 0-/usr/sbin/glusterfs: Started running version [{arg=/usr/sbin/glusterfs}, {version=9.3}, {cmdlinestr=/usr/sbin/glusterfs -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6}]
[2022-01-14 09:15:50.174508 +0000] T [MSGID: 0] [glusterfsd.c:557:create_fuse_mount] 0-glusterfsd: mount point not found, not a client process
[2022-01-14 09:15:50.175426 +0000] D [MSGID: 0] [glusterfsd.c:2219:glusterfs_pidfile_update] 0-glusterfsd: pidfile /var/run/gluster/shd/gv0/gv0-shd.pid updated with pid 3560
[2022-01-14 09:15:50.175504 +0000] I [glusterfsd.c:2417:daemonize] 0-glusterfs: Pid of current running process is 3560
[2022-01-14 09:15:50.175527 +0000] D [logging.c:1696:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2022-01-14 09:15:50.177722 +0000] T [rpcsvc.c:2818:rpcsvc_init] 0-rpc-service: rx pool: 8
[2022-01-14 09:15:50.177786 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_GLUSTERFS
[2022-01-14 09:15:50.177804 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_GLUSTERFS-v2
[2022-01-14 09:15:50.177817 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_GLUSTERFS-v3
[2022-01-14 09:15:50.177832 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_UNIX
[2022-01-14 09:15:50.177848 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_NULL
[2022-01-14 09:15:50.177860 +0000] D [rpcsvc.c:2838:rpcsvc_init] 0-rpc-service: RPC service inited.
[2022-01-14 09:15:50.178112 +0000] D [rpcsvc.c:2345:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2022-01-14 09:15:50.178158 +0000] D [rpc-transport.c:278:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/9.3/rpc-transport/socket.so
[2022-01-14 09:15:50.180530 +0000] D [socket.c:4517:socket_init] 0-socket.glusterfsd: disabling nodelay
[2022-01-14 09:15:50.180554 +0000] D [socket.c:4554:socket_init] 0-socket.glusterfsd: Configured transport.tcp-user-timeout=42
[2022-01-14 09:15:50.180568 +0000] D [socket.c:4574:socket_init] 0-socket.glusterfsd: Reconfigured transport.keepalivecnt=9
[2022-01-14 09:15:50.180827 +0000] I [socket.c:928:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
[2022-01-14 09:15:50.180981 +0000] T [socket.c:221:socket_dump_info] 0-socket.glusterfsd: $$$ client: listening on (af:1,sock:7) /var/run/gluster/4a9d2fde0c960d85.socket non-SSL (errno:0:Success)
[2022-01-14 09:15:50.181507 +0000] D [rpcsvc.c:2345:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Brick operations, Num: 4867634, Ver: 2, Port: 0
[2022-01-14 09:15:50.181543 +0000] D [rpc-clnt.c:1012:rpc_clnt_connection_init] 0-glusterfs: defaulting frame-timeout to 30mins
[2022-01-14 09:15:50.181560 +0000] D [rpc-clnt.c:1024:rpc_clnt_connection_init] 0-glusterfs: disable ping-timeout
[2022-01-14 09:15:50.181588 +0000] D [rpc-transport.c:278:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/9.3/rpc-transport/socket.so
[2022-01-14 09:15:50.181698 +0000] D [MSGID: 101233] [options.c:974:xl_opt_validate] 0-glusterfs: option is deprecated, continuing with correction [{key=address-family}, {preferred=transport.address-family}]
[2022-01-14 09:15:50.181758 +0000] T [MSGID: 0] [options.c:80:xlator_option_validate_int] 0-glusterfs: no range check required for 'option remote-port 24007'
[2022-01-14 09:15:50.181789 +0000] D [socket.c:4554:socket_init] 0-glusterfs: Configured transport.tcp-user-timeout=42
[2022-01-14 09:15:50.181802 +0000] D [socket.c:4574:socket_init] 0-glusterfs: Reconfigured transport.keepalivecnt=9
[2022-01-14 09:15:50.181822 +0000] D [rpc-clnt.c:1585:rpcclnt_cbk_program_register] 0-glusterfs: New program registered: GlusterFS Callback, Num: 52743234, Ver: 1
[2022-01-14 09:15:50.181839 +0000] T [rpc-clnt.c:390:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2022-01-14 09:15:50.181856 +0000] T [socket.c:3372:socket_connect] 0-glusterfs: connecting 0x5640de61fc30, sock=-1
[2022-01-14 09:15:50.182243 +0000] D [dict.c:1248:data_to_uint16] (-->/usr/lib64/glusterfs/9.3/rpc-transport/socket.so(+0x9a95) [0x7fe641a90a95] -->/usr/lib64/glusterfs/9.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x2a3) [0x7fe641a905c3] -->/lib64/libglusterfs.so.0(data_to_uint16+0x146) [0x7fe646284896] ) 0-dict: key null, unsigned integer type asked, has integer type [Invalid argument]
[2022-01-14 09:15:50.182285 +0000] T [MSGID: 0] [common-utils.c:504:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: localhost
[2022-01-14 09:15:50.185048 +0000] E [MSGID: 101075] [common-utils.c:519:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
[2022-01-14 09:15:50.185159 +0000] E [name.c:264:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host localhost
[2022-01-14 09:15:50.185567 +0000] I [glusterfsd-mgmt.c:2640:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2022-01-14 09:15:50.185592 +0000] I [glusterfsd-mgmt.c:2662:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2022-01-14 09:15:50.185797 +0000] I [MSGID: 101190] [event-epoll.c:669:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=1}]
[2022-01-14 09:15:50.185866 +0000] I [MSGID: 101190] [event-epoll.c:669:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
[2022-01-14 09:15:50.186147 +0000] D [logging.c:1666:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages
[2022-01-14 09:15:50.186169 +0000] D [logging.c:1672:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages
[2022-01-14 09:15:50.186241 +0000] W [glusterfsd.c:1428:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(+0x10e0d) [0x7fe646238e0d] -->/usr/sbin/glusterfs(+0x1480a) [0x5640dd4e680a] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x5640dd4dd424] ) 0-: received signum (1), shutting down
[2022-01-14 09:15:50.186277 +0000] D [mgmt-pmap.c:90:rpc_clnt_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given
[2022-01-14 09:15:50.186292 +0000] T [MSGID: 0] [glusterfsd.c:2164:glusterfs_pidfile_cleanup] 0-glusterfsd: pidfile /var/run/gluster/shd/gv0/gv0-shd.pid cleanup
Thank you very much!
@idefxH I tried removing the ::1 line and it still worked fine on my machine. Do you have the following line in your /etc/hosts?
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
This shouldn't have happened:
[2022-01-14 09:15:50.185048 +0000] E [MSGID: 101075] [common-utils.c:519:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
Thanks!
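To see what the resolver actually returns for each address family without involving gluster, getent can be used; it goes through the same glibc getaddrinfo() path as glusterfs (the expected IPv4 answer assumes the 127.0.0.1 line above is present in /etc/hosts):

```shell
# Query NSS (/etc/hosts + DNS) the same way getaddrinfo() does.
getent ahostsv4 localhost            # IPv4-only answers: should list 127.0.0.1
getent ahostsv6 localhost || true    # IPv6-only answers: empty once ::1 is removed
```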
Well, I had separate lines for localhost and localhost.localdomain, and nothing for localhost4*; I replaced everything with the line you proposed on all nodes, then rebooted everything, and it still doesn't work.
I noticed I still had the ::1 address for localhost; I understand I had only partially disabled IPv6. I have fixed this, and now ip addr doesn't show a single IPv6 address....
... Still, it doesn't work.
The trace is pretty much unchanged (cf. below).
I don't understand why I get the line you pinpointed:
[2022-01-14 16:34:28.583336 +0000] T [MSGID: 0] [common-utils.c:504:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: localhost
It seems it is probing an IPv6 address no matter what I do. Any idea what could explain that? As you know, I have tweaked the standard openSUSE package precisely to remove IPv6; I only changed one line in the spec, but maybe I should have changed something else so that the self-heal daemon uses IPv4 instead of IPv6?
Here is the trace:
[2022-01-14 16:34:28.574546 +0000] I [MSGID: 100030] [glusterfsd.c:2681:main] 0-/usr/sbin/glusterfs: Started running version [{arg=/usr/sbin/glusterfs}, {version=9.3}, {cmdlinestr=/usr/sbin/glusterfs -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6}]
[2022-01-14 16:34:28.574708 +0000] T [MSGID: 0] [glusterfsd.c:557:create_fuse_mount] 0-glusterfsd: mount point not found, not a client process
[2022-01-14 16:34:28.575806 +0000] D [MSGID: 0] [glusterfsd.c:2219:glusterfs_pidfile_update] 0-glusterfsd: pidfile /var/run/gluster/shd/gv0/gv0-shd.pid updated with pid 2983
[2022-01-14 16:34:28.575904 +0000] I [glusterfsd.c:2417:daemonize] 0-glusterfs: Pid of current running process is 2983
[2022-01-14 16:34:28.575928 +0000] D [logging.c:1696:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5
[2022-01-14 16:34:28.578202 +0000] T [rpcsvc.c:2818:rpcsvc_init] 0-rpc-service: rx pool: 8
[2022-01-14 16:34:28.578280 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_GLUSTERFS
[2022-01-14 16:34:28.578299 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_GLUSTERFS-v2
[2022-01-14 16:34:28.578313 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_GLUSTERFS-v3
[2022-01-14 16:34:28.578325 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_UNIX
[2022-01-14 16:34:28.578336 +0000] T [rpcsvc-auth.c:122:rpcsvc_auth_init_auth] 0-rpc-service: Authentication enabled: AUTH_NULL
[2022-01-14 16:34:28.578347 +0000] D [rpcsvc.c:2838:rpcsvc_init] 0-rpc-service: RPC service inited.
[2022-01-14 16:34:28.578670 +0000] D [rpcsvc.c:2345:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0
[2022-01-14 16:34:28.578733 +0000] D [rpc-transport.c:278:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/9.3/rpc-transport/socket.so
[2022-01-14 16:34:28.581538 +0000] D [socket.c:4517:socket_init] 0-socket.glusterfsd: disabling nodelay
[2022-01-14 16:34:28.581576 +0000] D [socket.c:4554:socket_init] 0-socket.glusterfsd: Configured transport.tcp-user-timeout=42
[2022-01-14 16:34:28.581590 +0000] D [socket.c:4574:socket_init] 0-socket.glusterfsd: Reconfigured transport.keepalivecnt=9
[2022-01-14 16:34:28.581780 +0000] I [socket.c:928:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
[2022-01-14 16:34:28.582136 +0000] T [socket.c:221:socket_dump_info] 0-socket.glusterfsd: $$$ client: listening on (af:1,sock:7) /var/run/gluster/4a9d2fde0c960d85.socket non-SSL (errno:0:Success)
[2022-01-14 16:34:28.582594 +0000] D [rpcsvc.c:2345:rpcsvc_program_register] 0-rpc-service: New program registered: Gluster Brick operations, Num: 4867634, Ver: 2, Port: 0
[2022-01-14 16:34:28.582656 +0000] D [rpc-clnt.c:1012:rpc_clnt_connection_init] 0-glusterfs: defaulting frame-timeout to 30mins
[2022-01-14 16:34:28.582674 +0000] D [rpc-clnt.c:1024:rpc_clnt_connection_init] 0-glusterfs: disable ping-timeout
[2022-01-14 16:34:28.582705 +0000] D [rpc-transport.c:278:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/9.3/rpc-transport/socket.so
[2022-01-14 16:34:28.582778 +0000] D [MSGID: 101233] [options.c:974:xl_opt_validate] 0-glusterfs: option is deprecated, continuing with correction [{key=address-family}, {preferred=transport.address-family}]
[2022-01-14 16:34:28.582822 +0000] T [MSGID: 0] [options.c:80:xlator_option_validate_int] 0-glusterfs: no range check required for 'option remote-port 24007'
[2022-01-14 16:34:28.582855 +0000] D [socket.c:4554:socket_init] 0-glusterfs: Configured transport.tcp-user-timeout=42
[2022-01-14 16:34:28.582868 +0000] D [socket.c:4574:socket_init] 0-glusterfs: Reconfigured transport.keepalivecnt=9
[2022-01-14 16:34:28.582889 +0000] D [rpc-clnt.c:1585:rpcclnt_cbk_program_register] 0-glusterfs: New program registered: GlusterFS Callback, Num: 52743234, Ver: 1
[2022-01-14 16:34:28.582907 +0000] T [rpc-clnt.c:390:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
[2022-01-14 16:34:28.582925 +0000] T [socket.c:3372:socket_connect] 0-glusterfs: connecting 0x559b8eec4c30, sock=-1
[2022-01-14 16:34:28.583296 +0000] D [dict.c:1248:data_to_uint16] (-->/usr/lib64/glusterfs/9.3/rpc-transport/socket.so(+0x9a95) [0x7f2375440a95] -->/usr/lib64/glusterfs/9.3/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x2a3) [0x7f23754405c3] -->/lib64/libglusterfs.so.0(data_to_uint16+0x146) [0x7f2379c34896] ) 0-dict: key null, unsigned integer type asked, has integer type [Invalid argument]
[2022-01-14 16:34:28.583336 +0000] T [MSGID: 0] [common-utils.c:504:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: localhost
[2022-01-14 16:34:28.586596 +0000] E [MSGID: 101075] [common-utils.c:519:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
[2022-01-14 16:34:28.586667 +0000] E [name.c:264:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host localhost
[2022-01-14 16:34:28.587363 +0000] I [glusterfsd-mgmt.c:2640:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: localhost
[2022-01-14 16:34:28.587388 +0000] I [glusterfsd-mgmt.c:2662:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2022-01-14 16:34:28.587795 +0000] I [MSGID: 101190] [event-epoll.c:669:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=1}]
[2022-01-14 16:34:28.587898 +0000] D [logging.c:1666:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages
[2022-01-14 16:34:28.587922 +0000] D [logging.c:1672:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages
[2022-01-14 16:34:28.588004 +0000] W [glusterfsd.c:1428:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(+0x10e0d) [0x7f2379be8e0d] -->/usr/sbin/glusterfs(+0x1480a) [0x559b8e13e80a] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x559b8e135424] ) 0-: received signum (1), shutting down
[2022-01-14 16:34:28.588042 +0000] D [mgmt-pmap.c:90:rpc_clnt_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given
[2022-01-14 16:34:28.588058 +0000] T [MSGID: 0] [glusterfsd.c:2164:glusterfs_pidfile_cleanup] 0-glusterfsd: pidfile /var/run/gluster/shd/gv0/gv0-shd.pid cleanup
[2022-01-14 16:34:28.588464 +0000] I [MSGID: 101190] [event-epoll.c:669:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
/* Make sure we are not vulnerable to someone setting
 * net.ipv6.bindv6only to 1 so that gluster services are
 * available over IPv4 & IPv6.
 */
#ifdef IPV6_DEFAULT
    int disable_v6only = 0;

    if (setsockopt(priv->sock, IPPROTO_IPV6, IPV6_V6ONLY,
                   (void *)&disable_v6only, sizeof(disable_v6only)) < 0) {
        gf_log(this->name, GF_LOG_WARNING,
               "Error disabling sockopt IPV6_V6ONLY: \"%s\"",
               strerror(errno));
    }
#endif
I see the following lines and comment where this log comes from. Basically, getaddrinfo() for localhost is not returning an IPv4 address; that is the issue. The rest of glusterfs doesn't use localhost. Only shd uses it, which is why you are observing this only with shd.
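This failure mode can be reproduced outside gluster with a short standalone sketch (the hostname and port 24007 are taken from the logs above; the rest is plain libc): forcing ai_family to AF_INET6, as a build with --with-ipv6-default effectively does, fails with EAI_NONAME on a host whose /etc/hosts has no ::1 mapping for localhost, while the AF_INET query still succeeds.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Look up "localhost" with a fixed address family, the way gluster's
 * gf_resolve_ip6() is called with family=10 (AF_INET6) in the trace. */
static const char *lookup(int family)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;        /* AF_INET or AF_INET6 */
    hints.ai_socktype = SOCK_STREAM;

    int ret = getaddrinfo("localhost", "24007", &hints, &res);
    if (res)
        freeaddrinfo(res);
    return ret == 0 ? "ok" : gai_strerror(ret);
}

int main(void)
{
    /* On the affected node the AF_INET6 line prints
     * "Name or service not known" (EAI_NONAME), matching the log. */
    printf("AF_INET: %s\n", lookup(AF_INET));
    printf("AF_INET6: %s\n", lookup(AF_INET6));
    return 0;
}
```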
You are right, this explains the issue I mentioned in the comment I erased :), where I had "Error disabling sockopt IPV6_V6ONLY". But with the binary compiled without --with-ipv6-default, I don't get this message.
I have made a simple test for getaddrinfo on localhost, compiled it, and ran it on the machine that has the issue:
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
int main()
{
    struct addrinfo hints;
    struct addrinfo *feed_server = NULL;

    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_family = AF_INET;
    getaddrinfo("localhost", NULL, &hints, &feed_server);

    struct addrinfo *res;
    for (res = feed_server; res != NULL; res = res->ai_next) {
        struct sockaddr_in *saddr = (struct sockaddr_in *)res->ai_addr;
        printf("hostname: %s\n", inet_ntoa(saddr->sin_addr));
    }
    return 0;
}
The output is as follows:
hostname: 127.0.0.1
hostname: 127.0.0.1
hostname: 127.0.0.1
Is there something that I’m missing?
Thank you so much for your help!
@idefxH do an strace of the command line to see what exactly is happening on the syscalls, you can capture it with -o
strace -ff -o <output-filename> /usr/sbin/glusterfs -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6
Maybe this will show what is happening on the system.
Hello, thanks again.
I looked into it, but I couldn’t find anything obvious.
You'll find the 700+ lines here: https://file.io/spV4LGjpBr8Z
Thanks!
@idefxH Maybe the hint is somehow becoming IPv6. Could you put a breakpoint on af_inet_client_get_remote_sockaddr() and check what is happening with sockaddr->sa_family from the time it enters this function till it calls gf_resolve_ip6()?
Use gdb /usr/bin/glusterfs
At the gdb prompt, do:
(gdb) r -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6 -N
i.e., add -N at the end.
Ok, will do, but this time I think I should grab an actual computer and not just ssh from my phone… so it may take a little bit longer :)
Anyway, will it work with the normal openSUSE RPM packages, or should I rebuild them with debug options?
(So far, I get: Reading symbols from /sbin/glusterfs... (No debugging symbols found in /sbin/glusterfs))
I don't know how debug symbols are installed on SUSE. On Fedora, when you attach the process to gdb, it gives you the command to install the debuginfo rpms.
Ok, I solved the question of having the debug symbols, but I'm struggling to debug:
Function "af_inet_client_get_remote_sockaddr()" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Here is the whole output. I get the same issue whether I add the breakpoint before or after running the binary with the suggested arguments.
gdb /sbin/glusterfs
GNU gdb (GDB; openSUSE Tumbleweed) 11.1
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.opensuse.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /sbin/glusterfs...
Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd-9.3-95.1.x86_64.debug...
(gdb) b af_inet_client_get_remote_sockaddr()
Function "af_inet_client_get_remote_sockaddr()" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (af_inet_client_get_remote_sockaddr()) pending.
(gdb) r -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6 -N
Starting program: /usr/sbin/glusterfs -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6 -N
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6730640 (LWP 3757)]
[New Thread 0x7ffff5f2f640 (LWP 3758)]
[New Thread 0x7ffff572e640 (LWP 3759)]
[New Thread 0x7ffff4f2d640 (LWP 3760)]
[New Thread 0x7ffff472c640 (LWP 3761)]
[New Thread 0x7ffff3f2b640 (LWP 3762)]
[New Thread 0x7ffff3600640 (LWP 3763)]
[New Thread 0x7ffff2dff640 (LWP 3764)]
[New Thread 0x7ffff25fe640 (LWP 3765)]
[Thread 0x7ffff572e640 (LWP 3759) exited]
[Thread 0x7ffff3f2b640 (LWP 3762) exited]
[Thread 0x7ffff472c640 (LWP 3761) exited]
[Thread 0x7ffff4f2d640 (LWP 3760) exited]
[Thread 0x7ffff5f2f640 (LWP 3758) exited]
[Thread 0x7ffff6730640 (LWP 3757) exited]
[Thread 0x7ffff766c5c0 (LWP 3752) exited]
[Thread 0x7ffff2dff640 (LWP 3764) exited]
[Thread 0x7ffff25fe640 (LWP 3765) exited]
[Inferior 1 (process 3752) exited with code 01]
(gdb) b
No default breakpoint address now.
Anything I'm doing wrong?
Thanks!
The only difference between your invocation and mine is the double quotes around the *replicate* part of the command line. Apart from that, I had to kill the existing glustershd process that was running before I did this.
root@PP-CNPZ5M2 ~# gdb /usr/local/sbin/glusterfs
GNU gdb (GDB) Fedora 11.1-5.fc35
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/glusterfs...
(gdb) b af_inet_client_get_remote_sockaddr
Function "af_inet_client_get_remote_sockaddr" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (af_inet_client_get_remote_sockaddr) pending.
(gdb) r -s localhost --volfile-id shd/r3 -p /var/run/gluster/shd/r3/r3-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/ca8777576062483a.socket --xlator-option "*replicate*.node-uuid=644f8c6d-f223-477e-bc73-22bd4d4636c5" --process-name glustershd --client-pid=-6 -N
Starting program: /usr/local/sbin/glusterfs -s localhost --volfile-id shd/r3 -p /var/run/gluster/shd/r3/r3-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/ca8777576062483a.socket --xlator-option "*replicate*.node-uuid=644f8c6d-f223-477e-bc73-22bd4d4636c5" --process-name glustershd --client-pid=-6 -N
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffe8fc6640 (LWP 172469)]
[New Thread 0x7fffe87c5640 (LWP 172470)]
[New Thread 0x7fffe7cc4640 (LWP 172471)]
[New Thread 0x7fffe74c3640 (LWP 172472)]
[New Thread 0x7fffe6cc2640 (LWP 172473)]
Thread 1 "glusterfs" hit Breakpoint 1, af_inet_client_get_remote_sockaddr (this=0xcd0b28, sockaddr=0x7fffffffd3f0, sockaddr_len=0x7fffffffd480) at /home/pranith.karampuri/workspace/glusterfs/rpc/rpc-transport/socket/src/name.c:214
214 dict_t *options = this->options;
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-11.fc35.x86_64 gperftools-libs-2.9.1-2.fc35.x86_64 keyutils-libs-1.6.1-3.fc35.x86_64 krb5-libs-1.19.2-2.fc35.x86_64 libcom_err-1.46.3-1.fc35.x86_64 libgcc-11.2.1-7.fc35.x86_64 libselinux-3.3-1.fc35.x86_64 libstdc++-11.2.1-7.fc35.x86_64 libtirpc-1.3.2-1.fc35.x86_64 libuuid-2.37.2-1.fc35.x86_64 openssl-libs-1.1.1l-2.fc35.x86_64 pcre2-10.37-4.fc35.x86_64 userspace-rcu-0.13.0-3.fc35.x86_64 zlib-1.2.11-30.fc35.x86_64
(gdb)
Hello, I couldn't add a breakpoint for af_inet_client_get_remote_sockaddr (same issue all over again).
But I added a breakpoint on gf_resolve_ip6, and judging by the stack, it is called by af_inet_client_get_remote_sockaddr. The address family provided as a parameter is 10, which means AF_INET6 (cf. socket.h: #define AF_INET6 10 /* IP version 6 */).
(gdb) bt
#0 gf_resolve_ip6 (hostname=hostname@entry=0x5555555f97b0 "localhost", port=port@entry=24007, family=10, dnscache=dnscache@entry=0x555555623938, addr_info=addr_info@entry=0x7fffffffd2b8)
at /usr/src/debug/glusterfs-9.3-1.3.x86_64/libglusterfs/src/common-utils.c:476
#1 0x00007ffff36f2603 in af_inet_client_get_remote_sockaddr (sockaddr_len=0x7fffffffd35c, sockaddr=0x7fffffffd370, this=0x5555556238c0)
at /usr/src/debug/glusterfs-9.3-95.1.x86_64/rpc/rpc-transport/socket/src/name.c:261
#2 socket_client_get_remote_sockaddr (this=this@entry=0x5555556238c0, sockaddr=sockaddr@entry=0x7fffffffd370, sockaddr_len=sockaddr_len@entry=0x7fffffffd35c, sa_family=sa_family@entry=0x7fffffffd35a)
at /usr/src/debug/glusterfs-9.3-95.1.x86_64/rpc/rpc-transport/socket/src/name.c:522
#3 0x00007ffff36f2a95 in socket_connect (this=0x5555556238c0, port=0) at /usr/src/debug/glusterfs-9.3-95.1.x86_64/rpc/rpc-transport/socket/src/socket.c:3375
#4 0x00007ffff7e994c8 in rpc_clnt_reconnect (conn_ptr=conn_ptr@entry=0x5555556236d0) at /usr/src/debug/glusterfs-9.3-1.3.x86_64/rpc/rpc-lib/src/rpc-clnt.c:391
#5 0x00007ffff7e9959a in rpc_clnt_start (rpc=rpc@entry=0x5555556236a0) at /usr/src/debug/glusterfs-9.3-1.3.x86_64/rpc/rpc-lib/src/rpc-clnt.c:1159
#6 0x00005555555675a2 in glusterfs_mgmt_init (ctx=ctx@entry=0x5555555ab8a0) at /usr/src/debug/glusterfs-9.3-95.1.x86_64/glusterfsd/src/glusterfsd-mgmt.c:2887
#7 0x0000555555567772 in glusterfs_volumes_init (ctx=ctx@entry=0x5555555ab8a0) at /usr/src/debug/glusterfs-9.3-95.1.x86_64/glusterfsd/src/glusterfsd.c:2558
#8 0x000055555555c893 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/glusterfs-9.3-95.1.x86_64/glusterfsd/src/glusterfsd.c:2728
In the source, I noted there is a call to inet_pton that can set the address family to AF_INET6 just before the call to gf_resolve_ip6, so I checked it:
(gdb) b inet_pton
Function "inet_pton" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (inet_pton) pending.
(gdb) r -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6 -N
Starting program: /usr/sbin/glusterfs -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6 -N
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6730640 (LWP 3176)]
[New Thread 0x7ffff5f2f640 (LWP 3177)]
[New Thread 0x7ffff572e640 (LWP 3178)]
[New Thread 0x7ffff4f2d640 (LWP 3179)]
[New Thread 0x7ffff472c640 (LWP 3180)]
[New Thread 0x7ffff3f2b640 (LWP 3181)]
Thread 1 "glusterfs" hit Breakpoint 1, __GI___inet_pton (af=af@entry=10, src=src@entry=0x5555555f97b0 "localhost", dst=dst@entry=0x7fffffffd2c0) at inet_pton.c:68
68 {
(gdb) fin
Run till exit from #0 __GI___inet_pton (af=af@entry=10, src=src@entry=0x5555555f97b0 "localhost", dst=dst@entry=0x7fffffffd2c0) at inet_pton.c:68
0x00007ffff36f25e3 in af_inet_client_get_remote_sockaddr (sockaddr_len=0x7fffffffd35c, sockaddr=0x7fffffffd370, this=0x5555556238c0) at /usr/src/debug/glusterfs-9.3-95.1.x86_64/rpc/rpc-transport/socket/src/name.c:255
255 /usr/src/debug/glusterfs-9.3-95.1.x86_64/rpc/rpc-transport/socket/src/name.c: Directory not empty.
Value returned is $1 = 0
I don't have much experience with gdb, but I understand inet_pton returns 0 (false), which means this is not what sets the address family to AF_INET6.
==> We have to look at the socket_connect stack to try to understand how it ends up setting the address family to AF_INET6, given that:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 02:11:32:20:da:a2 brd ff:ff:ff:ff:ff:ff
altname enp0s3
altname ens3
inet 192.168.0.71/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
option transport.address-family inet
in /etc/glusterfs/glusterd.vol

net.ipv4.ip_forward=1
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
ipv6.disable=1
grub command line:
BOOT_IMAGE=/boot/vmlinuz-5.16.0-1-default root=UUID=3b866b5f-4e8c-4664-a835-f101c162a488 rd.timeout=60 rd.retry=45 quiet systemd.show_status=yes console=ttyS1,115200 console=tty0 net.ifnames=0 ignition.platform.id=qemu security=selinux selinux=1,systemd.unified_cgroup_hierarchy=1i ipv6.disable=1
Do you see anything that could explain this?
Thanks a lot!
Cool, this confirms my hypothesis. As per the code, and considering that there are no logs from the function client_fill_address_family(), maybe the option coming in is inet6?
Could you check what is happening in rpc_transport_inet_options_build() in gdb? That is where the address family gets picked for this transport, as per my understanding of the code.
I guess the call would be:
ret = rpc_transport_inet_options_build(options, host, port,
                                       (opt ? opt->value : NULL));
following:
opt = find_xlator_option_in_cmd_args_t("address-family", cmd_args);
Here is what gdb says:
(gdb) b rpc_transport_inet_options_build
Breakpoint 1 at 0x7860
(gdb) r -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6 -N
Starting program: /usr/sbin/glusterfs -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6 -N
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6730640 (LWP 6625)]
[New Thread 0x7ffff5f2f640 (LWP 6626)]
[New Thread 0x7ffff572e640 (LWP 6627)]
[New Thread 0x7ffff4f2d640 (LWP 6628)]
[New Thread 0x7ffff472c640 (LWP 6629)]
[New Thread 0x7ffff3f2b640 (LWP 6630)]
Thread 1 "glusterfs" hit Breakpoint 1, rpc_transport_inet_options_build (dict=dict@entry=0x555555601c68, hostname=hostname@entry=0x5555555e89a0 "localhost", port=port@entry=24007, af=0x0) at /usr/src/debug/glusterfs-9.3-1.3.x86_64/rpc/rpc-lib/src/rpc-transport.c:631
==> It seems af is set to NULL, which would mean that at that stage, gluster doesn't consider there to be a command-line argument requesting INET instead of INET6 (which is true, but there happens to be a setting requesting this in /etc/glusterfs/glusterd.vol).
I have tried to look at the *addr_family var, but it is optimized out. I am recompiling without optimization to be able to check this value. I'll keep you posted :)
Well, I can't make it work. Once again, I'm not a gdb expert, but I have recompiled everything without optimization, e.g. here is the compilation command used for rpc-transport.c
[ 106s] make[4]: Entering directory '/home/abuild/rpmbuild/BUILD/glusterfs-9.3/rpc/rpc-lib/src'
[ 106s] /bin/sh ../../../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../.. -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DGF_LINUX_HOST_OS -include ../../../config.h -include ../../../site.h -I../../../libglusterfs/src -I../../../libglusterfs/src -I../../../libglusterfs/src -I../../../rpc/xdr/src -I../../../rpc/xdr/src -DRPC_TRANSPORTDIR=\"/usr/lib64/glusterfs/9.3/rpc-transport\" -I../../../contrib/rbtree -Wall -I/usr/include/uuid -Wformat -Werror=format-security -Werror=implicit-function-declaration -flto -g -O0 -Wall -Wextra -c -o rpc-transport.lo rpc-transport.c
[ 106s] libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../.. -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DGF_LINUX_HOST_OS -include ../../../config.h -include ../../../site.h -I../../../libglusterfs/src -I../../../libglusterfs/src -I../../../libglusterfs/src -I../../../rpc/xdr/src -I../../../rpc/xdr/src -DRPC_TRANSPORTDIR=\"/usr/lib64/glusterfs/9.3/rpc-transport\" -I../../../contrib/rbtree -Wall -I/usr/include/uuid -Wformat -Werror=format-security -Werror=implicit-function-declaration -flto -g -O0 -Wall -Wextra -c rpc-transport.c -fPIC -DPIC -o .libs/rpc-transport.o
[ 106s] In file included from ../../../libglusterfs/src/glusterfs/dict.h:18,
[ 106s] from rpc-transport.h:59,
[ 106s] from rpc-transport.c:17:
[ 106s] ../../../libglusterfs/src/glusterfs/common-utils.h: In function 'gf_time_fmt_tv':
[ 106s] ../../../libglusterfs/src/glusterfs/common-utils.h:851:21: warning: comparison of integer expressions of different signedness: 'int' and 'size_t' {aka 'long unsigned int'} [-Wsign-compare]
[ 106s] 851 | if (len >= sz_dst - pos)
[ 106s] | ^~
[ 106s] make[4]: Leaving directory '/home/abuild/rpmbuild/BUILD/glusterfs-9.3/rpc/rpc-lib/src'
Still, after updating and rebooting, gdb can't show the addr_family variable:
(gdb) b rpc_transport_inet_options_build
Breakpoint 1 at 0x7840
(gdb) r -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6 -N
Starting program: /usr/sbin/glusterfs -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6 -N
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff6730640 (LWP 4680)]
[New Thread 0x7ffff5f2f640 (LWP 4681)]
[New Thread 0x7ffff572e640 (LWP 4682)]
[New Thread 0x7ffff4f2d640 (LWP 4683)]
[New Thread 0x7ffff472c640 (LWP 4684)]
[New Thread 0x7ffff3f2b640 (LWP 4685)]
Thread 1 "glusterfs" hit Breakpoint 1, rpc_transport_inet_options_build (dict=0x555555603c38, hostname=0x5555555ea9a0 "localhost", port=24007, af=0x0) at /usr/src/debug/glusterfs-9.3-1.3.x86_64/rpc/rpc-lib/src/rpc-transport.c:631
631 GF_ASSERT(hostname);
(gdb) p addr_family
$1 = <optimized out>
==> Do you know what I should do to be able to trace the value of this var? Do we really need it or is there another way to know which part of the code is executed?
#ifdef IPV6_DEFAULT
static char *addr_family = "inet6";
#else
static char *addr_family = "inet";
#endif
Thanks!
I wonder if the rpm was built correctly. If you are doing a source compilation anyway, why not just log it and be done?
Something like this?
21:10:41 :) ⚡ git diff
diff --git a/rpc/rpc-lib/src/rpc-transport.c b/rpc/rpc-lib/src/rpc-transport.c
index a6e201a9b3..059998f781 100644
--- a/rpc/rpc-lib/src/rpc-transport.c
+++ b/rpc/rpc-lib/src/rpc-transport.c
@@ -628,6 +628,7 @@ rpc_transport_inet_options_build(dict_t *dict, const char *hostname, int port,
static char *addr_family = "inet";
#endif
+ gf_log(THIS->name, GF_LOG_INFO, "Address family: %s", addr_family);
GF_ASSERT(hostname);
GF_ASSERT(port >= 1024);
GF_VALIDATE_OR_GOTO("rpc-transport", dict, out);
Gives this log on my machine:
[2022-01-16 15:43:25.798870 +0000] I [rpc-transport.c:631:rpc_transport_inet_options_build] 0-glusterfs: Address family: inet
If this log doesn't come, then we definitely know we missed something at the compilation stage.
Hmm, I'm building using the OpenSUSE Build Service, so it's not being built on my machine.
I don't have a machine set up for dev / compilation. I will set it up. Meanwhile, you can already check the log of the build: https://build.opensuse.org/public/build/home:idefx:branches:openSUSE:Factory/openSUSE_Tumbleweed/x86_64/glusterfs/_log
This summary may be useful:
[ 73s] GlusterFS configure summary
[ 73s] ===========================
[ 73s] FUSE client : yes
[ 73s] epoll IO multiplex : yes
[ 73s] fusermount : yes
[ 73s] readline : yes
[ 73s] georeplication : yes
[ 73s] Linux-AIO : yes
[ 73s] Linux-io_uring : yes
[ 73s] Enable Debug : no
[ 73s] Run with Valgrind : no
[ 73s] Sanitizer enabled : none
[ 73s] Use syslog : yes
[ 73s] XML output : yes
[ 73s] Unit Tests : no
[ 73s] Track priv ports : yes
[ 73s] POSIX ACLs : yes
[ 73s] SELinux features : yes
[ 73s] firewalld-config : no
[ 73s] Events : yes
[ 73s] EC dynamic support : x64 sse avx
[ 73s] Use memory pools : yes
[ 73s] Nanosecond m/atimes : yes
[ 73s] Server components : yes
[ 73s] Legacy gNFS server : no
[ 73s] IPV6 default : no
[ 73s] Use TIRPC : yes
[ 73s] With Python : 3.8
[ 73s] Cloudsync : yes
[ 73s] Metadata dispersal : no
[ 73s] Link with TCMALLOC : no
[ 73s] Enable Brick Mux : no
[ 73s] Building with LTO : yes
Hello, I rebuilt locally from sources, and now it seems to work just fine!
As I haven't changed anything compared to the build I previously made using the OpenSUSE Build Service, my assumption is that I didn't install all packages from the rebuild I made on that service. Therefore, I may have had some inconsistency between the libs from Gluster and the actual glusterfs packages, some being built with "--with-ipv6-default" and some not (e.g. libgfrpc, libgfxdr, libglusterfs, libglusterd, etc.).
Having rebuilt everything on my computer led me to reinstall all rpm packages instead of just requesting installation of the specific glusterfs package.
Sorry for taking so much of your time. You've been incredibly helpful!
Thanks again!
No issues, closing the issue.
Description of problem: On a brand new / vanilla Gluster cluster, Self Healing doesn't work; apparently, the Self-heal daemon is not running. Could you please help me with this?
Thanks!
Environment:
OpenSuse Tumbleweed, ipv4 only set-up:
Because the OpenSuse Tumbleweed package is compiled for ipv6 by default, I had to recompile it from the source package. All I changed vs the standard package is replacing the line
--disable-static --with-ipv6-default
with
--disable-static
in the "glusterfs.spec" file. Version is 9.3. After installation, to complete the "ipv4 only" set-up, I have added the following line in /etc/glusterfs/glusterd.vol:
option transport.address-family inet
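For context, that option lives inside the management volume block of /etc/glusterfs/glusterd.vol. A typical file looks roughly like the following (trimmed; defaults may differ per distro):

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket
    option transport.address-family inet
end-volume
```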
I have 3 nodes: STORAGE1, STORAGE2 and ARBITER. I want 3 replicas (including the arbiter).
The exact command to reproduce the issue:
Everything works fine at that stage. I can mount my gv0 volume, and it behaves as expected. But when a node reboots, there is no healing happening. Even if I try to force it.
The full output of the command that failed:
Mandatory info:
- The output of the gluster volume info command:
- The output of the gluster volume status command:
- The output of the gluster volume heal command:
- Provide logs present on following locations of client and server nodes

From /var/log/glustershd.log, I picked the following cmdline: `[2022-01-13 15:42:22.815406 +0000] I [MSGID: 100030] [glusterfsd.c:2681:main] 0-/usr/sbin/glusterfs: Started running version [{arg=/usr/sbin/glusterfs}, {version=9.3}, {cmdlinestr=/usr/sbin/glusterfs -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6}]`
And launched it myself, adding a little bit more log:
> /usr/sbin/glusterfs -L TRACE -s localhost --volfile-id shd/gv0 -p /var/run/gluster/shd/gv0/gv0-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/4a9d2fde0c960d85.socket --xlator-option *replicate*.node-uuid=083e6bde-fac3-4421-babd-2e58ccd67a24 --process-name glustershd --client-pid=-6
From there, here is what I found in /var/log/glustershd.log:
Additional info:
- The operating system / glusterfs version: OpenSuse Tumbleweed / glusterfs 9.3