Segmentation fault in gluster client

SowjanyaKotha commented 12 months ago

Description of problem: Setup of 2 node mirrored volumes with clients installed on both nodes. When one of the node becomes faulty, the node is removed and replaced with a new node with the same name/IP. While adding brick, the active client crashes. The issue occurs randomly when ssl is enabled on IO. It is not seen in non-ssl setups.

The exact command to reproduce the issue: gluster volume add-brick efa_logs replica 2 10.18.120.135:/apps/opt/efa/logs force

The full output of the command that failed:

**Expected results:** add-brick should be successful **Mandatory info:** **- The output of the `gluster volume info` command**: ``` Status of volume: efa_certs Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.18.120.136:/apps/opt/efa/certs 52847 0 Y 34686 Brick 10.18.120.135:/apps/opt/efa/certs 54321 0 Y 33999 Self-heal Daemon on localhost N/A N/A Y 150192 Self-heal Daemon on 10.18.120.135 N/A N/A Y 34015 Task Status of Volume efa_certs ------------------------------------------------------------------------------ There are no active volume tasks Status of volume: efa_logs Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.18.120.136:/apps/opt/efa/logs 56910 0 Y 34750 Brick 10.18.120.135:/apps/opt/efa/logs 56796 0 Y 34064 Self-heal Daemon on localhost N/A N/A Y 150192 Self-heal Daemon on 10.18.120.135 N/A N/A Y 34015 Task Status of Volume efa_logs ------------------------------------------------------------------------------ There are no active volume tasks Status of volume: efa_misc Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 10.18.120.136:/apps/opt/efa/misc 55691 0 Y 34799 Brick 10.18.120.135:/apps/opt/efa/misc 58871 0 Y 34167 Self-heal Daemon on localhost N/A N/A Y 150192 Self-heal Daemon on 10.18.120.135 N/A N/A Y 34015 Task Status of Volume efa_misc ------------------------------------------------------------------------------ There are no active volume tasks ``` **- The output of the `gluster volume status` command**: ``` Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: 10.18.120.135:/apps/opt/efa/logs Brick2: 10.18.120.136:/apps/opt/efa/logs Options Reconfigured: ssl.ca-list: /apps/efadata/glusterfs/glusterfs.extreme-ca-chain.pem ssl.own-cert: /apps/efadata/glusterfs/glusterfs.pem ssl.private-key: /apps/efadata/glusterfs/glusterfs.key.pem ssl.cipher-list: HIGH:!SSLv2:!SSLv3:!TLSv1:!TLSv1.1:TLSv1.2:!3DES:!RC4:!aNULL:!ADH auth.ssl-allow: 10.18.120.135,10.18.120.136 server.ssl: on client.ssl: on ssl.certificate-depth: 3 network.ping-timeout: 2 performance.open-behind: on cluster.favorite-child-policy: mtime storage.owner-gid: 1001 storage.owner-uid: 0 cluster.granular-entry-heal: on storage.fips-mode-rchecksum: on transport.address-family: inet nfs.disable: on performance.client-io-threads: off ``` **- The output of the `gluster volume heal` command**: **- Provide logs present on following locations of client and server nodes - /var/log/glusterfs/ **- Is there any crash ? Provide the backtrace and coredump ``` (gdb) bt #0 0x00007fa6f731bbad in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 #1 0x00007fa6f731fe1e in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 #2 0x00007fa6f731d6d0 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 #3 0x00007fa6f7324c45 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 #4 0x00007fa6f732fa3f in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 #5 0x00007fa6f732fb47 in SSL_read () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 #6 0x00007fa6f739dc94 in ssl_do (buf=, len=, func=, priv=, priv=) at socket.c:246 #7 0x00007fa6f739de36 in __socket_ssl_readv (opvector=opvector@entry=0x7fa6f6abedd0, opcount=opcount@entry=1, this=, this=) at socket.c:552 #8 0x00007fa6f739e35b in __socket_ssl_read (count=, buf=, this=0x555685ba1b98) at socket.c:572 #9 __socket_cached_read (opcount=1, opvector=0x555685699338, this=0x555685ba1b98) at socket.c:610 #10 __socket_rwv (this=this@entry=0x555685ba1b98, vector=, count=count@entry=1, pending_vector=pending_vector@entry=0x5556856993a8, pending_count=pending_count@entry=0x5556856993b4, bytes=bytes@entry=0x7fa6f6abeea0, write=0) at socket.c:721 #11 0x00007fa6f73a0438 in __socket_readv (bytes=0x7fa6f6abeea0, pending_count=0x5556856993b4, pending_vector=0x5556856993a8, count=1, vector=, this=0x555685ba1b98) at socket.c:2102 #12 __socket_read_frag (this=0x555685ba1b98) at socket.c:2102 #13 socket_proto_state_machine (pollin=, this=0x555685ba1b98) at socket.c:2262 #14 socket_event_poll_in (notify_handled=true, this=0x555685ba1b98) at socket.c:2384 #15 socket_event_handler (event_thread_died=0, poll_err=0, poll_out=, poll_in=, data=0x555685ba1b98, gen=13, idx=2, fd=) at socket.c:2790 #16 socket_event_handler (fd=fd@entry=6, idx=idx@entry=2, gen=gen@entry=13, data=data@entry=0x555685ba1b98, poll_in=, poll_out=, poll_err=0, event_thread_died=0) at socket.c:2710 #17 0x00007fa6fbade119 in event_dispatch_epoll_handler (event=0x7fa6f6abf054, event_pool=0x555685006018) at event-epoll.c:614 #18 event_dispatch_epoll_worker (data=0x555685036828) at event-epoll.c:725 #19 0x00007fa6fb9fa609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #20 0x00007fa6fb74b133 in clone () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) f 5 #5 0x00007fa6f732fb47 in SSL_read () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1 (gdb) info locals No symbol table info available. (gdb) f 9 #9 __socket_cached_read (opcount=1, opvector=0x555685699338, this=0x555685ba1b98) at socket.c:610 610 socket.c: No such file or directory. (gdb) info ocals Undefined info command: "ocals". Try "help info". (gdb) info locals ret = -1 priv = 0x555685699218 in = 0x555685699318 req_len = 8 priv = in = req_len = ret = (gdb) l 605 in socket.c (gdb) f 7 #7 0x00007fa6f739de36 in __socket_ssl_readv (opvector=opvector@entry=0x7fa6f6abedd0, opcount=opcount@entry=1, this=, this=) at socket.c:552 552 in socket.c (gdb) info locals priv = 0x555685699218 sock = ret = -1 __FUNCTION__ = "__socket_ssl_readv" (gdb) f 15 #15 socket_event_handler (event_thread_died=0, poll_err=0, poll_out=, poll_in=, data=0x555685ba1b98, gen=13, idx=2, fd=) at socket.c:2790 2790 in socket.c (gdb) l 2785 in socket.c (gdb) info locals this = ret = ctx = notify_handled = priv = 0x555685699218 socket_closed = this = priv = ret = ctx = socket_closed = notify_handled = __FUNCTION__ = "socket_event_handler" sock_type = sa = (gdb) ``` **Additional info:**

- The operating system / glusterfs version: It is reproducible with gluster version 9.6 and 11.0 on Ubuntu setup installed with Debian files.

Note: Please hide any confidential data which you don't want to share in public like IP address, file name, hostname or any other configuration