Mellanox / libvma

Linux user space library for network socket acceleration based on RDMA compatible network adaptors
https://www.mellanox.com/products/software/accelerator-software/vma?mtag=vma

segfault at sys_readv () from /lib64/libvma.so #969

Open syspro4 opened 2 years ago

syspro4 commented 2 years ago

Subject

segfault at sys_readv () from /lib64/libvma.so

Issue type

Configuration:

Actual behavior:

While running glusterd (GlusterFS) with LD_PRELOAD, I see a segfault at sys_readv(). I enabled debug mode while compiling, but I am not able to see the exact crash location inside the libvma code. The following is the command I used to configure the debug build.

[root@dev-mc libvma]# ./configure --with-ofed=/usr --prefix=/usr --libdir=/usr/lib64 --includedir=/usr/include --docdir=/usr/share/doc/libvma --sysconfdir=/etc --enable-debug

Crash:

    #0  0x00007f92909093f0 in sys_readv () from /lib64/libvma.so
    #1  0x00007f919ecc7217 in __socket_ssl_readv (this=this@entry=0x7f9194004570, opvector=opvector@entry=0x7f9194004d08, opcount=opcount@entry=1) at socket.c:568
    #2  0x00007f919ecc74ea in __socket_cached_read (opcount=1, opvector=0x7f9194004d08, this=0x7f9194004570) at socket.c:652
    #3  __socket_rwv (this=this@entry=0x7f9194004570, vector=<optimized out>, count=count@entry=1, pending_vector=pending_vector@entry=0x7f9194004d48, pending_count=pending_count@entry=0x7f9194004d54, bytes=bytes@entry=0x0, write=0) at socket.c:734
    #4  0x00007f919ecc84ab in __socket_readv (bytes=0x0, pending_count=0x7f9194004d54, pending_vector=0x7f9194004d48, count=1, vector=<optimized out>, this=0x7f9194004570) at socket.c:2354
    #5  __socket_proto_state_machine (this=this@entry=0x7f9194004570, pollin=pollin@entry=0x7f919dfa8ef0) at socket.c:2354
    #6  0x00007f919eccbda4 in socket_proto_state_machine (pollin=0x7f919dfa8ef0, this=0x7f9194004570) at socket.c:2542
    #7  socket_event_poll_in (notify_handled=true, this=0x7f9194004570) at socket.c:2542
    #8  socket_event_handler (event_thread_died=0 '\000', poll_err=<optimized out>, poll_out=<optimized out>, poll_in=<optimized out>, data=0x7f9194004570, gen=1, idx=2, fd=56) at socket.c:2948
    #9  socket_event_handler (fd=fd@entry=56, idx=idx@entry=2, gen=gen@entry=1, data=data@entry=0x7f9194004570, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=<optimized out>, event_thread_died=0 '\000') at socket.c:2868
    #10 0x00007f92903099dc in event_dispatch_epoll_handler (event=0x7f919dfa8f94, event_pool=0x5598f07fab20) at event-epoll.c:692
    #11 event_dispatch_epoll_worker (data=0x5598f0e03170) at event-epoll.c:803

Expected behavior:

libvma should not segfault while running with GlusterFS.

Steps to reproduce:

  1. Start glusterd in foreground with LD_PRELOAD:

    LD_PRELOAD=libvma.so /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO -N

  2. Run gluster cli command to configure gluster volume

    gluster volume info

  3. After running the CLI command, glusterd gets a segfault.

igor-ivanov commented 2 years ago

Hello @syspro4, thank you for reporting the issue. I think this happens because of a conflict on the symbol sys_readv, which is defined in both glusterfs and libvma:

glusterfs: https://github.com/gluster/glusterfs/blob/2ff6e2d5e217ab555ff63026017151edf2ba1adf/rpc/rpc-transport/socket/src/socket.c#L557

libvma: https://github.com/Mellanox/libvma/blob/bcc367040bea42b605e057fd5c95e6a7cd22c49f/src/vma/lwip/tcp_out.c#L123

A solution will be planned.

syspro4 commented 2 years ago

Thanks for the reply! I will change the gluster code to rename glusterfs's sys_readv() to new_sys_readv() and try libvma again.

syspro4 commented 2 years ago

I renamed glusterfs's sys_readv() to new_sys_readv() and now I can start glusterd with libvma. But it now fails to spawn a new process (glusterfsd). glusterfsd is the daemon process that does the actual I/O to the underlying file system. Does libvma support the fork()/execvp() system calls?

In the log I see following error messages:

    [2021-11-29 22:37:07.547610 +0000] I [glusterfsd.c:2418:daemonize] 0-glusterfs: Pid of current running process is 6511
    [2021-11-29 22:37:10.928985 +0000] I [socket.c:929:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 103
    [2021-11-29 22:37:10.929176 +0000] E [MSGID: 101187] [event-epoll.c:429:event_register_epoll] 0-epoll: failed to add fd to epoll [{fd=102}, {epoll_fd=52}, {errno=9}, {error=Bad file descriptor}]
    [2021-11-29 22:37:10.929196 +0000] W [socket.c:3779:socket_listen] 0-socket.glusterfsd: could not register socket 102 with events; closing socket
    [2021-11-29 22:37:10.929218 +0000] W [rpcsvc.c:1993:rpcsvc_create_listener] 0-rpc-service: listening on transport failed

Thanks

igor-ivanov commented 2 years ago

Nice to see that the sys_readv issue can be overcome. libvma supports the fork()/exec() case. See https://github.com/Mellanox/libvma/commit/24bd1737c6a9966c894be5c441f6a28f1cd3924e and the related test at https://github.com/Mellanox/libvma/tree/master/tests/simple_fork. VMA_TRACELEVEL=4 can be used to display VMA output.

syspro4 commented 2 years ago

Thanks for the reply. However, I get an error when running the Gluster services (glusterd and glusterfsd) in daemonize mode with libvma. I always get the same error:

    [event-epoll.c:429:event_register_epoll] 0-epoll: failed to add fd to epoll [{fd=102}, {epoll_fd=52}, {errno=9}, {error=Bad file descriptor}]

Is it possible that some FDs are getting closed during fork()/exec(), and hence the epoll_ctl(, EPOLL_CTL_ADD, fd, ) call is failing?

igor-ivanov commented 2 years ago
  1. I would like to note that the current master should no longer have the symbol conflict that was initially reported.
  2. Regarding https://github.com/Mellanox/libvma/issues/969#issuecomment-983247807: do you know if the Gluster application uses the flow described at https://github.com/Mellanox/libvma/issues/816? Could you try VMA_TRACELEVEL=4 and look for suspicious VMA output around:

    [event-epoll.c:429:event_register_epoll] 0-epoll: failed to add fd to epoll [{fd=102}, {epoll_fd=52}, {errno=9}, {error=Bad file descriptor}]