Mellanox / libvma

Linux user space library for network socket acceleration based on RDMA compatible network adaptors
https://www.mellanox.com/products/software/accelerator-software/vma?mtag=vma

epoll_ctl EPOLL_CTL_MOD called incorrectly #1037

Open bigbohne opened 1 year ago

bigbohne commented 1 year ago

Subject

Running the boost::beast async HTTP client example (https://www.boost.org/doc/libs/1_81_0/libs/beast/example/http/client/async/http_client_async.cpp) with LD_PRELOAD=libvma.so fails.

Issue type

Configuration:

Actual behavior:

VMA ERROR: epfd_info:492:mod_fd() failed to modify fd=22 in epoll epfd=20 (errno=2 No such file or directory)

Expected behavior:

"Display of the data received from the HTTP Server"

Steps to reproduce:

bigbohne commented 1 year ago

Digging into the strace output of both runs (with and without VMA):

With VMA, the socket in question is fd=22.

less strace_vma.log | grep "epoll_ctl(20"

epoll_ctl(20, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLERR|EPOLLET, data={u32=3, u64=3}}) = 0
epoll_ctl(20, EPOLL_CTL_ADD, 21, {events=EPOLLIN|EPOLLERR, data={u32=21, u64=21}}) = 0
epoll_ctl(20, EPOLL_CTL_ADD, 22, {events=EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, data={u32=22, u64=22}}) = 0
epoll_ctl(20, EPOLL_CTL_DEL, 22, NULL) = 0
epoll_ctl(20, EPOLL_CTL_MOD, 22, {events=EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLET, data={u32=22, u64=22}}) = -1 ENOENT (No such file or directory)
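The ENOENT on the last call is what the kernel returns for any EPOLL_CTL_MOD on an fd that has already been removed with EPOLL_CTL_DEL. A minimal sketch that replays the same ADD, DEL, MOD order without libvma, assuming a Linux host; the function name mod_after_del_errno and the AF_UNIX socket pair standing in for the TCP socket are illustrative choices, not taken from the trace:

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Replays ADD -> DEL -> MOD on a fresh socket and returns the errno of
 * the final EPOLL_CTL_MOD (0 if it unexpectedly succeeds). */
int mod_after_del_errno(void)
{
    int sv[2];
    int epfd = epoll_create1(0);
    assert(epfd >= 0);
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLPRI | EPOLLERR | EPOLLHUP | EPOLLET;
    ev.data.fd = sv[0];

    /* Register the socket, then remove it again (the DEL seen in the trace). */
    rc = epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev);
    assert(rc == 0);
    rc = epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], NULL);
    assert(rc == 0);

    /* The caller still believes the fd is registered and modifies it. */
    ev.events |= EPOLLOUT;
    errno = 0;
    rc = epoll_ctl(epfd, EPOLL_CTL_MOD, sv[0], &ev);
    int err = (rc == -1) ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return err;
}
```

Calling mod_after_del_errno() from a small main and comparing the result with ENOENT should show the same failure as the trace above.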

In the strace of the boost example without VMA, the epoll_ctl calls are issued correctly (here the socket in question is fd=6):

less strace.log | grep "epoll_ctl("

epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLERR|EPOLLET, data={u32=3828702728, u64=94823821696520}}) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 5, {events=EPOLLIN|EPOLLERR, data={u32=3828702740, u64=94823821696532}}) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 6, {events=EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET, data={u32=3828704704, u64=94823821698496}}) = 0
epoll_ctl(4, EPOLL_CTL_MOD, 6, {events=EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLET, data={u32=3828704704, u64=94823821698496}}) = 0

igor-ivanov commented 1 year ago

The suspicious place is probably https://github.com/Mellanox/libvma/blob/master/src/vma/sock/sock-redirect.cpp#L1036-L1045

  1. The socket starts the connection as offloaded.
  2. The connect cannot be done the offloaded way.
  3. The socket is marked as non-offloaded.
  4. The resources of the offloaded socket are closed, including removing the fd from the epoll fd (epoll_ctl(20, EPOLL_CTL_DEL, 22, NULL) = 0).
  5. The connect is done as non-offloaded.
  6. The following call then fails, because fd=22 was removed in step 4: epoll_ctl(20, EPOLL_CTL_MOD, 22, {events=EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLET, data={u32=22, u64=22}}) = -1 ENOENT (No such file or directory)
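
One general way to tolerate the state left behind by step 4 is to fall back to EPOLL_CTL_ADD when EPOLL_CTL_MOD reports ENOENT. The sketch below only illustrates that idiom with plain epoll calls; it is not libvma code and not necessarily how this should be fixed in sock-redirect.cpp. The names epoll_mod_or_add and demo_reregister are hypothetical, and an AF_UNIX socket pair stands in for the TCP socket:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try MOD first; if the fd is no longer registered
 * (ENOENT), register it again with ADD. Returns 0 on success, -1 on error. */
int epoll_mod_or_add(int epfd, int fd, struct epoll_event *ev)
{
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, ev) == 0)
        return 0;
    if (errno != ENOENT)
        return -1;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, ev);
}

/* Replays ADD -> DEL, then uses the helper where the trace used MOD.
 * Returns 0 if the fd ends up registered again, -1 otherwise. */
int demo_reregister(void)
{
    int sv[2];
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLPRI | EPOLLERR | EPOLLHUP | EPOLLET;
    ev.data.fd = sv[0];

    int ok = epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0
          && epoll_ctl(epfd, EPOLL_CTL_DEL, sv[0], NULL) == 0;

    ev.events |= EPOLLOUT;
    /* The helper re-adds the fd; a plain MOD afterwards confirms it is registered. */
    ok = ok && epoll_mod_or_add(epfd, sv[0], &ev) == 0
            && epoll_ctl(epfd, EPOLL_CTL_MOD, sv[0], &ev) == 0;

    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return ok ? 0 : -1;
}
```

Whether the fallback belongs in the epoll_ctl wrapper, or whether the fd should instead be re-added when the socket is switched to the non-offloaded path, is a design question for the maintainers; the sketch only shows that re-adding restores the registration.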