Open dtaht opened 1 year ago
lqos@lqos:/opt/libreqos/src/rust$ sudo strace -p 1895
strace: Process 1895 attached
futex(0x7f6981e181e0, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
accept4(48, {sa_family=AF_UNIX}, [110 => 2], SOCK_CLOEXEC|SOCK_NONBLOCK) = 50
epoll_ctl(5, EPOLL_CTL_ADD, 50, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1325400065, u64=1325400065}}) = 0
futex(0x7f69807fd4e0, FUTEX_WAKE_PRIVATE, 1) = 1
accept4(48, 0x7ffe82d444f0, [110], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
This is NOT a high-priority bug. I carry a personal scar from mishandling EAGAIN during the deployment of a new (Java-based) web server. Under a production workload it leaked sockets at a slow rate, spinning ever more, exactly like this, until it ran out of sockets. An individual instance would crash after about 3 hours and need to be restarted. Tracing the leak back to where it happened took some effort.
With several hundred servers in the total deployment, instances were crashing every few minutes.
It looks like the socket fd went away and the process is not handling EAGAIN, so instead of sleeping on the futex or in epoll it loops. The looping is not rapid, however, and it does seem to stop a few minutes after the client goes away.
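For reference, here is a minimal sketch of the usual way to treat EAGAIN on a non-blocking listening socket: keep accepting until the kernel reports that nothing is pending, then return to the poller instead of retrying. It uses only the Rust standard library. The function name `drain_accept` is hypothetical and is not taken from the LibreQoS source, which may well handle this inside its async runtime. Rust surfaces EAGAIN as `io::ErrorKind::WouldBlock`.

```rust
use std::io;
use std::os::unix::net::{UnixListener, UnixStream};

/// Hypothetical helper: accept every connection currently queued on a
/// non-blocking Unix listener.
///
/// `WouldBlock` (EAGAIN) is not treated as a failure. It only means the
/// accept queue is empty, so the function returns and the caller is
/// expected to go back to sleeping in epoll rather than call accept again.
fn drain_accept(listener: &UnixListener) -> io::Result<Vec<UnixStream>> {
    let mut accepted = Vec::new();
    loop {
        match listener.accept() {
            // A client was waiting: keep it and try for another one.
            Ok((stream, _addr)) => accepted.push(stream),
            // EAGAIN: nothing left to accept right now, stop looping.
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Ok(accepted),
            // Any other error is real and is passed up to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

The listener has to be put into non-blocking mode first with `set_nonblocking(true)`. Each returned `UnixStream` closes its fd when dropped, which is the property that guards against the slow socket leak described above.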