chriskohlhoff / asio

Asio C++ Library
http://think-async.com/Asio

epoll_wait() returning too many file descriptors #653

Open ghost opened 3 years ago

ghost commented 3 years ago

@skysley commented on Nov 19, 2018, 3:21 PM UTC:

Hi, I am not sure whether the root cause is in boost::asio or in the implementation of epoll_wait(). Either way, it may be worth handling the case described below. I am using Boost 1.67.0.

I have a program which spawns a lot of sub-processes and a lot of threads. Communication is basically done via Boost's signals and slots and via boost::interprocess::message_queue. While I was looking for a memory leak, I encountered some strange behavior. In epoll_reactor.ipp, within epoll_reactor::run(), one can find this code:

  epoll_event events[128];
  int num_events = epoll_wait(epoll_fd_, events, 128, timeout);

Since maxevents is 128, the number of returned events should never exceed 128. However, I encountered num_events to be 232 (see attached screenshot). So far I have no idea why this happens. It would be helpful to check (at least via an assertion) that num_events is not larger than 128.
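A minimal sketch of the kind of check asked for above. The helper name `usable_event_count` is hypothetical and not part of Asio or uSockets, and discarding an out-of-range count entirely (rather than clamping it to the buffer size) is an assumed policy, not something either library is confirmed to do.

```cpp
#include <cassert>

// Hypothetical helper (not an Asio or uSockets API).
// Takes the raw return value of epoll_wait() and the size of the
// events buffer that was passed in, and returns how many entries
// of that buffer are safe to read.
inline int usable_event_count(int returned, int capacity)
{
  assert(capacity > 0);   // epoll_wait() itself requires maxevents > 0

  if (returned < 0)
    return 0;             // error, e.g. EINTR: nothing to process

  if (returned > capacity)
    return 0;             // violates the epoll_wait() contract:
                          // assumed policy is to ignore the whole batch

  return returned;
}
```

In a loop like the one quoted above, the result would replace the raw `num_events` as the loop bound, e.g. `int n = usable_event_count(num_events, 128);`, so a value such as 232 can never index past the end of `events`.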

backtrace.txt

Screenshot: epoll_wait_result

This issue was moved by chriskohlhoff from boostorg/asio#168.

drok commented 1 year ago

FWIW, the "232 events" returned from epoll_wait is driving me nuts too. I don't use Boost but the uSockets library, with epoll as the underlying loop. My app is much simpler (though still far too big to be a test case), and this issue happens spuriously while testing with 15-18 file descriptors, a mix of eventfd, timerfd and sockets.

Originally the uSockets code had a workaround to ignore these spurious events. I found this issue when I tried to remove the workaround to better understand why it's needed. I'm no closer, but I suspect the kernel. The "232" events are returned with about a 1 in 10 probability, and I've been unable to find any correlation so far.

My kernel (vzkernel-2.6.32-042stab120.18.x86_64) is: Linux builder-48 2.6.32-042stab120.18 #1 SMP Fri Jan 13 10:32:04 MSK 2017 x86_64 x86_64 x86_64 GNU/Linux

I'm adding this report in the hope that the bug can be correlated with some known bug in this (ancient) kernel. If it was fixed in a later version, I'd like to add a "required min kernel version" to my app and remove the unexplained workaround.
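A rough sketch of what such a "required min kernel version" check could look like. The names `KernelVersion`, `parse_kernel_release` and `at_least` are hypothetical, and the thread does not establish which kernel version (if any) fixes the problem, so any threshold passed in is a placeholder. Vendor kernels such as the OpenVZ one above may also carry backported fixes, so a plain version comparison is only a heuristic.

```cpp
#include <cstdio>
#include <tuple>

// Hypothetical type: the three leading numeric components of a
// kernel release string such as "2.6.32-042stab120.18".
struct KernelVersion
{
  int version;
  int patchlevel;
  int sublevel;
};

// Parses the leading "X.Y[.Z]" of a release string; any suffix
// (e.g. "-042stab120.18") is ignored. Returns false if fewer than
// two numeric components are present.
inline bool parse_kernel_release(const char* release, KernelVersion& out)
{
  KernelVersion v = {0, 0, 0};
  int n = std::sscanf(release, "%d.%d.%d",
                      &v.version, &v.patchlevel, &v.sublevel);
  if (n < 2)
    return false;
  out = v;
  return true;
}

// True if `have` is the same as or newer than `need`
// (lexicographic comparison of the three components).
inline bool at_least(const KernelVersion& have, const KernelVersion& need)
{
  return std::tie(have.version, have.patchlevel, have.sublevel)
      >= std::tie(need.version, need.patchlevel, need.sublevel);
}
```

At startup the release string could be taken from the `release` field filled in by `uname()` from `<sys/utsname.h>`, and the app could refuse to run, or keep the workaround enabled, when `at_least` returns false.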