cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io
Other
4.12k stars 605 forks source link

Epoll with sockets created by socketpair() does not work correctly #1239

Closed wkozaczuk closed 1 year ago

wkozaczuk commented 1 year ago

This code from nginx tries to determine if the host supports EPOLLRDHUP:

static void
ngx_epoll_test_rdhup(ngx_cycle_t *cycle)
{
    int                 s[2], events;
    struct epoll_event  ee;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) == -1) {
        ngx_log_error(NGX_LOG_ALERT, cycle->log, ngx_errno,
                      "socketpair() failed");
        return;
    }

    ee.events = EPOLLET|EPOLLIN|EPOLLRDHUP;

    if (epoll_ctl(ep, EPOLL_CTL_ADD, s[0], &ee) == -1) {
        ngx_log_error(NGX_LOG_ALERT, cycle->log, ngx_errno,
                      "epoll_ctl() failed");
        goto failed;
    }

    if (close(s[1]) == -1) {
        ngx_log_error(NGX_LOG_ALERT, cycle->log, ngx_errno,
                      "close() failed");
        s[1] = -1;
        goto failed;
    }

    s[1] = -1;

    events = epoll_wait(ep, &ee, 1, 5000);

    if (events == -1) {
        ngx_log_error(NGX_LOG_ALERT, cycle->log, ngx_errno,
                      "epoll_wait() failed");
        goto failed;
    }

    if (events) {
        ngx_use_epoll_rdhup = ee.events & EPOLLRDHUP;

    } else {
        ngx_log_error(NGX_LOG_ALERT, cycle->log, NGX_ETIMEDOUT,
                      "epoll_wait() timed out");

It times out on OSv after 5 seconds (see timeout argument to epoll_wait) and we get this error (the 'Operation timed out'):

OSv v0.57.0-75-g14c514da
eth0: 172.18.0.2
Booted up in 172.23 ms
Cmdline: /nginx -c /nginx.conf
2023/06/18 17:10:12 [emerg] 2#25: ioctl(eventfd, FIONBIO) failed (25: Not a tty)
2023/06/18 17:10:12 [alert] 2#25: epoll_wait() timed out (110: Operation timed out)

This is annoying as one cannot really send requests to Nginx until this happens.

So most important is that epoll_wait() never returns with any events. And on paper looking at libc/pipe_buffer.cc and libc/af_local.cc this should work. At least closing one of the sockets should trigger some events. Relatedly the poll() is working as we test it with tst-af-local.cc.

The second less important issue is whether closing the socket will actually trigger EPOLLRDHUP. I see this old commit where we replaced POLLRDHUP with POLLERR|POLLOUT - f6d134bc58e236ed72f764685baa535463d7f3bd.

wkozaczuk commented 1 year ago

I think I may know what needs to be corrected and how but I need confirmation from someone smarter than me. The commit f6d134bc58e236ed72f764685baa535463d7f3bd I am referring to above did change the pipe_buffer to raise POLLERR|POLLOUT instead of POLLRDHUP when trying to write and no receiver is set. This was to make pipe compatible with Linux.

However pipe_buffer is used to implement the AF_LOCAL sockets created with socketpair() and I think they should have the socket, not pipe semantics. So I think in this case the underlying send pipe_buffer should raise the POLLRDHUP event. Here is a possible fix:

diff --git a/libc/af_local.cc b/libc/af_local.cc
index e65e2b11..721d4d99 100644
--- a/libc/af_local.cc
+++ b/libc/af_local.cc
@@ -65,6 +65,7 @@ int af_local::ioctl(u_long cmd, void *data)
 void af_local::init()
 {
     send->attach_sender(this);
+    send->set_no_receiver_event(POLLRDHUP);
     receive->attach_receiver(this);
 }

diff --git a/libc/pipe_buffer.cc b/libc/pipe_buffer.cc
index acd5d0bb..71997d95 100644
--- a/libc/pipe_buffer.cc
+++ b/libc/pipe_buffer.cc
@@ -29,7 +29,7 @@ void pipe_buffer::detach_receiver()
     if (receiver) {
         receiver = nullptr;
         if (sender)
-            poll_wake(sender, POLLERR|POLLOUT);
+            poll_wake(sender, no_receiver_event);
         may_write.wake_all();
     }
 }
@@ -57,7 +57,7 @@ int pipe_buffer::read_events_unlocked()
 int pipe_buffer::write_events_unlocked()
 {
     if (!receiver) {
-        return POLLERR|POLLOUT;
+        return no_receiver_event;
     }
     int ret = 0;
     ret |= q.size() < max_buf ? POLLOUT : 0;
diff --git a/libc/pipe_buffer.hh b/libc/pipe_buffer.hh
index faea483c..626e33fc 100644
--- a/libc/pipe_buffer.hh
+++ b/libc/pipe_buffer.hh
@@ -30,6 +30,9 @@ public:
     void detach_receiver();
     void attach_sender(struct file *f);
     void attach_receiver(struct file *f);
+    void set_no_receiver_event(int event) {
+        this->no_receiver_event = event;
+    }
 private:
     int read_events_unlocked();
     int write_events_unlocked();
@@ -41,6 +44,7 @@ private:
     std::atomic<unsigned> refs = {};
     condvar may_read;
     condvar may_write;
+    int no_receiver_event = POLLERR|POLLOUT;
     friend void intrusive_ptr_add_ref(pipe_buffer* p) {
         p->refs.fetch_add(1, std::memory_order_relaxed);
     }

BTW after better familiarizing myself with epoll I do not think my comments about two separate issues are correct - it is simply one problem.

Relatedly, as I was studying the epoll/poll code I discovered the method void file::wake_epoll(int events) (see fs/vfs/kern_descrip.cc) ignores the events argument completely. Instead, the int file::poll(int events) overridden by most file struct subclasses plays a role in epoll and poll implementation. Why is that?

wkozaczuk commented 1 year ago

Fixed by 47bcd267a7b90ba6fef8daf9fc8fd794399b5f17

wkozaczuk commented 1 year ago

Fixed by https://github.com/cloudius-systems/osv/commit/47bcd267a7b90ba6fef8daf9fc8fd794399b5f17