axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

Unable to achieve deterministic behaviour when polling a server socket #596

Closed by fredrikwidlund 3 weeks ago

fredrikwidlund commented 2 years ago

I am trying to make poll semantics work before accept() on a TCP socket.

Try 1

#include <stdio.h>
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <poll.h>
#include <netinet/in.h>
#include <liburing.h>

int main()
{
  struct io_uring ring;
  struct io_uring_sqe *sqe;
  struct io_uring_cqe *cqe;
  unsigned head;
  int s, c, count;

  assert(io_uring_queue_init(2048, &ring, 0) == 0);
  s = socket(AF_INET, SOCK_STREAM, 0);
  assert(s >= 0);
  assert(setsockopt(s, SOL_SOCKET, SO_REUSEPORT, (int[]) {1}, sizeof(int)) == 0);
  assert(setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (int[]) {1}, sizeof(int)) == 0);
  assert(bind(s, (struct sockaddr *) (struct sockaddr_in[]) {{.sin_family = AF_INET, .sin_port = htons(12345)}},
              sizeof(struct sockaddr_in)) == 0);
  assert(listen(s, INT_MAX) == 0);
  sqe = io_uring_get_sqe(&ring);
  io_uring_prep_poll_add(sqe, s, POLLIN);
  sqe->len = IORING_POLL_ADD_MULTI;
  while (1)
  {
    io_uring_submit_and_wait(&ring, 1);
    count = 0;
    io_uring_for_each_cqe(&ring, head, cqe)
    {
      assert(cqe->res > 0);
      c = accept(s, NULL, NULL);
      assert(c >= 0);
      printf("accepted socket %d\n", c);
      count++;
    }
    io_uring_cq_advance(&ring, count);
  }
}

I then connect a few clients as such...

#!/bin/bash

exec 5<>"/dev/tcp/127.0.0.1/12345"
exec 6<>"/dev/tcp/127.0.0.1/12345"
exec 7<>"/dev/tcp/127.0.0.1/12345"
exec 8<>"/dev/tcp/127.0.0.1/12345"
sleep 10

I expect all connections to succeed, but there seems to be a race: the number of successful connections varies between 1 and 4. I use a standard accept() syscall just to simplify the test case, but regardless, a cqe is not always generated when there are connections still waiting to be accepted.

I then figured I might mimic epoll edge-triggered semantics instead, and repeat accept() in nonblocking mode until EAGAIN.

Try 2

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <netinet/in.h>
#include <liburing.h>

int main()
{
  struct io_uring ring;
  struct io_uring_sqe *sqe;
  int fd;

  assert(io_uring_queue_init(2048, &ring, 0) == 0);
  fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
  assert(fd >= 0);
  assert(setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, (int[]) {1}, sizeof(int)) == 0);
  assert(setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (int[]) {1}, sizeof(int)) == 0);
  assert(bind(fd, (struct sockaddr *) (struct sockaddr_in[]) {{.sin_family = AF_INET, .sin_port = htons(12345)}},
              sizeof(struct sockaddr_in)) == 0);
  assert(listen(fd, INT_MAX) == 0);
  assert(accept(fd, NULL, NULL) == -1 && errno == EAGAIN);
  sqe = io_uring_get_sqe(&ring);
  io_uring_prep_accept(sqe, fd, NULL, NULL, 0);
  io_uring_submit_and_wait(&ring, 1);
}

The test unexpectedly hangs waiting for connections. Even though the socket is in nonblocking mode it does not return with EAGAIN when there are no connections waiting to be accepted.

How is the poll interface supposed to be used in this case? Is this behaviour expected?

Linux arch 5.18.1-1-aarch64-ARCH #1 SMP PREEMPT Wed Jun 1 19:34:45 MDT 2022 aarch64 GNU/Linux

isilence commented 2 years ago

> I expect all connections to succeed, but there seems to be a race: the number of successful connections varies between 1 and 4. I use a standard accept() syscall just to simplify the test case, but regardless, a cqe is not always generated when there are connections still waiting to be accepted.
>
> I then figured I might mimic epoll edge-triggered semantics instead, and repeat accept() in nonblocking mode until EAGAIN.

Right, there is no guarantee of a 1:1 match between poll completions and pending accepts.

isilence commented 2 years ago

> The test unexpectedly hangs waiting for connections. Even though the socket is in nonblocking mode it does not return with EAGAIN when there are no connections waiting to be accepted.

I guess that's just another place we didn't care about O_NONBLOCK.

isilence commented 2 years ago

This is not really a solution to the API problem, but to narrow it down: does something like the draft pseudo-code below work for you?

while (1) {
    add_poll_req();
    io_uring_submit_wait();
    cqe = get_cqe();
    // error handling
    while (accept(O_NONBLOCK) == SUCCESS) {}
}
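
For reference, the inner `while (accept(O_NONBLOCK) == SUCCESS) {}` step of the pseudo-code above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not a tested fix: `drain_accept()` and `demo_drain()` are hypothetical helper names, the listening socket is assumed to be in nonblocking mode, and plain poll(2) stands in for the io_uring poll completion so that the sketch only depends on POSIX calls.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CLIENTS 16

/* Hypothetical helper: accept every connection currently queued on a
 * nonblocking listening socket. Returns the number of connections
 * accepted, or -1 on an unexpected error. */
int drain_accept(int listen_fd)
{
  int count = 0;

  for (;;)
  {
    int c = accept(listen_fd, NULL, NULL);
    if (c >= 0)
    {
      count++;
      close(c); /* a real server would hand the socket off instead */
      continue;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK)
      return count; /* backlog is empty, go back to waiting */
    if (errno == EINTR || errno == ECONNABORTED)
      continue; /* transient condition, try again */
    return -1;
  }
}

/* Hypothetical demo: queue `clients` loopback connections, then repeat
 * "wait for readiness, drain" until all are accepted or poll() times out.
 * poll(2) is only an assumption standing in for the io_uring poll request.
 * Error paths do not clean up, to keep the sketch short. */
int demo_drain(int clients)
{
  struct sockaddr_in addr = {.sin_family = AF_INET};
  socklen_t len = sizeof(addr);
  int fds[MAX_CLIENTS], total = 0, i, s;

  assert(clients >= 0 && clients <= MAX_CLIENTS);
  s = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
  if (s < 0)
    return -1;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  /* sin_port is 0, so the kernel picks a free port; read it back */
  if (bind(s, (struct sockaddr *) &addr, sizeof(addr)) != 0 ||
      listen(s, MAX_CLIENTS) != 0 ||
      getsockname(s, (struct sockaddr *) &addr, &len) != 0)
    return -1;
  for (i = 0; i < clients; i++)
  {
    fds[i] = socket(AF_INET, SOCK_STREAM, 0);
    if (fds[i] < 0 ||
        connect(fds[i], (struct sockaddr *) &addr, sizeof(addr)) != 0)
      return -1;
  }
  while (total < clients)
  {
    struct pollfd p = {.fd = s, .events = POLLIN};
    int n;

    if (poll(&p, 1, 1000) <= 0)
      break; /* timeout or error: stop waiting */
    /* one readiness event may cover several queued connections */
    n = drain_accept(s);
    if (n < 0)
      return -1;
    total += n;
  }
  for (i = 0; i < clients; i++)
    close(fds[i]);
  close(s);
  return total;
}
```

Calling `demo_drain(4)` mirrors the four-client script from the report and is expected to return 4, since every wakeup drains the backlog instead of assuming one connection per event.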

fredrikwidlund commented 2 years ago

I would like to avoid using regular syscalls if possible. Being able to submit requests to uring flagged as "non blocking", or "complete immediately" would work, although honouring O_NONBLOCK is perhaps more stringent?