axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

Question on using `io_uring_prep_poll_add()` and waiting for multiple events #739

Closed dmantipov closed 3 weeks ago

dmantipov commented 1 year ago

I have a question on using io_uring_prep_poll_add() and waiting for multiple events.

There are two descriptors, say tfd and sfd. The first is returned by the timerfd_create() system call, and the timer is expected to tick once a second. The second is a socket, which is expected to handle thousands of reads/writes per second. A simple and convenient poll()-based event loop might be:

struct pollfd pfd[2];
pfd[0].fd = tfd;
pfd[0].events = POLLIN;
pfd[0].revents = 0;
pfd[1].fd = sfd;
pfd[1].events = POLLIN | POLLOUT;
pfd[1].revents = 0;
...
while (1) {
  if (poll(pfd, 2, -1) > 0) {
    if (pfd[0].revents & POLLIN) {
      /* Handle timer tick */
    }
    if (pfd[1].revents & POLLOUT) {
      /* Can write to socket */
    }
    if (pfd[1].revents & POLLIN) {
      /* Can read from socket */
    }
  }
}

Using io_uring, it may be:

int ret;
struct io_uring ring;
struct io_uring_sqe *sqe;
struct io_uring_cqe *cqe;
...
while (1) {
  sqe = io_uring_get_sqe(&ring);
  io_uring_prep_poll_add(sqe, tfd, POLLIN);
  io_uring_sqe_set_data64(sqe, 0); /* 0 to denote timer */
  ...
  sqe = io_uring_get_sqe(&ring);
  io_uring_prep_poll_add(sqe, sfd, POLLIN | POLLOUT);
  io_uring_sqe_set_data64(sqe, 1); /* 1 to denote socket */
  ...
  ret = io_uring_submit(&ring);

What's next? A straightforward approach such as:

  for (int i = 0; i < 2; i++) {
    ret = io_uring_wait_cqe(&ring, &cqe);
    if (ret == 0 && cqe) {
      uint64_t data = io_uring_cqe_get_data64(cqe);
      if (data == 0) {
        if (cqe->res & POLLIN) {
          /* Handle timer event */
        }
      } else if (data == 1) {
        if (cqe->res & POLLOUT) {
          /* Can write to socket */
        }
        if (cqe->res & POLLIN) {
          /* Can read from socket */
        }
      } else {
        /* Shouldn't happen, error */
      }
      io_uring_cqe_seen(&ring, cqe);
    }
  }
}

effectively limits the rate of the outer while (1) loop to one iteration per second, because it waits for both the timer and the socket events. So what is the proper way to wait for and handle CQEs so that 1) socket I/O is processed at full speed and 2) the loop never misses a timer event?

DylanZA commented 1 year ago

Basically, do not wait for both CQEs. You want to process each CQE as it arrives.

In pseudocode:

sqe = io_uring_get_sqe(&ring);
io_uring_prep_poll_add(sqe, tfd, POLLIN);
io_uring_sqe_set_data64(sqe, 0); /* 0 to denote timer */

sqe = io_uring_get_sqe(&ring);
io_uring_prep_poll_add(sqe, sfd, POLLIN | POLLOUT);
io_uring_sqe_set_data64(sqe, 1); /* 1 to denote socket */

ret = io_uring_submit(&ring);
while (1) {
  ...
  ret = io_uring_wait_cqe(&ring, &cqe); /* returns as soon as one CQE is ready */
  if (ret < 0)
    break;
  if (cqe->user_data == 0) {
    /* handle timer and resubmit a poll for the timer */
  } else if (cqe->user_data == 1) {
    /* handle socket and resubmit a poll for the socket */
  }
  io_uring_cqe_seen(&ring, cqe);
}

By the way, you can improve on this in two ways: 1) Use multishot poll add, which keeps posting CQEs without being resubmitted; it only needs re-arming once a CQE arrives without the IORING_CQE_F_MORE flag set. This reduces the number of submissions. 2) Use proper io_uring operations on the socket (for example a receive or send request) instead of polling first. This saves the step from poll -> op.