axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License
2.72k stars 393 forks source link

read_multi EAGAIN #1106

Closed MikuChan03 closed 3 months ago

MikuChan03 commented 4 months ago

Hello!

When using io_uring_prep_read_multishot to read from a timerfd, io_uring_wait_cqe waits until the timerfd elapses and then returns EAGAIN in cqe->res. Using io_uring_prep_read_multishot to read from a pipe works. Using io_uring_prep_read to read from a timerfd works. I'm using only the basic functions.


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <liburing.h>
#include <sys/timerfd.h>

struct bufRing {
  struct io_uring_buf_ring * uringBufRing;
  int id;
  char * bufMem;
  int bufCount;
  int bufSize;
  int bufMask;
};

int bufRingCreate(struct io_uring * uring, struct bufRing * bufRing, int bufCount, int bufSize, int id){
  struct io_uring_buf_ring * uringBufRing;
  char * bufMem;
  int rc;

  uringBufRing = io_uring_setup_buf_ring(uring, bufCount, id, 0, &rc);
  if(uringBufRing == NULL){
    fprintf(stderr, "io_uring_setup_buf_ring: %s\n", strerror(-rc));
    return -1;
  }

  bufMem = malloc(bufCount * bufSize);
  if(bufMem == NULL){
    perror("malloc");
    io_uring_free_buf_ring(uring, uringBufRing, bufCount, id);
    return -1;
  }

  bufRing->uringBufRing = uringBufRing;
  bufRing->id = id;
  bufRing->bufMem = bufMem;
  bufRing->bufCount = bufCount;
  bufRing->bufSize = bufSize;
  bufRing->bufMask = bufCount - 1;

  for(int i = 0; i < bufCount; i++){
    io_uring_buf_ring_add(uringBufRing, &bufMem[bufSize * i], bufSize, i, bufCount - 1, i);
  }
  io_uring_buf_ring_advance(uringBufRing, bufCount);

  return 0;
}

void bufRingReturnBuf(struct bufRing * bufRing, int index){
  io_uring_buf_ring_add(bufRing->uringBufRing, &bufRing->bufMem[bufRing->bufSize * index], bufRing->bufSize,
                        index, bufRing->bufMask, 0);
  io_uring_buf_ring_advance(bufRing->uringBufRing, 1);
}

void bufRingFree(struct io_uring * uring, struct bufRing * bufRing){
  io_uring_free_buf_ring(uring, bufRing->uringBufRing, bufRing->bufCount, bufRing->id);
  free(bufRing->bufMem);
}

int timerFdSettimeGood(int timerFd, int ms){
  struct itimerspec timesSpec = {
    .it_value = {
      .tv_sec = ms / 1'000,
      .tv_nsec = (ms % 1'000) * 1'000'000
    }
  };
  int rc;

  rc = timerfd_settime(timerFd, 0, &timesSpec, NULL);
  if(rc == -1){
    perror("timerfd_settime");
  }

  return rc;
}

int main(){
  struct io_uring uring;
  struct bufRing bufRing;
  struct io_uring_sqe * sqe;
  struct io_uring_cqe * cqe;
  int timerFd;
  int fds[2];
  int rc;

  timerFd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
  timerFdSettimeGood(timerFd, 1000);

  pipe(fds);
  write(fds[1], "jasadf dsaf aewf wae fewa fdsa fdsaf dsa fsaf dsa faef waef ewa fewa fwae fawefwaef wae fewa fewa f", 70);

  rc = io_uring_queue_init(10, &uring, 0);
  if(rc < 0){
    fprintf(stderr, "io_uring_queue_init: %s\n", strerror(-rc));
    return 1;
  }

  rc = bufRingCreate(&uring, &bufRing, 256, 256, 123);
  if(rc < 0){
    return 7;
  }

  sqe = io_uring_get_sqe(&uring);
  io_uring_prep_read_multishot(sqe, timerFd, 0, -1, bufRing.id);
  io_uring_sqe_set_data64(sqe, 42);
  io_uring_submit(&uring);

  sqe = io_uring_get_sqe(&uring);
  io_uring_prep_read_multishot(sqe, fds[0], 0, -1, bufRing.id);
  io_uring_sqe_set_data64(sqe, 43);
  io_uring_submit(&uring);

  while(1){
    rc = io_uring_wait_cqe(&uring, &cqe);
    if(rc < 0){
      fprintf(stderr, "io_uring_submit: %s\n", strerror(-rc));
      return 1;
    }

    if(io_uring_cqe_get_data64(cqe) == 42){
      printf("timerfd operation\n");
    }else{
      printf("pipe operation\n");
    }
    printf("res %i flags %X\n\n", cqe->res, cqe->flags);

    io_uring_cqe_seen(&uring, cqe);
  }

  bufRingFree(&uring, &bufRing);
  io_uring_queue_exit(&uring);

  return 0;
}
pipe operation
res 70 flags 3

timerfd operation
res -11 flags 0

Thanks.

axboe commented 4 months ago

I suspect this one:

https://git.kernel.dk/cgit/linux/commit/?h=io_uring-6.9&id=0a3737db8479b77f95f4bfda8e71b03c697eb56a

will fix it, it just went upstream today and should find its way into the stable kernels shortly.

MikuChan03 commented 3 months ago

@axboe Senpai, just so you know. After building the newest kernel from your linux-block repo, we don't get any cqe for reading from the timerfd at all. It's true that the mentioned patch makes it so the EGAIN error disappears, but instead we get nothing at all.

axboe commented 3 months ago

Took a quick look, and this is actually a bug in timerfd. It doesn't deal with nonblocking reads at all, properly. For any type of read, it'll always attempt to wait, even though there's data to be read. We can work around this in io_uring, but I'll fix the timerfd side as well as that is just not acceptable.

axboe commented 3 months ago

Here's the timerfd fix:

https://lore.kernel.org/linux-fsdevel/b4059ed0-5567-44e7-95f7-f7e4b227501c@kernel.dk/

axboe commented 3 months ago

And here are the io_uring fixes. This will make it work independently of the timerfd fix, but multishot mode cannot be support with timerfd not properly supporting nonblocking reads. Once the timerfd patch is in, multishot timerfd reads will work just fine:

axboe@m2max-kvm ~> ./timerfd
pipe operation
res 70 flags 3

timerfd operation
res 8 flags 10003

timerfd operation
res 8 flags 20003

timerfd operation
res 8 flags 30003

but before that and with just the below io_uring fixes, timerfd reading will be singleshot. In other words, it will never have IORING_CQE_F_MORE set, and will require re-arming every time.

https://lore.kernel.org/io-uring/20240401175306.1051122-1-axboe@kernel.dk/

axboe commented 3 months ago

Neglected to mention, multishot will work even with the limited timerfd support if you just create the timerfd with TFD_NONBLOCK. That'll work regardless of whether or not the timerfd patch is included in the kernel. So at least for now it can be a suitable work-around for dealing with timerfds in multishot mode.

MikuChan03 commented 3 months ago

Thank you such much! I'll say, since even a successful nonblocking read operation on a timerfd always sets EAGAIN and that throws off aio stuff, does that mean that as of today, no one has used timerfd in an aio context where the errno matters? That would be scawwy ._.

axboe commented 3 months ago

It'll work fine without multishot, which is a fairly new addition on the read side. Even without the timerfd fix for proper non-block reading, it won't block and it'll return the right result. It'll just be less efficient as it needs to go through io-wq, rather than being purely poll triggered. No results are being lost. FWIW, io_uring/liburing doesn't use errno at all, errors are returned directly. errno doesn't work properly for async IO.

axboe commented 3 months ago

Marking as closed, fix is queued and will go upstream later this week and bubble back to stable. Hopefully for 6.10 I'll have the timerfd fix in as well.