axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

Question: Why can I submit new SQEs when CQ is full with `IORING_FEAT_NODROP`? #1063

Closed ZhenbangYou closed 5 months ago

ZhenbangYou commented 5 months ago

I'm using Ubuntu on WSL2 (Ubuntu 22.04.3 LTS, kernel version: 5.15.133.1-microsoft-standard-WSL2) with liburing 2.1, so IORING_FEAT_NODROP is available. According to the man page, I shouldn't be able to submit new SQEs when the CQ is full. However, I was able to: the CQ size was 8, yet I submitted a million SQEs before reaping any CQE, and all of these I/O operations returned correct results. When will I actually get back pressure on the SQ? Thanks a lot for answering my question!

Reproduction code:

#include <cassert>
#include <fcntl.h>
#include <iostream>
#include <liburing.h>
#include <string>
#include <vector>

int main() {
  int fd = open("echo.cpp", O_RDONLY); // Open an arbitrary file
  assert(fd >= 0);

  io_uring ring;
  // 4 SQ entries; with no flags the CQ gets twice that (8 entries)
  int ret = io_uring_queue_init(4, &ring, 0);
  assert(ret == 0);
  constexpr unsigned kNumReq = 1'000'000;
  constexpr unsigned kLen = 10;
  std::vector<std::string> bufs(kNumReq, std::string(kLen, '\0'));
  for (unsigned i = 0; i < kNumReq; i++) {
    auto sqe = io_uring_get_sqe(&ring);
    assert(sqe);
    io_uring_prep_read(sqe, fd, bufs[i].data(), kLen, 0);
    sqe->user_data = i;
    // I also checked `io_uring_cq_ready`, and the CQ was full after a few submissions
    int res = io_uring_submit(&ring);
    assert(res == 1); // Didn't get `-EBUSY`
  }
  for (unsigned i = 0; i < kNumReq; i++) {
    io_uring_cqe *cqe;
    int res = io_uring_wait_cqe(&ring, &cqe);
    assert(res == 0);
    std::cout << i << ' ' << bufs[cqe->user_data] << std::endl;
    io_uring_cqe_seen(&ring, cqe);
  }
  io_uring_queue_exit(&ring);
}

krisman commented 5 months ago

> According to the man page, when CQ is full, I can't submit new SQEs.

AFAIK, IORING_FEAT_NODROP will queue incoming completions if the CQ is full, so you don't lose any CQE if you aren't reaping fast enough. But it doesn't limit the SQ side. Where did you find this in the man page? From a quick look, I didn't see it in either io_uring_setup(2) or io_uring(7).

Usually, you can have way more IO in flight than there are SQ/CQ entries available, as io_uring will queue them.
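To make the queuing described above concrete, here is a toy model (not kernel code) of a fixed-size CQ ring backed by an unbounded backlog. The class name `ToyCompletionQueue` and all of its methods are made up for illustration. The model refills the ring on every reap, which simplifies the question of when the kernel actually flushes overflowed events.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model only, NOT kernel code: a bounded "CQ ring" plus an unbounded
// backlog standing in for the kernel's internal overflow storage.
class ToyCompletionQueue {
 public:
  explicit ToyCompletionQueue(std::size_t ring_size) : ring_size_(ring_size) {
    assert(ring_size > 0);
  }

  // A request finished: put it in the ring if there is room, otherwise keep
  // it in the backlog. Nothing is dropped, and posting never fails.
  void post(std::uint64_t user_data) {
    if (backlog_.empty() && ring_.size() < ring_size_) {
      ring_.push_back(user_data);
    } else {
      backlog_.push_back(user_data);  // preserves completion order
    }
  }

  // The application reaps one completion. The freed slot is refilled from
  // the backlog (simplification: real flushing happens inside the kernel).
  bool reap(std::uint64_t &out) {
    if (ring_.empty()) return false;
    out = ring_.front();
    ring_.pop_front();
    if (!backlog_.empty()) {
      ring_.push_back(backlog_.front());
      backlog_.pop_front();
    }
    return true;
  }

  // Rough analogue of the overflow flag being set.
  bool overflowed() const { return !backlog_.empty(); }

  std::size_t pending() const { return ring_.size() + backlog_.size(); }

 private:
  std::size_t ring_size_;
  std::deque<std::uint64_t> ring_;
  std::deque<std::uint64_t> backlog_;
};
```

In this model, posting five completions into a ring of size 2 leaves three in the backlog, and reaping returns all five in order. That mirrors the observation in this thread: completions pile up without being lost, and nothing on the submission side is blocked.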

ZhenbangYou commented 5 months ago

@krisman Thank you so much for your answer! I found the following in io_uring_setup(2):

> IORING_FEAT_NODROP If this flag is set, io_uring supports almost never dropping completion events. If a completion event occurs and the CQ ring is full, the kernel stores the event internally until such a time that the CQ ring has room for more entries. If this overflow condition is entered, attempting to submit more IO will fail with the -EBUSY error value, if it can't flush the overflown events to the CQ ring.

I assumed the overflow condition means the same thing as a full CQ (I may be wrong), so I expected my io_uring_submit to fail ("attempting to submit more IO will fail"), but it didn't.

ZhenbangYou commented 5 months ago

I printed *(ring.sq.kflags), and it was IORING_SQ_CQ_OVERFLOW. My liburing doesn't support io_uring_cq_has_overflow, but this function works by checking the above flag (see this link).
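For older liburing versions without `io_uring_cq_has_overflow`, the same check can be sketched by hand. The helper name `sq_flags_report_overflow` and the constant `kSqCqOverflow` are hypothetical. The constant repeats the value of `IORING_SQ_CQ_OVERFLOW` from `<linux/io_uring.h>` only so that the snippet compiles without kernel headers; real code should use the macro itself.

```cpp
#include <cassert>

// Assumption: mirrors IORING_SQ_CQ_OVERFLOW (1U << 1) from <linux/io_uring.h>.
// Prefer the real macro when the kernel headers are available.
constexpr unsigned kSqCqOverflow = 1U << 1;

// Hypothetical helper: true if the SQ ring flags word says that completions
// have overflowed the CQ ring. Pass ring.sq.kflags from an initialized ring.
inline bool sq_flags_report_overflow(const unsigned *sq_kflags) {
  assert(sq_kflags != nullptr);
  // Plain read, as in the comment above; this word is shared with the
  // kernel, so a relaxed atomic load is the more careful choice.
  return (*sq_kflags & kSqCqOverflow) != 0;
}
```

With a ring set up as in the reproduction code, the call would be `sq_flags_report_overflow(ring.sq.kflags)`.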

isilence commented 5 months ago

@ZhenbangYou, that was indeed the behaviour at some point, i.e. further submissions were disallowed until the overflowed CQEs were handled, but it was removed a long time ago because it was a huge pain to handle, and the same back pressure can be implemented perfectly well in userspace if required. So, if the docs still mention it, the docs should be fixed.
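One way to get that back pressure in userspace is to count requests that have been submitted but not yet reaped, and to stop submitting once the count reaches the CQ size. The sketch below is only the bookkeeping. `InflightLimiter` is a hypothetical name, and it assumes every SQE produces exactly one CQE, so it does not cover multishot requests.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: tracks submitted-but-unreaped requests so that the
// number of outstanding completions never exceeds the CQ ring size.
// Assumption: each submitted SQE yields exactly one CQE.
class InflightLimiter {
 public:
  explicit InflightLimiter(std::size_t cq_entries) : capacity_(cq_entries) {}

  // True if one more request fits without risking CQ overflow.
  bool can_submit() const { return inflight_ < capacity_; }

  // Call once per SQE after a successful io_uring_submit.
  void on_submit() {
    assert(can_submit());
    ++inflight_;
  }

  // Call once per reaped CQE, e.g. right after io_uring_cqe_seen.
  void on_complete() {
    assert(inflight_ > 0);
    --inflight_;
  }

  std::size_t inflight() const { return inflight_; }

 private:
  std::size_t capacity_;
  std::size_t inflight_ = 0;
};
```

In the reproduction loop, the limiter would be constructed with the CQ size (8 in the reproduction above). While `can_submit()` is false, the loop would call `io_uring_wait_cqe` and `io_uring_cqe_seen`, then `on_complete()`, before preparing the next SQE.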

ZhenbangYou commented 5 months ago

@isilence I see! Thank you so much! Unfortunately, I checked the man page for io_uring_setup from every source I could find (man7, Arch Linux, Debian, and man on my machine), and none of them reflects this change in the IORING_FEAT_NODROP section. I totally agree that this change makes life much easier.

Btw, is there any recommended source to find the latest manual for io_uring?

YoSTEALTH commented 5 months ago

It might be a good idea to add a "Deprecated" comment, a warning, or whatever is recommended. Otherwise, people will try to use those features.