axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

[GIT PULL] man/io_uring_enter: Add missing Op codes and some minor other additions #1225

Closed: CPestka closed this 2 months ago

CPestka commented 2 months ago

This adds documentation for the following missing Op codes:

  1. IORING_OP_GETXATTR
  2. IORING_OP_SETXATTR
  3. IORING_OP_FGETXATTR
  4. IORING_OP_FSETXATTR
  5. IORING_OP_BIND
  6. IORING_OP_LISTEN
  7. IORING_OP_FTRUNCATE
  8. IORING_OP_SENDMSG_ZC
  9. IORING_OP_READ_MULTISHOT
  10. IORING_OP_FUTEX_WAIT
  11. IORING_OP_FUTEX_WAITV
  12. IORING_OP_FUTEX_WAKE

It also adds documentation about:

  1. the undocumented ability to sync a file range, rather than the whole file, with IORING_OP_FSYNC (see #990)
  2. the level-triggered and edge-triggered behavior of IORING_OP_POLL_ADD (see #1014)
  3. how to enforce ordering between requests, following the remark in IORING_OP_FSYNC that one cannot rely on ordering by default

As there seem to be no man pages for the new futex syscalls yet (they are normally done by glibc, right?), I'm not entirely sure I got everything right regarding these syscalls.

Two ops are still missing: IORING_OP_FIXED_FD_INSTALL and IORING_OP_URING_CMD. IORING_OP_URING_CMD should maybe have its own man page? Currently it is only nvme that is using it, but I suppose its description could grow large over time. (As a side note, liburing helper functions for nvme commands would be neat :) ) Regarding IORING_OP_FIXED_FD_INSTALL I wasn't entirely sure how it works. It is apparently used to take "ownership" of a registered file of another ring (installing it in the registered file table and dealing with the file refcount), but I wasn't sure how the file struct is actually passed. Does that have something to do with MSG rings?

On another side note: IORING_OP_FIXED_FD_INSTALL, IORING_OP_FTRUNCATE, IORING_OP_BIND, IORING_OP_LISTEN seem to be missing in the enum in tools/include/uapi/linux/io_uring.h, right?


git request-pull output:

The following changes since commit 0fe5c09195c0918f89582dd6ff098a58a0bdf62a:

  configure: fix ublk_cmd header check (2024-09-06 15:54:04 -0600)

are available in the Git repository at:

  https://github.com/CPestka/liburing io_uring_enter_man

for you to fetch changes up to c6a303d012bf453cf2dc5f593ed80814715d511b:

  man/io_uring_enter: Adds docs about the capability to fsync file ranges (2024-09-07 16:28:49 +0200)

----------------------------------------------------------------
CPestka (11):
      man/io_uring_enter: Add Docs for missing xattr related OP codes
      man/io_uring_enter: Add docs for io_uring Op code for bind(2)
      man/io_uring_enter: Adds doc for io_uring OP code of listen(2)
      man/io_uring_enter: Adds docs for io_uring op for ftruncate(2)
      man/io_uring_enter: Add docs for zero copy version of sendmsg op
      man/io_uring_enter: Add docs for multishoot variant of read OP
      man/io_uring_enter: Add doc for io_uring version of futex_wait(2)
      man/io_uring_enter: Adds docs for io_uring version of futex_wake(2)
      man/io_uring_enter: Add docs for the io_uring version of futex_waitv(2)
      man/io_uring_enter: Added remark about ordering of SQE/CQEs
      man/io_uring_enter: Adds docs about the capability to fsync file ranges

 man/io_uring_enter.2 | 182 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 182 insertions(+)

## Pull Request Guidelines

1. To make it easy for everyone to filter pull requests from the email notifications, use `[GIT PULL]` as a prefix in your PR title.
   ```
   [GIT PULL] Your Pull Request Title
   ```
2. Follow the commit message format rules below.
3. Follow the Linux kernel coding style (see: https://github.com/torvalds/linux/blob/master/Documentation/process/coding-style.rst).

### Commit message format rules:

1. The first line is the title (no more than 72 chars if possible).
2. Then an empty line.
3. Then a description (may be omitted for truly trivial changes).
4. Then an empty line again (if it has a description).
5. Then a `Signed-off-by` tag with your real name and email. For example:
   ```
   Signed-off-by: Foo Bar
   ```

The description should be word-wrapped at 72 chars. Some things should not be word-wrapped. They may be some kind of quoted text - long compiler error messages, oops reports, Link, etc. (things that have a certain specific format).

Note that all of this goes in the commit message, not in the pull request text. The pull request text should introduce what this pull request does, and each commit message should explain the rationale for why that particular change was made. The git tree is the canonical source of truth, not GitHub.

Each patch should do one thing, and one thing only. If you find yourself writing an explanation for why a patch is fixing multiple issues, that's a good indication that the change should be split into separate patches.

If the commit is a fix for an issue, add a `Fixes` tag with the issue URL.

Don't use a GitHub anonymous email like this as the commit author:
```
123456789+username@users.noreply.github.com
```
Use a real email address!

### Commit message example:

```
src/queue: don't flush SQ ring for new wait interface

If we have IORING_FEAT_EXT_ARG, then timeouts are done through the
syscall instead of by posting an internal timeout. This was done to be
both more efficient, but also to enable multi-threaded use the wait
side. If we touch the SQ state by flushing it, that isn't safe without
synchronization.

Fixes: https://github.com/axboe/liburing/issues/402
Signed-off-by: Jens Axboe
```

By submitting this pull request, I acknowledge that:

  1. I have followed the above pull request guidelines.
  2. I have the rights to submit this work under the same license.
  3. I agree to a Developer Certificate of Origin (see https://developercertificate.org for more information).
axboe commented 2 months ago

Thanks for doing this!

IORING_OP_URING_CMD should maybe have its own man page? Currently it is only nvme that is using it

I think it should have its own man page, but it's not true that only nvme is using it. Yes, it's used for nvme passthrough, but if you look at the socket op methods for get/setsockopt, those use the uring_cmd interface as well. And that part will only grow. You can consider OP_URING_CMD as a way of doing per-file type private operations, kind of like ioctls (except async).

Regarding IORING_OP_FIXED_FD_INSTALL I wasn't entirely sure how it works.

It takes a registered file index and installs it in the normal process file table, turning it into a file descriptor you can use with regular syscalls as well. It remains a registered descriptor as well.

CPestka commented 2 months ago

It takes a registered file index and installs it in the normal process file table, turning it into a file descriptor you can use with regular syscalls as well. It remains a registered descriptor as well.

Oh, that makes a lot of sense. I was wondering a while back how one would, e.g., call something like getpeername() on a connection made with accept using registered fds. I guess that's the answer ^^

CPestka commented 2 months ago

Btw it seems like the ability to pass the new "non-registered" fd back to the user is implemented in receive_fd(), but not wired up to the IORING_OP_FIXED_FD_INSTALL, which always passes it NULL for the user space ptr. Is that an oversight or intentional?

axboe commented 2 months ago

Btw it seems like the ability to pass the new "non-registered" fd back to the user is implemented in receive_fd(), but not wired up to the IORING_OP_FIXED_FD_INSTALL, which always passes it NULL for the user space ptr. Is that an oversight or intentional?

That's intentional - the userptr doesn't exist on the io_uring side. If you look at receive_fd(), it both returns the fd value where the regular file descriptor was installed, and copies it to a userspace address, if one is given. io_uring just uses the former.