axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License
2.7k stars 393 forks source link

io_uring_wait_cqe_timeout() passes a __kernel_timespec to IORING_OP_TIMEOUT which takes a timespec64 #1161

Open lewissbaker opened 1 month ago

lewissbaker commented 1 month ago

The io_uring_wait_cqe_timeout() function takes a pointer to a __kernel_timespec which holds the value of the timeout.

If IORING_FEAT_EXT_ARG feature is not set then this function falls back to being implemented in terms of __io_uring_submit_timeout() which tries to submit an IORING_OP_TIMEOUT request initialized by the io_uring_prep_timeout() function. This in turn sets the addr field of the SQE to point to the __kernel_timespec structure.

However, the documentation for the IORING_OP_TIMEOUT op-code says that the addr field must point to a timespec64 structure.

Looking at the linux headers that define these types I find:

// include/uapi/linux/time_types.h
struct __kernel_timespec  {
  __kernel_time64_t tv_sec; // __kernel_time64_t is defined as 'long long'
  long long         tv_nsec;
};

whereas I see that:

// include/linux/time64.h
struct timespec64 {
  time64_t tv_sec;   // time64_t is __s64 which is either 'long' or 'long long' depending on arch
  long     tv_nsec;
};

On some 32-bit platforms, however, the long type of the timespec64 structure is only 32-bits and the long long type of the __kernel_timespec structure is 64-bits, meaning that the layouts of these structures can differ on some platforms.

I gather it "just works" on both 64-bit platforms (where long has the same layout as long long) and also on little-endian 32-bit platforms because the __kernel_timespec's tv_nsec field usually only has a value in the range [0..999'999'999] and thus only populates the lower 32-bits of the 64-bit value and so when a 32-bit value is read from the timespec64's tv_nsec field then it just picks up the right value.

However, I think on 32-bit big-endian architectures this could lead to the tv_nsec part being ignored as the read of the timespec64's 32-bit tv_nsec field would be reading the higher 32-bits of the 64-bit __kernel_timespec tv_nsec field, which is pretty much always going to be zero.

This could potentially lead to code that, say, passes a timeout of 500ms to io_uring_wait_cqe_timeout() having this interpreted as if they passed a 0ms timeout and and always completing immediately - potentially leading to high CPU usage.

I'm not sure how to address this inconsistency, though, as the code-path that passes an io_uring_getevents_arg structure to io_uring_enter2 still needs to pass the address of a __kernel_timespec structure and so it is difficult to fix this API to be correct for both paths.

Ideally both code-paths would be able to use the same time structure.

Would it be possible to have the submit/wait timeout functions "rewrite" the passed __kernel_timespec structure to have a timespec64 value and then "restore" the original __kernel_timespec value before returning if the IORING_OP_TIMEOUT code-path is taken and liburing detects that the layouts of timespec64 and __kernel_timespec differ?

isilence commented 1 month ago

I don't believe struct timespec64 is even a thing in the user space, hah? Looking it up, OP_TIMEOUT expects struct __kernel_timespec as well, so it should be just a doc problem.