The io_uring_wait_cqe_timeout() function takes a pointer to a __kernel_timespec which holds the value of the timeout.
If the IORING_FEAT_EXT_ARG feature is not set, this function falls back to __io_uring_submit_timeout(), which tries to submit an IORING_OP_TIMEOUT request initialized by io_uring_prep_timeout(). That in turn sets the addr field of the SQE to point to the __kernel_timespec structure.
However, the documentation for the IORING_OP_TIMEOUT op-code says that the addr field must point to a timespec64 structure.
Looking at the Linux headers that define these types, I find:
// include/uapi/linux/time_types.h
struct __kernel_timespec {
    __kernel_time64_t tv_sec; // __kernel_time64_t is defined as 'long long'
    long long tv_nsec;
};
whereas I see that:
// include/linux/time64.h
struct timespec64 {
    time64_t tv_sec; // time64_t is __s64 which is either 'long' or 'long long' depending on arch
    long tv_nsec;
};
On some 32-bit platforms, however, the long tv_nsec field of the timespec64 structure is only 32 bits wide while the long long tv_nsec field of the __kernel_timespec structure is 64 bits wide, meaning that the layouts of these structures can differ.
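To make the layout concern concrete, here is a minimal sketch that compares the two definitions from user space. The mirror_* structs and tv_nsec_layout_matches() are hypothetical names introduced only for this illustration; they copy the field types quoted above and are not part of liburing or the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* C11 compile-time check: both structs assume a 64-bit tv_sec. */
static_assert(sizeof(long long) == 8, "expected 64-bit long long");

/* Hypothetical mirror of struct __kernel_timespec (uapi definition). */
struct mirror_kernel_timespec {
    long long tv_sec;
    long long tv_nsec;
};

/* Hypothetical mirror of struct timespec64 as quoted above. */
struct mirror_timespec64 {
    long long tv_sec;
    long tv_nsec; /* 32 bits on ILP32 targets, 64 bits on LP64 targets */
};

/* Returns 1 when tv_nsec has the same offset and size in both structs. */
static int tv_nsec_layout_matches(void)
{
    return offsetof(struct mirror_kernel_timespec, tv_nsec) ==
               offsetof(struct mirror_timespec64, tv_nsec) &&
           sizeof(((struct mirror_kernel_timespec *)0)->tv_nsec) ==
               sizeof(((struct mirror_timespec64 *)0)->tv_nsec);
}
```

On a target where long and long long have the same size, tv_nsec_layout_matches() returns 1; where long is 32 bits, it returns 0.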
I gather it "just works" on 64-bit platforms (where long has the same layout as long long) and also on little-endian 32-bit platforms: the __kernel_timespec's tv_nsec field normally only holds a value in the range [0..999'999'999], which only populates the lower 32 bits of the 64-bit field, so a 32-bit read of the timespec64's tv_nsec field picks up the right value.
However, I think on 32-bit big-endian architectures this could lead to the tv_nsec part being ignored, as the read of the timespec64's 32-bit tv_nsec field would read the upper 32 bits of the 64-bit __kernel_timespec tv_nsec field, which is pretty much always going to be zero.
This could lead to code that, say, passes a timeout of 500ms to io_uring_wait_cqe_timeout() having it interpreted as a 0ms timeout and always completing immediately, potentially leading to high CPU usage.
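The little-endian versus big-endian behaviour described above can be simulated on any host. This is only a sketch of the suspected misread, assuming the kernel really did read a 32-bit tv_nsec at the start of the 64-bit field; store_u64(), load_u32() and misread_tv_nsec() are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 64-bit value into buf in the requested byte order (not the host's). */
static void store_u64(unsigned char buf[8], uint64_t v, int big_endian)
{
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        buf[i] = (unsigned char)(v >> shift);
    }
}

/* Load a 32-bit value from the first four bytes of buf in the requested byte order. */
static uint32_t load_u32(const unsigned char buf[4], int big_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        v |= (uint32_t)buf[i] << shift;
    }
    return v;
}

/* What a 32-bit tv_nsec read would see if it aliased a 64-bit tv_nsec. */
static uint32_t misread_tv_nsec(int64_t tv_nsec, int big_endian)
{
    unsigned char bytes[8];

    /* Valid nanosecond values always fit in the low 32 bits. */
    assert(tv_nsec >= 0 && tv_nsec <= 999999999);
    store_u64(bytes, (uint64_t)tv_nsec, big_endian);
    return load_u32(bytes, big_endian);
}
```

For a 500ms timeout, misread_tv_nsec(500000000, 0) yields 500000000 in the little-endian case, while misread_tv_nsec(500000000, 1) yields 0 in the big-endian case, which matches the "0ms timeout" scenario.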
I'm not sure how to address this inconsistency, though, as the code-path that passes an io_uring_getevents_arg structure to io_uring_enter2 still needs to pass the address of a __kernel_timespec structure and so it is difficult to fix this API to be correct for both paths.
Ideally both code-paths would be able to use the same time structure.
Would it be possible to have the submit/wait timeout functions "rewrite" the passed __kernel_timespec structure to have a timespec64 value and then "restore" the original __kernel_timespec value before returning if the IORING_OP_TIMEOUT code-path is taken and liburing detects that the layouts of timespec64 and __kernel_timespec differ?
I don't believe struct timespec64 is even a thing in the user space, hah? Looking it up, OP_TIMEOUT expects struct __kernel_timespec as well, so it should be just a doc problem.