Closed YoSTEALTH closed 1 month ago
Do you have a reproducer?
I do have source here: https://github.com/YoSTEALTH/Liburing/blob/master/src/liburing/socket.pyx#L453-L480 and test here https://github.com/YoSTEALTH/Liburing/blob/master/test/socket/cmd_sock_test.py which does pass but same function being called as:
# hangs forever!
async def getsockopt(int sockfd, int level, int optname)-> int:
'''
Example
>>> await getsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR)
1
'''
cdef:
SQE sqe = SQE()
array optval = array('i', [0])
io_uring_prep_getsockopt(sqe, sockfd, level, optname, optval)
await sqe
return optval[0]
# works with `io_uring_prep_link_timeout`
async def getsockopt(int sockfd, int level, int optname)-> int:
'''
Example
>>> await getsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR)
1
'''
cdef:
timespec ts = timespec(.001)
SQE sqe = SQE(2, False)
array optval = array('i', [0])
io_uring_prep_getsockopt(sqe, sockfd, level, optname, optval)
io_uring_prep_link_timeout(sqe[1], ts, 0)
# note: timeout is needed or else `getsockopt` will hang forever!
await sqe
return optval[0]
How do we run it?
You could just install the python liburing or shakti version and run the test? I created a new test https://github.com/YoSTEALTH/Shakti/blob/master/test/io/socket_test.py#L59-L68 not sure if you can run that! Github doesn't run it since its linux must be older than 6.7
Running with your trace enabled.
async def set_get_sockname():
assert (socket_fd := await shakti.socket()) > 0
try:
await shakti.setsockopt(socket_fd, shakti.SOL_SOCKET, shakti.SO_REUSEADDR, 1)
assert await shakti.getsockopt(socket_fd, shakti.SOL_SOCKET, shakti.SO_REUSEADDR) == 1
finally:
await shakti.close(socket_fd)
Code using io_uring_prep_link_timeout
gives:
# tracer: nop
#
# entries-in-buffer/entries-written: 30/30 #P:24
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
python3-908310 [019] ..... 67993.530069: io_uring_create: ring 00000000d63a2518, fd 5 sq size 1024, cq size 2048, flags 0x10000
python3-908310 [019] ..... 67993.530113: io_uring_submit_req: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, opcode NOP, flags 0x10, sq_thread 0
python3-908310 [019] ..... 67993.530115: io_uring_queue_async_work: ring 00000000d63a2518, request 000000001de33f95, user_data 0x7074bd374fa0, opcode NOP, flags 0x40010, normal queue, work 00000000f3c49db8
python3-908310 [019] ..... 67993.530156: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
iou-wrk-908310-908312 [000] ...1. 67993.530226: io_uring_complete: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, result 0, cflags 0x0 extra1 0 extra2 0
python3-908310 [019] ..... 67993.530353: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1
python3-908310 [019] ..... 67993.530361: io_uring_submit_req: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd376620, opcode SOCKET, flags 0x10, sq_thread 0
python3-908310 [019] ..... 67993.530361: io_uring_queue_async_work: ring 00000000d63a2518, request 000000001de33f95, user_data 0x7074bd376620, opcode SOCKET, flags 0x40010, normal queue, work 00000000f3c49db8
python3-908310 [019] ..... 67993.530364: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
iou-wrk-908310-908312 [000] ...1. 67993.530402: io_uring_complete: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd376620, result 6, cflags 0x0 extra1 0 extra2 0
python3-908310 [019] ..... 67993.530410: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1
python3-908310 [019] ..... 67993.530439: io_uring_submit_req: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, opcode URING_CMD, flags 0x10, sq_thread 0
python3-908310 [019] ..... 67993.530440: io_uring_file_get: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, fd 6
python3-908310 [019] ..... 67993.530441: io_uring_queue_async_work: ring 00000000d63a2518, request 000000001de33f95, user_data 0x7074bd374fa0, opcode URING_CMD, flags 0x20240010, normal queue, work 00000000f3c49db8
python3-908310 [019] ..... 67993.530443: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
iou-wrk-908310-908312 [000] ...1. 67993.530510: io_uring_complete: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, result 0, cflags 0x0 extra1 0 extra2 0
python3-908310 [019] ..... 67993.530519: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1
python3-908310 [019] ..... 67993.530540: io_uring_submit_req: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd376620, opcode URING_CMD, flags 0x18, sq_thread 0
python3-908310 [019] ..... 67993.530541: io_uring_submit_req: ring 00000000d63a2518, req 0000000088cd795f, user_data 0x7074bd376380, opcode LINK_TIMEOUT, flags 0x200010, sq_thread 0
python3-908310 [019] ..... 67993.530541: io_uring_link: ring 00000000d63a2518, request 0000000088cd795f linked after 000000001de33f95
python3-908310 [019] ..... 67993.530542: io_uring_file_get: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd376620, fd 6
python3-908310 [019] ..... 67993.530542: io_uring_queue_async_work: ring 00000000d63a2518, request 000000001de33f95, user_data 0x7074bd376620, opcode URING_CMD, flags 0x202c1018, normal queue, work 00000000f3c49db8
python3-908310 [019] ..... 67993.530546: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
python3-908310 [019] ...1. 67993.531592: io_uring_complete: ring 00000000d63a2518, req 0000000088cd795f, user_data 0x7074bd376380, result -2, cflags 0x0 extra1 0 extra2 0
python3-908310 [019] ..... 67993.531594: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1
python3-908310 [019] ..... 67993.531727: io_uring_submit_req: ring 00000000d63a2518, req 0000000088cd795f, user_data 0x7074bd376620, opcode CLOSE, flags 0x10, sq_thread 0
python3-908310 [019] ..... 67993.531728: io_uring_queue_async_work: ring 00000000d63a2518, request 0000000088cd795f, user_data 0x7074bd376620, opcode CLOSE, flags 0x40010, normal queue, work 00000000ebc0e017
python3-908310 [019] ..... 67993.531731: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
iou-wrk-908310-908312 [000] ...1. 67993.531801: io_uring_complete: ring 00000000d63a2518, req 0000000088cd795f, user_data 0x7074bd376620, result 0, cflags 0x0 extra1 0 extra2 0
python3-908310 [019] ..... 67993.531810: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1
Code without:
# tracer: nop
#
# entries-in-buffer/entries-written: 21/21 #P:24
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
python3-912947 [000] ..... 68288.549749: io_uring_create: ring 0000000028f38b0a, fd 5 sq size 1024, cq size 2048, flags 0x10000
python3-912947 [000] ..... 68288.549793: io_uring_submit_req: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, opcode NOP, flags 0x10, sq_thread 0
python3-912947 [000] ..... 68288.549795: io_uring_queue_async_work: ring 0000000028f38b0a, request 00000000e65ca6a4, user_data 0x7155efc20fa0, opcode NOP, flags 0x40010, normal queue, work 000000003392933e
python3-912947 [000] ..... 68288.549825: io_uring_cqring_wait: ring 0000000028f38b0a, min_events 1
iou-wrk-912947-912956 [002] ...1. 68288.549862: io_uring_complete: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, result 0, cflags 0x0 extra1 0 extra2 0
python3-912947 [000] ..... 68288.549881: io_uring_task_work_run: tctx 00000000f45a9c58, count 1, loops 1
python3-912947 [000] ..... 68288.550059: io_uring_submit_req: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc22800, opcode SOCKET, flags 0x10, sq_thread 0
python3-912947 [000] ..... 68288.550060: io_uring_queue_async_work: ring 0000000028f38b0a, request 00000000e65ca6a4, user_data 0x7155efc22800, opcode SOCKET, flags 0x40010, normal queue, work 000000003392933e
python3-912947 [000] ..... 68288.550063: io_uring_cqring_wait: ring 0000000028f38b0a, min_events 1
iou-wrk-912947-912956 [002] ...1. 68288.550107: io_uring_complete: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc22800, result 6, cflags 0x0 extra1 0 extra2 0
python3-912947 [000] ..... 68288.550113: io_uring_task_work_run: tctx 00000000f45a9c58, count 1, loops 1
python3-912947 [000] ..... 68288.550141: io_uring_submit_req: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, opcode URING_CMD, flags 0x10, sq_thread 0
python3-912947 [000] ..... 68288.550142: io_uring_file_get: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, fd 6
python3-912947 [000] ..... 68288.550143: io_uring_queue_async_work: ring 0000000028f38b0a, request 00000000e65ca6a4, user_data 0x7155efc20fa0, opcode URING_CMD, flags 0x20240010, normal queue, work 000000003392933e
python3-912947 [000] ..... 68288.550145: io_uring_cqring_wait: ring 0000000028f38b0a, min_events 1
iou-wrk-912947-912956 [002] ...1. 68288.550154: io_uring_complete: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, result 0, cflags 0x0 extra1 0 extra2 0
python3-912947 [000] ..... 68288.550159: io_uring_task_work_run: tctx 00000000f45a9c58, count 1, loops 1
python3-912947 [000] ..... 68288.550176: io_uring_submit_req: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc22800, opcode URING_CMD, flags 0x10, sq_thread 0
python3-912947 [000] ..... 68288.550176: io_uring_file_get: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc22800, fd 6
python3-912947 [000] ..... 68288.550177: io_uring_queue_async_work: ring 0000000028f38b0a, request 00000000e65ca6a4, user_data 0x7155efc22800, opcode URING_CMD, flags 0x20240010, normal queue, work 000000003392933e
python3-912947 [000] ..... 68288.550179: io_uring_cqring_wait: ring 0000000028f38b0a, min_events 1
hangs edit: updated
I'm out until next week, but will be happy to take a look when I'm back. I'm going to need explicit instructions on exactly what to do and run to avoid wasting a bunch of time I don't have on this. In general, to have to best chance of having someone take a look at a potential issue, serve up a reproducer on a silver platter. That isn't specific to this project.
https://gist.github.com/isilence/3790206cccb8acd5528fcd10994939ea
C reproducer, gets stuck on SOCKET_URING_OP_GETSOCKOPT
, but only if there is IOSQE_ASYNC
. Need to debug what's happening.
A side point, looking at traces seems all requests are IOSQE_ASYNC
, which is extremely bad for performance.
This should help. Need to test it with nvme passthrough / ublk.
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 21ac5fb2d5f0..8391c7c7c1ec 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -265,7 +265,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
req_set_fail(req);
io_req_uring_cleanup(req, issue_flags);
io_req_set_res(req, ret, 0);
- return ret;
+ return IOU_OK;
}
int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
Thanks @isilence looks like you caught the bug! For months now I been dealing with weird issues and you are right I had set sqe
flags
to use IOSQE_ASYNC
by default, thinking it was better for performance! wouldn't have guessed it was the culprit.
When exactly is it a good idea to use IOSQE_ASYNC
?
How exactly do I try out the patch you created?
Thanks @isilence looks like you caught the bug! For months now I been dealing with weird issues and you are right I had set
sqe
flags
to useIOSQE_ASYNC
by default, thinking it was better for performance! wouldn't have guessed it was the culprit.When exactly is it a good idea to use
IOSQE_ASYNC
?
It's not a "make it better flag" at all, and in general if there are no side effects or user behaviour changes it'd be not a flag but just internally done optimisation. I'd say it's useful when you're CPU bound or have some latency requirement and know that some requests will take a lot of CPU cycles, e.g. a receive of MB of data in which case memcpy would be quite expensive, but the exact threshold is up for profiling.
How exactly do I try out the patch you created?
Since I have a C repro, I'm pretty sure it solves the problem, so I'll send a patch a bit later. You either can apply the patch to a kernel tree or I'll give you a branch, then you'd need to compile the kernel and install it. [1] is a useful manual, but you can find online few others.
[1] https://wiki.archlinux.org/title/Kernel/Traditional_compilation
@YoSTEALTH, in case you want me to add "reported-by" to the commit, can you share your full name and email?
@isilence its not important for me
It's in, once the kernel is release it'll get backported. Closing the issue.
In some cases using
SOCKET_URING_OP_GETSOCKOPT
withio_uring_prep_cmd_sock
will hang forever, not sure what's causing it. Temporary solution was to useio_uring_prep_link_timeout
and add0.001
second delay and it will work.