axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License
2.77k stars 398 forks source link

SOCKET_URING_OP_GETSOCKOPT hangs #1181

Closed YoSTEALTH closed 1 month ago

YoSTEALTH commented 1 month ago

In some cases using SOCKET_URING_OP_GETSOCKOPT with io_uring_prep_cmd_sock will hang forever, not sure what's causing it. Temporary solution was to use io_uring_prep_link_timeout and add 0.001 second delay and it will work.

ammarfaizi2 commented 1 month ago

Do you have a reproducer?

YoSTEALTH commented 1 month ago

I do have source here: https://github.com/YoSTEALTH/Liburing/blob/master/src/liburing/socket.pyx#L453-L480 and test here https://github.com/YoSTEALTH/Liburing/blob/master/test/socket/cmd_sock_test.py which does pass but same function being called as:

# hangs forever!
async def getsockopt(int sockfd, int level, int optname)-> int:
    '''
        Example
            >>> await getsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR)
            1
    '''
    cdef:
        SQE sqe = SQE()
        array optval = array('i', [0])
    io_uring_prep_getsockopt(sqe, sockfd, level, optname, optval)
    await sqe
    return optval[0]

# works with `io_uring_prep_link_timeout`
async def getsockopt(int sockfd, int level, int optname)-> int:
    '''
        Example
            >>> await getsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR)
            1
    '''
    cdef:
        timespec ts = timespec(.001)
        SQE sqe = SQE(2, False)
        array optval = array('i', [0])
    io_uring_prep_getsockopt(sqe, sockfd, level, optname, optval)
    io_uring_prep_link_timeout(sqe[1], ts, 0)
    # note: timeout is needed or else `getsockopt` will hang forever!
    await sqe
    return optval[0]
axboe commented 1 month ago

How do we run it?

YoSTEALTH commented 1 month ago

You could just install the python liburing or shakti version and run the test? I created a new test https://github.com/YoSTEALTH/Shakti/blob/master/test/io/socket_test.py#L59-L68 not sure if you can run that! Github doesn't run it since its linux must be older than 6.7

YoSTEALTH commented 1 month ago

Running with your trace enabled.

async def set_get_sockname():
    assert (socket_fd := await shakti.socket()) > 0
    try:
        await shakti.setsockopt(socket_fd, shakti.SOL_SOCKET, shakti.SO_REUSEADDR, 1)
        assert await shakti.getsockopt(socket_fd, shakti.SOL_SOCKET, shakti.SO_REUSEADDR) == 1
    finally:
        await shakti.close(socket_fd)

Code using io_uring_prep_link_timeout gives:

# tracer: nop
#
# entries-in-buffer/entries-written: 30/30   #P:24
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
         python3-908310  [019] ..... 67993.530069: io_uring_create: ring 00000000d63a2518, fd 5 sq size 1024, cq size 2048, flags 0x10000
         python3-908310  [019] ..... 67993.530113: io_uring_submit_req: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, opcode NOP, flags 0x10, sq_thread 0
         python3-908310  [019] ..... 67993.530115: io_uring_queue_async_work: ring 00000000d63a2518, request 000000001de33f95, user_data 0x7074bd374fa0, opcode NOP, flags 0x40010, normal queue, work 00000000f3c49db8
         python3-908310  [019] ..... 67993.530156: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
  iou-wrk-908310-908312  [000] ...1. 67993.530226: io_uring_complete: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, result 0, cflags 0x0 extra1 0 extra2 0
         python3-908310  [019] ..... 67993.530353: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1
         python3-908310  [019] ..... 67993.530361: io_uring_submit_req: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd376620, opcode SOCKET, flags 0x10, sq_thread 0
         python3-908310  [019] ..... 67993.530361: io_uring_queue_async_work: ring 00000000d63a2518, request 000000001de33f95, user_data 0x7074bd376620, opcode SOCKET, flags 0x40010, normal queue, work 00000000f3c49db8
         python3-908310  [019] ..... 67993.530364: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
  iou-wrk-908310-908312  [000] ...1. 67993.530402: io_uring_complete: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd376620, result 6, cflags 0x0 extra1 0 extra2 0
         python3-908310  [019] ..... 67993.530410: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1
         python3-908310  [019] ..... 67993.530439: io_uring_submit_req: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, opcode URING_CMD, flags 0x10, sq_thread 0
         python3-908310  [019] ..... 67993.530440: io_uring_file_get: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, fd 6
         python3-908310  [019] ..... 67993.530441: io_uring_queue_async_work: ring 00000000d63a2518, request 000000001de33f95, user_data 0x7074bd374fa0, opcode URING_CMD, flags 0x20240010, normal queue, work 00000000f3c49db8
         python3-908310  [019] ..... 67993.530443: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
  iou-wrk-908310-908312  [000] ...1. 67993.530510: io_uring_complete: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd374fa0, result 0, cflags 0x0 extra1 0 extra2 0
         python3-908310  [019] ..... 67993.530519: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1
         python3-908310  [019] ..... 67993.530540: io_uring_submit_req: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd376620, opcode URING_CMD, flags 0x18, sq_thread 0
         python3-908310  [019] ..... 67993.530541: io_uring_submit_req: ring 00000000d63a2518, req 0000000088cd795f, user_data 0x7074bd376380, opcode LINK_TIMEOUT, flags 0x200010, sq_thread 0
         python3-908310  [019] ..... 67993.530541: io_uring_link: ring 00000000d63a2518, request 0000000088cd795f linked after 000000001de33f95
         python3-908310  [019] ..... 67993.530542: io_uring_file_get: ring 00000000d63a2518, req 000000001de33f95, user_data 0x7074bd376620, fd 6
         python3-908310  [019] ..... 67993.530542: io_uring_queue_async_work: ring 00000000d63a2518, request 000000001de33f95, user_data 0x7074bd376620, opcode URING_CMD, flags 0x202c1018, normal queue, work 00000000f3c49db8
         python3-908310  [019] ..... 67993.530546: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
         python3-908310  [019] ...1. 67993.531592: io_uring_complete: ring 00000000d63a2518, req 0000000088cd795f, user_data 0x7074bd376380, result -2, cflags 0x0 extra1 0 extra2 0
         python3-908310  [019] ..... 67993.531594: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1
         python3-908310  [019] ..... 67993.531727: io_uring_submit_req: ring 00000000d63a2518, req 0000000088cd795f, user_data 0x7074bd376620, opcode CLOSE, flags 0x10, sq_thread 0
         python3-908310  [019] ..... 67993.531728: io_uring_queue_async_work: ring 00000000d63a2518, request 0000000088cd795f, user_data 0x7074bd376620, opcode CLOSE, flags 0x40010, normal queue, work 00000000ebc0e017
         python3-908310  [019] ..... 67993.531731: io_uring_cqring_wait: ring 00000000d63a2518, min_events 1
  iou-wrk-908310-908312  [000] ...1. 67993.531801: io_uring_complete: ring 00000000d63a2518, req 0000000088cd795f, user_data 0x7074bd376620, result 0, cflags 0x0 extra1 0 extra2 0
         python3-908310  [019] ..... 67993.531810: io_uring_task_work_run: tctx 00000000711ba1fa, count 1, loops 1

Code without:

# tracer: nop
#
# entries-in-buffer/entries-written: 21/21   #P:24
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
         python3-912947  [000] ..... 68288.549749: io_uring_create: ring 0000000028f38b0a, fd 5 sq size 1024, cq size 2048, flags 0x10000
         python3-912947  [000] ..... 68288.549793: io_uring_submit_req: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, opcode NOP, flags 0x10, sq_thread 0
         python3-912947  [000] ..... 68288.549795: io_uring_queue_async_work: ring 0000000028f38b0a, request 00000000e65ca6a4, user_data 0x7155efc20fa0, opcode NOP, flags 0x40010, normal queue, work 000000003392933e
         python3-912947  [000] ..... 68288.549825: io_uring_cqring_wait: ring 0000000028f38b0a, min_events 1
  iou-wrk-912947-912956  [002] ...1. 68288.549862: io_uring_complete: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, result 0, cflags 0x0 extra1 0 extra2 0
         python3-912947  [000] ..... 68288.549881: io_uring_task_work_run: tctx 00000000f45a9c58, count 1, loops 1
         python3-912947  [000] ..... 68288.550059: io_uring_submit_req: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc22800, opcode SOCKET, flags 0x10, sq_thread 0
         python3-912947  [000] ..... 68288.550060: io_uring_queue_async_work: ring 0000000028f38b0a, request 00000000e65ca6a4, user_data 0x7155efc22800, opcode SOCKET, flags 0x40010, normal queue, work 000000003392933e
         python3-912947  [000] ..... 68288.550063: io_uring_cqring_wait: ring 0000000028f38b0a, min_events 1
  iou-wrk-912947-912956  [002] ...1. 68288.550107: io_uring_complete: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc22800, result 6, cflags 0x0 extra1 0 extra2 0
         python3-912947  [000] ..... 68288.550113: io_uring_task_work_run: tctx 00000000f45a9c58, count 1, loops 1
         python3-912947  [000] ..... 68288.550141: io_uring_submit_req: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, opcode URING_CMD, flags 0x10, sq_thread 0
         python3-912947  [000] ..... 68288.550142: io_uring_file_get: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, fd 6
         python3-912947  [000] ..... 68288.550143: io_uring_queue_async_work: ring 0000000028f38b0a, request 00000000e65ca6a4, user_data 0x7155efc20fa0, opcode URING_CMD, flags 0x20240010, normal queue, work 000000003392933e
         python3-912947  [000] ..... 68288.550145: io_uring_cqring_wait: ring 0000000028f38b0a, min_events 1
  iou-wrk-912947-912956  [002] ...1. 68288.550154: io_uring_complete: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc20fa0, result 0, cflags 0x0 extra1 0 extra2 0
         python3-912947  [000] ..... 68288.550159: io_uring_task_work_run: tctx 00000000f45a9c58, count 1, loops 1
         python3-912947  [000] ..... 68288.550176: io_uring_submit_req: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc22800, opcode URING_CMD, flags 0x10, sq_thread 0
         python3-912947  [000] ..... 68288.550176: io_uring_file_get: ring 0000000028f38b0a, req 00000000e65ca6a4, user_data 0x7155efc22800, fd 6
         python3-912947  [000] ..... 68288.550177: io_uring_queue_async_work: ring 0000000028f38b0a, request 00000000e65ca6a4, user_data 0x7155efc22800, opcode URING_CMD, flags 0x20240010, normal queue, work 000000003392933e
         python3-912947  [000] ..... 68288.550179: io_uring_cqring_wait: ring 0000000028f38b0a, min_events 1

hangs edit: updated

axboe commented 1 month ago

I'm out until next week, but will be happy to take a look when I'm back. I'm going to need explicit instructions on exactly what to do and run to avoid wasting a bunch of time I don't have on this. In general, to have to best chance of having someone take a look at a potential issue, serve up a reproducer on a silver platter. That isn't specific to this project.

isilence commented 1 month ago

https://gist.github.com/isilence/3790206cccb8acd5528fcd10994939ea

C reproducer, gets stuck on SOCKET_URING_OP_GETSOCKOPT, but only if there is IOSQE_ASYNC. Need to debug what's happening.

A side point, looking at traces seems all requests are IOSQE_ASYNC, which is extremely bad for performance.

isilence commented 1 month ago

This should help. Need to test it with nvme passthrough / ublk.

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 21ac5fb2d5f0..8391c7c7c1ec 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -265,7 +265,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
        req_set_fail(req);
    io_req_uring_cleanup(req, issue_flags);
    io_req_set_res(req, ret, 0);
-   return ret;
+   return IOU_OK;
 }

 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
YoSTEALTH commented 1 month ago

Thanks @isilence looks like you caught the bug! For months now I been dealing with weird issues and you are right I had set sqe flags to use IOSQE_ASYNC by default, thinking it was better for performance! wouldn't have guessed it was the culprit.

When exactly is it a good idea to use IOSQE_ASYNC?

How exactly do I try out the patch you created?

isilence commented 1 month ago

Thanks @isilence looks like you caught the bug! For months now I been dealing with weird issues and you are right I had set sqe flags to use IOSQE_ASYNC by default, thinking it was better for performance! wouldn't have guessed it was the culprit.

When exactly is it a good idea to use IOSQE_ASYNC?

It's not a "make it better flag" at all, and in general if there are no side effects or user behaviour changes it'd be not a flag but just internally done optimisation. I'd say it's useful when you're CPU bound or have some latency requirement and know that some requests will take a lot of CPU cycles, e.g. a receive of MB of data in which case memcpy would be quite expensive, but the exact threshold is up for profiling.

How exactly do I try out the patch you created?

Since I have a C repro, I'm pretty sure it solves the problem, so I'll send a patch a bit later. You either can apply the patch to a kernel tree or I'll give you a branch, then you'd need to compile the kernel and install it. [1] is a useful manual, but you can find online few others.

[1] https://wiki.archlinux.org/title/Kernel/Traditional_compilation

isilence commented 1 month ago

@YoSTEALTH, in case you want me to add "reported-by" to the commit, can you share your full name and email?

YoSTEALTH commented 1 month ago

@isilence its not important for me

isilence commented 1 month ago

It's in, once the kernel is release it'll get backported. Closing the issue.