axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License
2.89k stars 407 forks source link

[BUG] Wrong error code when using io-uring connect opcode #1274

Closed ClawSeven closed 1 month ago

ClawSeven commented 1 month ago

Hi,

I've implemented an asynchronous network framework in Occlum using io_uring. When testing with the gVisor test suite, specifically the tcp_socket_test AllInetTests/SimpleTcpSocketTest.CleanupOnConnectionRefused/0, an unexpected error occurs:

test/syscalls/linux/tcp_socket.cc:1092: Failure Value of: connect(client_s.get(), reinterpret_cast<const struct sockaddr*>(&bound_addr), bound_addrlen) Expected: -1 (failure), with errno PosixError(errno=111 Connection refused) Actual: -1 (of type int), with errno PosixError(errno=103 Software caused connection abort)

Test Case Procedure:

  1. Create and Bind an Unlistenable Socket:
    • A TCP socket bound_s is created and bound to the loopback address (e.g., 127.0.0.1).
    • Without calling listen(), the socket should return RST for connection requests.
  2. Retrieve Binding Address Information:
    • getsockname() is used to fetch the address and port of bound_s, especially when the port is dynamically assigned (port 0).
  3. Create and Bind a Client Socket:
    • Another TCP socket client_s is created and bound to the loopback address.
  4. Test Connection to the Non-listening Port:
    • An attempt is made to connect client_s to bound_s.
    • Since bound_s isn't listening, the connection should be refused with an ECONNREFUSED error, similar to Linux behavior.
  5. Verify if the Port is Released:
    • A new socket new_s is created and successfully bound to the previous client_s address, confirming port release.
  6. Reverify Connection Refusal Behavior:
    • The connection attempt with client_s to bound_s is repeated to ensure it still results in ECONNREFUSED.

Issue Details:

In my implementation, I use libc for the socket and bind syscalls, while io_uring is employed for the connect syscall. Interestingly, the second connect attempt returns ECONNABORTED instead of ECONNREFUSED. However, when using the libc connect syscall, the second connect attempt returns ECONNREFUSED as expected. I've monitored the test case with the following Rust code, which demonstrates this unexpected behavior.


use io_uring::squeue::Entry;
use io_uring::types::Fd;
use io_uring::IoUring;
use io_uring::opcode;
use libc::ECONNABORTED;
use std::net::{TcpListener, TcpStream};
use std::os::unix::io::{AsRawFd, FromRawFd};

use libc::{
    bind, connect, getsockname, sockaddr, sockaddr_in, sockaddr_storage, socklen_t, socket,
    AF_INET, IPPROTO_TCP, SOCK_STREAM, ECONNREFUSED,
};
use std::mem;
use std::ptr;

fn syscall_succeeds(result: i32) -> bool {
    result == 0
}

fn syscall_fails_with_errno(result: i32, errno: i32) -> bool {
    result == -1 && unsafe { *libc::__errno_location() } == errno
}

fn inet_loopback_addr() -> sockaddr_in {
    let mut addr: sockaddr_in = unsafe { mem::zeroed() };
    addr.sin_family = AF_INET as u16;
    addr.sin_addr.s_addr = htonl(libc::INADDR_LOOPBACK);
    addr.sin_port = 0;
    addr
}

fn htonl(hostlong: u32) -> u32 {
    hostlong.to_be()
}

fn simple_tcp_socket_test_cleanup_on_connection_refused() {
    let mut ring = IoUring::new(256).unwrap();

    let bound_s = unsafe { socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) };
    assert!(bound_s >= 0);

    let mut bound_addr = inet_loopback_addr();
    let mut bound_addrlen = mem::size_of::<sockaddr_in>() as socklen_t;

    assert!(syscall_succeeds(unsafe {
        bind(
            bound_s,
            &mut bound_addr as *mut _ as *mut sockaddr,
            bound_addrlen,
        )
    }));

    assert!(syscall_succeeds(unsafe {
        getsockname(
            bound_s,
            &mut bound_addr as *mut _ as *mut sockaddr,
            &mut bound_addrlen,
        )
    }));

    let client_s = unsafe { socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) };
    assert!(client_s >= 0);

    let mut client_addr = inet_loopback_addr();
    let mut client_addrlen = mem::size_of::<sockaddr_in>() as socklen_t;

    assert!(syscall_succeeds(unsafe {
        bind(
            client_s,
            &mut client_addr as *mut _ as *mut sockaddr,
            client_addrlen,
        )
    }));

    assert!(syscall_succeeds(unsafe {
        getsockname(
            client_s,
            &mut client_addr as *mut _ as *mut sockaddr,
            &mut client_addrlen,
        )
    }));

    uring_connect_syscall(
        &mut ring, 
        client_s,
        &bound_addr as *const _ as *const sockaddr, 
        bound_addrlen
    );

    let new_s = unsafe { socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) };
    assert!(new_s >= 0);

    assert!(syscall_succeeds(unsafe {
        bind(
            new_s,
            &mut client_addr as *mut _ as *mut sockaddr,
            client_addrlen,
        )
    }));

    uring_connect_syscall(
        &mut ring, 
        client_s,
        &bound_addr as *const _ as *const sockaddr, 
        bound_addrlen
    );

    unsafe {
        libc::close(bound_s);
        libc::close(client_s);
        libc::close(new_s);
    }
}

fn uring_connect_syscall(ring: &mut IoUring, raw_fd: i32, addr: *const sockaddr, addr_len: socklen_t) {
    let fd = Fd(raw_fd);
    let entry = opcode::Connect::new(fd, addr, addr_len)
        .build();

    let res = unsafe {
        ring.submission().push(&entry);
        ring.submit_and_wait(1).unwrap()
    };

    let mut cq = ring.completion();

    cq.sync();
    if let Some(cqe) = cq.next() {
        let retval = cqe.result();
        println!("connect op entry: {:?}, retval: {:?}", &entry, retval);
    } else {
        panic!("no awailable cq entry");
    }
}

fn main() {
    simple_tcp_socket_test_cleanup_on_connection_refused();
}

The output is :

connect op entry: Entry { op_code: 16, flags: 0, user_data: 0 }, retval: -111
connect op entry: Entry { op_code: 16, flags: 0, user_data: 0 }, retval: -103

Any insights or recommendations would be appreciated.

axboe commented 1 month ago

You didn't mention what kernel you are running. That's key information that should always be in a bug report, in case it's an older kernel or the issue is dependent on the kernel version.

I wrote a quick c program to simulate the above scenario. Please take a look and see if it does what it's supposed to do. I get the following output:

axboe@m2max-kvm ~> ./conn-test
sync=1, 1st connect: 111
sync=1, 2nd connect: 111
sync=0, 1st connect: -111
sync=0, 2nd connect: -111

which seems to be consistent across the sync and async syscalls, and also matches your expectations. It's quite likely a messed something up, but please run it on your system and see what that says. Program below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <liburing.h>

static int create_socket(void)
{
    int fd;

    fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd == -1) {
        perror("socket()");
        return -1;
    }

    return fd;
}

static int configure_connect(int fd, struct sockaddr_in *addr)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = 0;
    return inet_aton("127.0.0.1", &addr->sin_addr);
}

static int try_connect(struct io_uring *ring, int sock, struct sockaddr_in *addr)
{
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    int ret;

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_connect(sqe, sock, (struct sockaddr *) addr, sizeof(*addr));
    io_uring_submit(ring);

    ret = io_uring_wait_cqe(ring, &cqe);
    if (ret) {
        fprintf(stderr, "wait: %d\n", ret);
        return ret;
    }
    ret = cqe->res;
    io_uring_cqe_seen(ring, cqe);
    return ret;
}

static int test_no_listen(int sync_syscall)
{
    struct sockaddr_in saddr, laddr;
    socklen_t len = sizeof(saddr);
    int bound_s, client_s, new_s;
    struct io_uring ring;
    int ret;

    ret = io_uring_queue_init(4, &ring, IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN);
    if (ret) {
        fprintf(stderr, "queue_init: %d\n", ret);
        return 1;
    }

    bound_s = create_socket();
    if (bound_s < 0)
        return 1;

    if (getsockname(bound_s, (struct sockaddr *) &saddr, &len) < 0) {
        perror("getsockname");
        return 1;
    }

    if (bind(bound_s, (struct sockaddr *) &saddr, sizeof(saddr)) < 0) {
        perror("bind");
        return 1;
    }

    client_s = create_socket();
    if (client_s < 0)
        return 1;

    if (configure_connect(client_s, &laddr) == -1)
        return 1;

    ret = 0;
    if (sync_syscall) {
        if (connect(client_s, (struct sockaddr *) &saddr, sizeof(saddr)) < 0)
            ret = errno;
    } else {
        ret = try_connect(&ring, client_s, &saddr);
    }
    printf("sync=%d, 1st connect: %d\n", sync_syscall, ret);

    new_s = create_socket();
    if (new_s < 0)
        return 1;

    if (bind(new_s, (struct sockaddr *) &saddr, sizeof(saddr)) < 0) {
        perror("bind new_s");
        return 1;
    }

    ret = 0;
    if (sync_syscall) {
        if (connect(client_s, (struct sockaddr *) &saddr, sizeof(saddr)) < 0)
            ret = errno;
    } else {
        ret = try_connect(&ring, client_s, &saddr);
    }
    printf("sync=%d, 2nd connect: %d\n", sync_syscall, ret);
    return 0;
}

int main(int argc, char *argv[])
{
    test_no_listen(1);
    test_no_listen(0);
    return 0;
}
ClawSeven commented 1 month ago

Hi @axboe, thank you for responding so quickly. My kernel version is Linux d9b06c66aea1 6.2.1-060201-generic #202302251141 SMP PREEMPT_DYNAMIC Sat Feb 25 11:49:50 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux.

On this machine, I tested your program, and the output was:

sync=1, 1st connect: 111
sync=1, 2nd connect: 111
sync=0, 1st connect: -111
sync=0, 2nd connect: -103
axboe commented 1 month ago

6.2 has been dead since 2023, it's not a maintained stable kernel. I'd strongly recommend getting on one of the stable kernel versions (6.1,6.6,6.10,6.11), as fixes/corrections get backported from newer kernels to those. There's nothing I can do to fix a 6.2 kernel that hasn't seen a release since May 2023, unfortunately.

axboe commented 1 month ago

Pretty sure this is the 6.1-stable commit that fixed it:

commit 129debbb4178b4f316e812c87815a4d5880ebd6e
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Nov 3 10:35:40 2023 -0600

    io_uring/net: ensure socket is marked connected on connect retry

    commit f8f9ab2d98116e79d220f1d089df7464ad4e026d upstream.

but that is newer than the last 6.2-stable release, so can't ever get fixed for the 6.2 kernel. In any case, like mentioned, please run a newer kernel and it'll be fine.

ClawSeven commented 1 month ago

@axboe, thanks for your response. It cleared up my confusion. I'll upgrade my kernel version.