DouglasGray / xsk-rs

A Rust interface for Linux AF_XDP sockets
MIT License
68 stars 16 forks source link

Could use libxdp-sys replace libbpf-sys ? #21

Closed hi-glenn closed 5 months ago

hi-glenn commented 1 year ago

https://crates.io/crates/libxdp-sys/versions v0.2.0 is corresponding to libxdp v1.4.0

DouglasGray commented 1 year ago

hello! thanks for the heads up. I've been keeping an eye on that crate but didn't notice it had been updated. will take a look when I next get some time

DouglasGray commented 11 months ago

got a PR with the changes here, however tests are failing. need to figure that out (and fix why, despite that, CI is green), I should hopefully get some time next weekend to dig into it

ChrisPortman commented 9 months ago

got a PR with the changes here, however tests are failing. need to figure that out (and fix why, despite that, CI is green), I should hopefully get some time next weekend to dig into it

I found that shared_umem_does_not_return_new_fq_and_cq_when_sockets_are_bound_to_same_device was segfaulting, I think because the Drop of the underlying socket stuff shared between the 2 Socket::new(..) is trying to double clean stuff up. As a result, the veths dont get deleted properly causing the setup of subsequent tests to fail. If I disable that test, I get a whole bunch of legit fails which I cant get my head around.

I note that I get the same results on master though with that test disabled.

ChrisPortman commented 9 months ago

I'm focusing on trying to get this test to pass: frame_consumed_with_consume_one_should_match_addr_of_one_produced, trying to consume from the completion queue after the produce_and_wake_up is always returning 0. The producer idx value of the completion queue remains 0.

ChrisPortman commented 9 months ago

got a PR with the changes here, however tests are failing. need to figure that out (and fix why, despite that, CI is green), I should hopefully get some time next weekend to dig into it

@DouglasGray I've created a PR that merges into your PR with some updates that resolves all but one of the tests - which I've commented out with a comment

DouglasGray commented 9 months ago

awesome, thanks for that! greatly appreciated. I will take a look this week

DouglasGray commented 8 months ago

sorry, busy week. I shall get round to it though, appreciate the patience

DouglasGray commented 8 months ago

@ChrisPortman I merged your PR and updated the examples to use a similarly structured ethernet packet, thanks a lot for that.

trying to figure out the best way to fix that failing test due to the double-free

ZENOTME commented 7 months ago

I try to figure out the reason for double-free. I think it was caused by creating xdp program for two sockets. If we want to share umem between the device with the same queue, we should use XSK_LIBXDP_FLAGS_INHIBIT_PROG_LOAD and load the xdp program manually.

There is an example of sharing the same umem between the same device with the same queue. We can find that it will load the xdp program manually and set the XSK_LIBXDP_FLAGS_INHIBIT_PROG_LOAD for each socket. https://github.com/xdp-project/bpf-examples/blob/5343ed3377471c7b7ef2237526c8bdc0f00a0cef/AF_XDP-example/xdpsock.c#L1256

I try to figure it out internally: Looks like the socket with the same device and same queue will use the same ctx. And xdp program load by libxdp will store it in the ctx and this may cause the problem. https://github.com/xdp-project/xdp-tools/blob/cae9c91353cd3ed51753168203ed101905b9ac9e/lib/libxdp/xsk.c#L797

How to solve it

We need the new API to support the following things:

  1. support to load the xdp program and manage them
  2. attach the socket to the xdp program load before After supporting them, we can support to create multiple sockets based on the same device with the same queue.

But I think it's worth creating a new issue to discuss how to implement this. (Actually, I will work on a draft for it. It's glad to migrate it if it looks good

And we can prevent users from doing this first to fix this issue quickly. Set LibxdpFlags::XSK_LIBXDP_FLAGS_INHIBIT_PROG_LOAD can pass the test.

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
#[serial]
async fn shared_umem_does_not_return_new_fq_and_cq_when_sockets_are_bound_to_same_device() {
    let inner = move |dev1_config: VethDevConfig, _dev2_config: VethDevConfig| {
        let (umem, _frames) =
            Umem::new(UmemConfig::default(), 64.try_into().unwrap(), false).unwrap();

        let (_sender_tx_q, _sender_rx_q, sender_fq_and_cq) = Socket::new(
            SocketConfig::builder().libbpf_flags(LibxdpFlags::XSK_LIBXDP_FLAGS_INHIBIT_PROG_LOAD).build(),
            &umem,
            &dev1_config.if_name().parse().unwrap(),
            0,
        )
        .unwrap();

        assert!(sender_fq_and_cq.is_some());

        let (_receiver_tx_q, _receiver_rx_q, receiver_fq_and_cq) = Socket::new(
            SocketConfig::builder().libbpf_flags(LibxdpFlags::XSK_LIBXDP_FLAGS_INHIBIT_PROG_LOAD).build(),
            &umem,
            &dev1_config.if_name().parse().unwrap(),
            0,
        )
        .unwrap();

        assert!(receiver_fq_and_cq.is_none());
    };

    let (dev1_config, dev2_config) = setup::default_veth_dev_configs();

    veth_setup::run_with_veth_pair(inner, dev1_config, dev2_config)
        .await
        .unwrap();
}

cc @DouglasGray

DouglasGray commented 6 months ago

thanks for digging into this, I appreciate the assistance. I'm away on holiday atm but shall take a look when I'm back next week!

DouglasGray commented 6 months ago

the libxdp-sys changes have been merged. in the absence of a good solution to the double free at this time I've marked Socket::new as unsafe in order to make it clear that binding to an already bound interface + queue pair with a shared UMEM requires the XSK_LIBXDP_FLAGS_INHIBIT_PROG_LOAD flag to be set

DouglasGray commented 5 months ago

closing this as v0.6.1 has been published, thanks for the help all