SkBuffContext and IPv6

jornfranke commented 1 year ago

Hi,

I am writing a socket filter (it will be open sourced once it works). I use this example as a basis: https://github.com/aya-rs/book/tree/main/examples/cgroup-skb-egress, except that my program is a socket filter rather than a cgroup_skb program.

I have it working for IPv4 and IPv6 sockets. However, in the eBPF program itself I can only extract the IPv4 address correctly. For IPv6 sockets, I do not get a valid IP address.

I extract the IPv4 address similarly to here: https://github.com/aya-rs/book/blob/main/examples/cgroup-skb-egress/cgroup-skb-egress-ebpf/src/main.rs#L46

This is my code to extract the IPv6 address:

 u128::from_be(ctx.load(offset_of!(ipv6hdr, saddr)).unwrap())

The result is obviously not an IP address, because the value changes all the time. This is ipv6hdr:

pub struct ipv6hdr {
    pub _bitfield_align_1: [u8; 0],
    pub _bitfield_1: __BindgenBitfieldUnit<[u8; 1usize]>,
    pub flow_lbl: [__u8; 3usize],
    pub payload_len: __be16,
    pub nexthdr: __u8,
    pub hop_limit: __u8,
    pub saddr: [__be32; 4usize],
    pub daddr: [__be32; 4usize],
}
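For reference, here is a minimal user-space sketch of the layout this struct describes. In the fixed 40-byte IPv6 header, saddr sits at byte offset 8 and daddr at byte offset 24. The function names are illustrative only, and the sketch assumes the buffer really begins at the IPv6 header, which is exactly the point in question in this thread.

```rust
use std::net::Ipv6Addr;

/// Length of the fixed IPv6 header.
const IPV6_HDR_LEN: usize = 40;
/// Byte offset of the source address within the fixed IPv6 header.
const IPV6_SADDR_OFF: usize = 8;
/// Byte offset of the destination address within the fixed IPv6 header.
const IPV6_DADDR_OFF: usize = 24;

/// Reads a 16-byte big-endian address at `off`; `None` if the buffer is too short.
fn read_addr(buf: &[u8], off: usize) -> Option<Ipv6Addr> {
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(buf.get(off..off + 16)?);
    Some(Ipv6Addr::from(u128::from_be_bytes(bytes)))
}

/// Returns (source, destination), assuming `hdr` begins at the IPv6 header.
fn ipv6_addrs(hdr: &[u8]) -> Option<(Ipv6Addr, Ipv6Addr)> {
    if hdr.len() < IPV6_HDR_LEN || (hdr[0] >> 4) != 6 {
        return None; // too short, or the version nibble is not 6
    }
    Some((read_addr(hdr, IPV6_SADDR_OFF)?, read_addr(hdr, IPV6_DADDR_OFF)?))
}
```

If the buffer starts somewhere else (for example at the TCP header), the version check usually fails or the bytes at offset 8 are unrelated data, which would match a value that "changes all the time".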

I am not sure whether, in the case of IPv6, the ipv6hdr is at a different offset and I need to add something to it.

Any idea?

Thanks a lot.

FallingSnow commented 1 year ago

Are you taking into account the ethernet header offset?

ETH_HDR_LEN + offset_of!(iphdr, saddr)
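As a sketch of the arithmetic behind this suggestion: an untagged Ethernet header is 14 bytes and saddr sits at byte 12 of the IPv4 header, so the absolute offset is 26 when a link-layer header is present and 12 when it is not. The helper names and the has_link_header flag below are hypothetical and only illustrate the calculation on a plain byte buffer.

```rust
use std::net::Ipv4Addr;

/// Length of an untagged Ethernet II header (dst MAC, src MAC, EtherType).
const ETH_HDR_LEN: usize = 14;
/// Offset of `saddr` within the IPv4 header (`iphdr`).
const IPV4_SADDR_OFF: usize = 12;

/// Absolute offset of the IPv4 source address. `has_link_header` is a
/// hypothetical flag: true if the buffer begins with an Ethernet header.
fn ipv4_saddr_offset(has_link_header: bool) -> usize {
    let base = if has_link_header { ETH_HDR_LEN } else { 0 };
    base + IPV4_SADDR_OFF
}

/// Reads the IPv4 source address, or `None` if the buffer is too short.
fn ipv4_saddr(buf: &[u8], has_link_header: bool) -> Option<Ipv4Addr> {
    let off = ipv4_saddr_offset(has_link_header);
    let b = buf.get(off..off + 4)?;
    Some(Ipv4Addr::new(b[0], b[1], b[2], b[3]))
}
```

Whether the Ethernet header is part of the data depends on how the socket was opened, so the flag has to be chosen to match the socket type.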

jornfranke commented 1 year ago

Thanks for the quick answer.

For IPv4 with a socket filter (not TC) it is not needed: it works perfectly there without ETH_HDR_LEN. For IPv6 it makes no difference with or without it. Maybe I do not correctly understand what SkBuffContext contains when an AF_INET6 socket is used...

jornfranke commented 1 year ago

Maybe some more context. This is my ebpf program

[..]

#[socket_filter(name = "sock_egress")]
pub fn sock_egress(ctx: SkBuffContext) -> i64 {
    match try_sock_egress(ctx) {
        Ok(ret) => ret,
        Err(_) => 0,
    }
}
[..]
fn try_sock_egress(ctx: SkBuffContext) -> Result<i64, i64> {
    // determine the protocol
    let h_proto = unsafe { (*ctx.skb.skb).protocol };

    // only process IPv4 and IPv6 packets
    let ip_version: u32 = match h_proto {
        ETH_P_IP => 4,
        ETH_P_IPV6 => 6,
        _ => return Ok(0), // drop packet
    };

    // determine destination of the packet
    // note: as written, both branches load saddr, the source address
    let destination: u128 = match ip_version {
        4 => u32::from_be(ctx.load(offset_of!(iphdr, saddr)).unwrap()) as u128,
        6 => u128::from_be(ctx.load(offset_of!(ipv6hdr, saddr)).unwrap()),
        _ => 0,
    };
 [..]

As said, the IPv4 part works correctly (correct IP etc.). For the IPv6 part, I think I am missing something.

jornfranke commented 1 year ago

Ahh, I see: if the protocol is IPv6, I do not get the IP header in the skb data, only the TCP header... So this is consistent with the raw socket API for IPv6. Any idea how to get the IP address in the eBPF program then?

jornfranke commented 1 year ago

Or is this feasible at all? I just want to explore what is possible with a socket filter; I am aware that I could also use XDP or TC. My use case: a user space program has a raw socket to inspect all IP packets. To increase performance, I want to prefilter (not drop!) packets based on basic information such as the IP address. For instance, the user space program should only look at packets with "suspicious" IP addresses, and the rest should not even reach the user space program...
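The prefilter decision described here can be sketched in plain Rust, independent of aya. The sketch assumes the usual socket-filter convention that returning 0 discards the packet for this socket only, while a non-zero value keeps up to that many bytes. The names in_prefix and verdict and the watch list are hypothetical, and how the source address is obtained inside the eBPF program is left open.

```rust
/// True if `addr` lies inside `prefix/len` (`len` in bits, clamped to 128).
fn in_prefix(addr: u128, prefix: u128, len: u8) -> bool {
    let len = u32::from(len.min(128));
    // A zero-length prefix matches everything; avoid shifting by 128 bits.
    let mask = if len == 0 { 0 } else { u128::MAX << (128 - len) };
    (addr & mask) == (prefix & mask)
}

/// Filter verdict: the packet length if `src` matches a watched prefix
/// (keep the whole packet), otherwise 0 (do not deliver it to this socket).
fn verdict(src: u128, pkt_len: u32, watch: &[(u128, u8)]) -> i64 {
    let suspicious = watch
        .iter()
        .any(|&(prefix, len)| in_prefix(src, prefix, len));
    if suspicious {
        i64::from(pkt_len)
    } else {
        0
    }
}
```

With a watch list containing 2001:db8::/32, a 60-byte packet from 2001:db8::1 yields 60, and one from fe80:: yields 0.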

FallingSnow commented 1 year ago

> Ahh, I see: if the protocol is IPv6, I do not get the IP header in the skb data, only the TCP header... So this is consistent with the raw socket API for IPv6.

Oh, I've never used skb before. Had no idea it was any different.

Sorry if I'm not following. Are you saying https://docs.aya-rs.dev/bpf/aya_bpf/bindings/struct.__sk_buff.html has protocol but the ip address fields aren't filled out?

jornfranke commented 1 year ago

Well, the IP address fields are only filled out for the BPF program type BPF_PROG_TYPE_SK_SKB (see: https://blogs.oracle.com/linux/post/bpf-a-tour-of-program-types), not for BPF_PROG_TYPE_SOCKET_FILTER.

It seems the only way to access ancillary data (i.e. the IPv6 header) is to load it from a negative offset: https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/include/uapi/linux/filter.h#L60

It seems that aya only allows unsigned (non-negative) offsets. Can anyone confirm this assumption?
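To make the mismatch concrete, here is a small sketch using the special offset bases defined in include/uapi/linux/filter.h (SKF_AD_OFF, SKF_NET_OFF, SKF_LL_OFF). The helper functions are hypothetical. The sketch only shows that such an offset cannot be expressed as an unsigned value; it does not show that these offsets are honoured for eBPF programs.

```rust
/// Special negative offset bases from include/uapi/linux/filter.h.
const SKF_AD_OFF: i32 = -0x1000; // ancillary data
const SKF_NET_OFF: i32 = -0x100000; // relative to the network header
const SKF_LL_OFF: i32 = -0x200000; // relative to the link-layer header

/// Offset of `saddr` in the fixed IPv6 header.
const IPV6_SADDR_OFF: i32 = 8;

/// Converts a signed offset to `usize`, failing for negative values.
fn to_unsigned_offset(off: i32) -> Option<usize> {
    if off < 0 {
        None
    } else {
        Some(off as usize)
    }
}

/// Classic-BPF style offset addressing the IPv6 source address
/// relative to the network header.
fn ipv6_saddr_net_offset() -> i32 {
    SKF_NET_OFF + IPV6_SADDR_OFF
}
```

Here to_unsigned_offset(ipv6_saddr_net_offset()) is None, so an API that takes a usize offset has no way to pass this value through unchanged.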

FallingSnow commented 1 year ago

Hmm, interesting. It seems that aya's load only takes a usize offset, so only non-negative offsets are possible.

alessandrod commented 1 year ago

> Ahh, I see: if the protocol is IPv6, I do not get the IP header in the skb data, only the TCP header

What makes you think this? Here's an example of a socket filter parsing IP headers https://github.com/torvalds/linux/blob/03421a92f5627430d23ed95df55958e04848f184/samples/bpf/sockex2_kern.c#L100

> Well, the IP address fields are only filled out for the BPF program type BPF_PROG_TYPE_SK_SKB (see: https://blogs.oracle.com/linux/post/bpf-a-tour-of-program-types), not for BPF_PROG_TYPE_SOCKET_FILTER

I haven't written a socket filter in a long time but from what I can see in the kernel source, this doesn't seem to be true either?

jornfranke commented 1 year ago

thanks a lot.

> Ahh, I see: if the protocol is IPv6, I do not get the IP header in the skb data, only the TCP header

> What makes you think this? Here's an example of a socket filter parsing IP headers https://github.com/torvalds/linux/blob/03421a92f5627430d23ed95df55958e04848f184/samples/bpf/sockex2_kern.c#L100

Because the data starts with the TCP header: I can parse the port etc. successfully. It could be due to the different ways in which the raw socket is opened. I use:
let fd: i32 = unsafe { libc::socket(libc::AF_INET6, libc::SOCK_RAW, libc::IPPROTO_TCP) };
This way, even without eBPF, the user space program receives the packet starting at the TCP header, without the IPv6 header, which is normal (cf. e.g. https://schoenitzer.de/blog/2018/Linux%20Raw%20Sockets.html).

I will try with:

let fd: i32 = unsafe { libc::socket(libc::AF_PACKET, libc::SOCK_DGRAM, libc::ETHERTYPE_IPV6) };

> Well, the IP address fields are only filled out for the BPF program type BPF_PROG_TYPE_SK_SKB (see: https://blogs.oracle.com/linux/post/bpf-a-tour-of-program-types), not for BPF_PROG_TYPE_SOCKET_FILTER

> I haven't written a socket filter in a long time but from what I can see in the kernel source, this doesn't seem to be true either?

Well, I just quoted the blog. This may have changed between kernel versions. Unlike BPF helper functions, where it is easy to see which ones are allowed for a program type, it is not so obvious which fields of the __sk_buff view the kernel fills in (or I am looking in the wrong place). Where do you see this in the kernel source? Just for clarification: some of the fields in the __sk_buff view are filled in, but not all, and in particular not the IP address ones when I use a socket filter.

According to the tests, the blog seems to be correct: https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/verifier/ctx_skb.c

Nevertheless, I could also be overlooking something here.

Maybe the confusion comes from the different types of raw sockets used (I used an AF_INET6 one, the example you reference seems to be at a lower level, possibly AF_PACKET).
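The hypothesis above can be written down as a small lookup table. This is a sketch that records only what was observed or assumed in this thread (IPv4 header readable at offset 0 on an AF_INET raw socket, data starting at the TCP header on an AF_INET6 raw socket) plus the documented AF_PACKET behaviour, where SOCK_RAW keeps the link-layer header and SOCK_DGRAM removes it. The enum and function names are illustrative, not an aya or libc API, and a 14-byte untagged Ethernet header is assumed.

```rust
/// How the raw socket was opened (hypothetical names for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RawSocketKind {
    /// socket(AF_INET, SOCK_RAW, proto)
    Inet4Raw,
    /// socket(AF_INET6, SOCK_RAW, proto)
    Inet6Raw,
    /// socket(AF_PACKET, SOCK_RAW, htons(ethertype))
    PacketRaw,
    /// socket(AF_PACKET, SOCK_DGRAM, htons(ethertype))
    PacketDgram,
}

/// Length of an untagged Ethernet II header.
const ETH_HDR_LEN: usize = 14;

/// Offset of the network (IP) header within the data, or `None` when the
/// data begins after it, as reported for AF_INET6 raw sockets in this thread.
fn network_header_offset(kind: RawSocketKind) -> Option<usize> {
    match kind {
        RawSocketKind::Inet4Raw => Some(0),
        RawSocketKind::Inet6Raw => None,
        RawSocketKind::PacketRaw => Some(ETH_HDR_LEN),
        RawSocketKind::PacketDgram => Some(0),
    }
}
```

Under these assumptions, the IPv6 source address would be at network_header_offset(kind) + 8 for the two AF_PACKET variants and not reachable through a non-negative offset for the AF_INET6 raw socket.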

jornfranke commented 1 year ago

Somehow, I have issues with Rust and a raw socket with AF_PACKET. For example, this quick and dirty C program works:

#include <arpa/inet.h>          // htons
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <netinet/if_ether.h>   // ETH_P_ALL
#include <sys/socket.h>
#include <unistd.h>             // close

int main()
{
    unsigned char *buffer = (unsigned char *) malloc(65536); // It's big!
    if (buffer == NULL)
    {
        perror("malloc");
        return 1;
    }

    int data_size;
    // The protocol must be passed in network byte order, hence htons()
    int sock_raw = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    if (sock_raw < 0)
    {
        // Print the error with a proper message
        perror("Socket Error");
        return 1;
    }
    while (1)
    {
        printf("Receiving\n");
        // Receive a packet
        data_size = recv(sock_raw, buffer, 65536, 0);
        if (data_size < 0)
        {
            printf("Recv error, failed to get packets\n");
            return 1;
        }
        printf("%d\n", data_size);
    }
    // Not reached: the loop above only exits via return
    close(sock_raw);
    free(buffer);
    printf("Finished\n");
    return 0;
}

This shows that packets are received over the raw socket with AF_PACKET.

However, the equivalent quick and dirty Rust program does not show anything: it prints just "Enter loop" and then blocks in recv.

fn main() {
    // create raw socket
    let fd: i32 = unsafe { libc::socket(libc::AF_PACKET, libc::SOCK_RAW, libc::ETH_P_ALL) };
    if fd < 0 {
        println!("Error socket");
        return;
    }

    let mut buffer = vec![0u8; 4096].into_boxed_slice();
    loop {
        println!("Enter loop");

        let result = unsafe { libc::recv(fd, buffer.as_mut_ptr() as *mut libc::c_void, buffer.len(), 0) };
        if result < 0 {
            println!("Error read");
        } else {
            println!("Size: {}", result);
        }
    }
}

It does work with AF_INET and AF_INET6, though.

jornfranke commented 1 year ago

OK, found the issue in the Rust program: I had to emulate htons(ETH_P_ALL):

fn main() {
    // create raw socket; the protocol must be in network byte order, like htons(ETH_P_ALL) in C
    let fd: i32 = unsafe { libc::socket(libc::AF_PACKET, libc::SOCK_RAW, (libc::ETH_P_ALL as u16).to_be() as i32) };
    if fd < 0 {
        println!("Error socket");
        return;
    }

    let mut buffer = vec![0u8; 4096].into_boxed_slice();
    loop {
        println!("Enter loop");

        let result = unsafe { libc::recv(fd, buffer.as_mut_ptr() as *mut libc::c_void, buffer.len(), 0) };
        if result < 0 {
            println!("Error read");
        } else {
            println!("Size: {}", result);
        }
    }
}
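The byte-order fix can be isolated into a tiny helper. This is a dependency-free sketch: ETH_P_ALL is defined locally with the value 0x0003 from linux/if_ether.h instead of being taken from the libc crate, and the names htons and packet_protocol are chosen here for illustration.

```rust
/// EtherType wildcard (value from linux/if_ether.h), defined locally so the
/// sketch does not need the libc crate.
const ETH_P_ALL: u16 = 0x0003;

/// Host-to-network byte order for a 16-bit value, like C's `htons`.
fn htons(v: u16) -> u16 {
    v.to_be()
}

/// Protocol argument for `socket(AF_PACKET, ..)`: the EtherType in network
/// byte order, widened to the C `int` the call expects.
fn packet_protocol(ethertype: u16) -> i32 {
    i32::from(htons(ethertype))
}
```

On a little-endian machine packet_protocol(ETH_P_ALL) is 0x0300 rather than 0x0003, which is why passing libc::ETH_P_ALL unconverted created a socket that never matched any packet.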

I will continue with the raw packets for now. I just wonder whether the following makes sense in aya (or whether I misunderstood how it works):

> It seems the only way to access ancillary data (i.e. the IPv6 header) is to load it from a negative offset: https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/include/uapi/linux/filter.h#L60

If it makes sense then I can close this issue and create a new issue for aya on this, if not then I simply close this issue.

Please let me know.

alessandrod commented 1 year ago

> It seems the only way to access ancillary data (i.e. the IPv6 header) is to load it from a negative offset: https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/include/uapi/linux/filter.h#L60

> If it makes sense then I can close this issue and create a new issue for aya on this, if not then I simply close this issue.

https://github.com/torvalds/linux/blob/150aae354b817f540848476bace2b2ba9931b197/net/core/filter.c#L340

It seems to me that that stuff is only for (classic) BPF and is mapped to just accessing skb->$field in eBPF?

jornfranke commented 1 year ago

Good question, I will investigate. I can confirm, though, that with an aya socket filter I did not have access to remote_ip etc., only to protocol (essentially the fields that are allowed according to the tests in the Linux kernel). I do not think this is related to aya.

aya-rs / book

SkBuffContext and IPv6 #82