foniod / redbpf

Rust library for building and running BPF/eBPF modules
Apache License 2.0

Probing program appears as if sending packets #254

Open undvikar opened 2 years ago

undvikar commented 2 years ago

Hello,

I have a question related to my previous issue #220. I wrote a program whose purpose is to match network packets to PIDs. For this, I use a redbpf kprobe program to retrieve packet information from the kernel function __dev_queue_xmit; this information is then forwarded to the userspace program, where the matching takes place. However, I noticed that quite a few packets are matched to the PID of the matching program itself, which I found odd because the matching program does not send any packets itself; it only analyzes them. To investigate, I ran only the redbpf part of the program and printed the information it retrieves (PID and executing command), and it turns out that there are packets which appear as if the probing program caused them. That cannot actually be the case, since the probing program only observes outgoing packets; it does not send any. Do you have an idea what could be causing this?

This is the kernel main.rs:

#![no_std]
#![no_main]

use redbpf_probes::kprobe::prelude::*;
use kernel::test::*;
use core::mem::{self, MaybeUninit};
use core::convert::TryInto;
use core::cmp::min;

program!(0xFFFFFFFE, "GPL");

#[map]
static mut KERNEL_EVENTS: PerfMap<XmitEvent> = PerfMap::with_max_entries(2048);

#[map]
static mut BUF: PerCpuArray<XmitEvent> = PerCpuArray::with_max_entries(1);

#[kprobe("__dev_queue_xmit")]
unsafe fn dev_queue_xmit(regs: Registers) {
    let event = BUF.get_mut(0).unwrap();
    event.timestamp = bpf_ktime_get_ns();
    let pid_tgid = bpf_get_current_pid_tgid();
    event.pid = pid_tgid as u32; // lower 32 bits: thread id
    event.tgid = (pid_tgid >> 32) as u32; // upper 32 bits: process (thread group) id
    event.command = bpf_get_current_comm();

    let skb_ptr = regs.parm1() as *const sk_buff;
    let mut skb = MaybeUninit::<sk_buff>::uninit();
    // read skb
    if bpf_probe_read_kernel(
        skb.as_mut_ptr() as *mut _,
        mem::size_of::<sk_buff>() as u32,
        skb_ptr as *const _,
    ) < 0
    {
        bpf_trace_printk(b"error reading skb.len\0");
    }
    let skb = skb.assume_init();
    event.len = skb.len;
    event.data_len = skb.data_len;

    // Bound the linear (non-paged) length to BUFSIZE so the verifier can prove the read is safe.
    let readlen: u32 = min((skb.len - skb.data_len) % 128, BUFSIZE.try_into().unwrap());

    if bpf_probe_read_kernel(
        event.data.as_mut_ptr() as *mut _,
        readlen,
        skb.data as *const _,
    ) < 0
    {
        bpf_trace_printk(b"error reading skb.data\0");
    }

    event.copied_len = readlen;
    KERNEL_EVENTS.insert(regs.ctx, event);
}

This is the kernel mod.rs:

pub const COMMLEN: usize = 16;
pub const BUFSIZE: usize = 128;

// #[repr(C)] fixes the field layout so userspace can reinterpret the raw event bytes.
#[repr(C)]
#[derive(Debug, Clone)]
pub struct XmitEvent {
    pub timestamp: u64, 
    pub pid: u32,
    pub tgid: u32,
    pub len: u32,
    pub data_len: u32,
    pub copied_len: u32,
    pub command: [i8; COMMLEN],
    pub data: [u8; BUFSIZE],
}

impl Default for XmitEvent {
    fn default() -> XmitEvent {
        XmitEvent {
            timestamp: 0,
            pid: 0,
            tgid: 0,
            len: 0,
            data_len: 0,
            copied_len: 0,
            command: [0; COMMLEN],
            data: [0; BUFSIZE],
        }
    }
}

And this is the userspace main.rs:

use futures::stream::StreamExt;
use kernel::test::XmitEvent;
use std::ptr;
use redbpf::load::Loader;
use tracing::Level;
use tracing_subscriber::FmtSubscriber;

fn probe_code() -> &'static [u8] {
    include_bytes!(concat!(
        env!("CARGO_MANIFEST_DIR"),
        "/target/bpf/programs/test/test.elf"
    ))
}

#[tokio::main(flavor = "current_thread")]
pub async fn main() {
    let subscriber = FmtSubscriber::builder()
        .with_max_level(Level::WARN)
        .finish();
    tracing::subscriber::set_global_default(subscriber).unwrap();

    let mut loaded = Loader::load(probe_code()).expect("error on Loader::load");

    loaded
        .kprobe_mut("__dev_queue_xmit")
        .expect("error on Loaded::kprobe_mut")
        .attach_kprobe("__dev_queue_xmit", 0)
        .expect("error on KProbe::attach_kprobe");

    while let Some((map_name, events)) = loaded.events.next().await {
        if map_name == "KERNEL_EVENTS" {
            for event in events {
                let xmit_event = unsafe { ptr::read(event.as_ptr() as *const XmitEvent) };
                // convert the command from [i8] to [u8] so it can be printed
                let mut comm: [u8; 16] = [0; 16];
                for (i, x) in xmit_event.command.iter().enumerate() {
                    comm[i] = *x as u8;
                }
                println!("got event with tgid {}, command {:?}", xmit_event.tgid, std::str::from_utf8(&comm));
            }
        }
    }
}
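
As a side note, the trailing \u{0}s in the output below come from printing the whole 16-byte comm buffer. A small helper could trim at the first NUL; this is only an illustrative sketch (the helper name is mine, it is not part of the program above):

// Hypothetical helper: render the fixed-size, NUL-padded comm buffer
// without the trailing zero bytes.
fn comm_to_string(command: &[i8]) -> String {
    command
        .iter()
        .take_while(|&&c| c != 0) // stop at the first NUL terminator
        .map(|&c| c as u8 as char) // comm bytes are plain ASCII in practice
        .collect()
}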

This is an excerpt from an output file from running a Twitch stream with mpv:

got event with tgid 208478, command Ok("redbpf-test\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 208015, command Ok("mpv/demux\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 0, command Ok("swapper/1\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 208478, command Ok("redbpf-test\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 208015, command Ok("mpv/demux\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 208015, command Ok("mpv/demux\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 208015, command Ok("mpv/demux\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 0, command Ok("swapper/1\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 0, command Ok("swapper/1\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 208478, command Ok("redbpf-test\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 0, command Ok("swapper/1\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 0, command Ok("swapper/1\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 208478, command Ok("redbpf-test\u{0}\u{0}\u{0}\u{0}\u{0}")

I let the stream run for around 10 seconds before shutting it down. Out of the 4618 lines/events that were captured, 185 contain the command redbpf-test. The same thing happens when accessing websites in a browser.

I am asking this because my implementation is based on a different program (with the matching program written in Python and the eBPF part in C), which did not appear to have this issue.

I'd be very thankful for any ideas about what could be causing this. Thank you!

This is my system info:

rustc 1.57.0
llvm 13.0.0-6
Linux 5.15.13-arch1-1
rhdxmr commented 2 years ago

Hello @toasthorse. We've met before. How are you doing?

I've heard that the send syscall does not send packets synchronously. send just puts the application data somewhere in the kernel, and the actual packet transmission is done by the kernel at some later point.

So the point at which the send syscall is called can be different from the point at which __dev_queue_xmit runs.

This is my opinion based on my knowledge, but I haven't analyzed the kernel code myself since I don't have enough expertise to follow the code path from send to __dev_queue_xmit. So my opinion might not be true.

I checked my thought by running a simple busy-loop program which consumes 100% CPU.

The code is very simple:

fn main() {
    loop {} // spin forever; a debug build does not optimize the empty loop away
}

I compiled it in debug mode so that the empty loop would not be optimized out, and ran it as root to set the CPU affinity. For reference, my shell is fish, so its syntax differs from bash:

root@fedora /h/e/s/rust# for cpu in (seq 0 (math (nproc) - 1)); taskset -c $cpu anything/target/debug/examples/busy & end
root@fedora /h/e/s/rust# jobs
Job     Group   CPU     State   Command
12      28123   96%     running taskset -c $cpu anything/target/debug/examples/busy &
11      28122   98%     running taskset -c $cpu anything/target/debug/examples/busy &
10      28121   74%     running taskset -c $cpu anything/target/debug/examples/busy &
9       28120   96%     running taskset -c $cpu anything/target/debug/examples/busy &
8       28119   98%     running taskset -c $cpu anything/target/debug/examples/busy &
7       28118   99%     running taskset -c $cpu anything/target/debug/examples/busy &
6       28117   96%     running taskset -c $cpu anything/target/debug/examples/busy &
5       28116   99%     running taskset -c $cpu anything/target/debug/examples/busy &
4       28115   99%     running taskset -c $cpu anything/target/debug/examples/busy &
3       28114   98%     running taskset -c $cpu anything/target/debug/examples/busy &
2       28113   99%     running taskset -c $cpu anything/target/debug/examples/busy &
1       28112   99%     running taskset -c $cpu anything/target/debug/examples/busy &
root@fedora /h/e/s/rust#

root@fedora /h/e/s/rust# LANG=C mpstat -P ALL 1
Linux 5.15.13-200.fc35.x86_64 (fedora)  01/14/22        _x86_64_        (12 CPU)

20:58:20     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
20:58:21     all   99.58    0.00    0.17    0.00    0.25    0.00    0.00    0.00    0.00    0.00
20:58:21       0   99.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00
20:58:21       1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
20:58:21       2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
20:58:21       3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
20:58:21       4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
20:58:21       5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
20:58:21       6   99.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
20:58:21       7   99.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00
20:58:21       8  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
20:58:21       9   98.02    0.00    0.99    0.00    0.99    0.00    0.00    0.00    0.00    0.00
20:58:21      10  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
20:58:21      11  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

All CPUs are running at 100%.

At this point, I executed your program and observed that it reports the name of my simple busy-loop program.

root@fedora /h/e/redbpf (dqx) [SIGINT]# cargo run --example dqx --no-default-features --features=llvm13 |rg busy
warning: field is never read: `flags`
  --> redbpf/src/symbols.rs:78:5
   |
78 |     flags: i32,
   |     ^^^^^^^^^^
   |
   = note: `#[warn(dead_code)]` on by default

warning: `redbpf` (lib) generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/examples/dqx`
got event with tgid 33071, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33061, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33061, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33063, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33063, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33060, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33062, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33063, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33060, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33061, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33061, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")

This supports my thought that __dev_queue_xmit is perhaps called asynchronously; I guess it may be called from the NIC driver, from an interrupt, or from somewhere else. The swapper/N lines in your own output point the same way: PID 0 is the per-CPU idle task, which can only appear as the current task when the function runs in interrupt or softirq context on an otherwise idle CPU.
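
If that is what is happening, a possible workaround (just my assumption, untested) would be to attribute packets at a socket-level function such as tcp_sendmsg, which runs in the sending process's context, rather than at __dev_queue_xmit. An illustrative sketch reusing the kprobe API from your program, assuming the same prelude and program! setup as your kernel main.rs:

// Illustrative sketch only (untested). tcp_sendmsg executes in the calling
// process's context, so bpf_get_current_pid_tgid() names the real sender,
// unlike __dev_queue_xmit, which may run later in softirq context.
#[kprobe("tcp_sendmsg")]
unsafe fn tcp_sendmsg(regs: Registers) {
    let pid_tgid = bpf_get_current_pid_tgid();
    let tgid = (pid_tgid >> 32) as u32;
    // here one would record `tgid`, e.g. in a map keyed by the socket
    // pointer (regs.parm1() is the struct sock * argument of tcp_sendmsg)
    let _ = (tgid, regs);
}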

undvikar commented 2 years ago

Hi @rhdxmr,

Thank you for your reply. I am doing okay, currently quarantining, but I have not had any seriously bad symptoms so far. How about you?

What you said about packets being sent asynchronously is correct. The thesis my work is based on, as well as this article, state that __dev_queue_xmit is called to enqueue a packet that is ready to be sent into the queue of whatever network device will transmit it.

Thank you also for your experiment; I will have to look further into how exactly this happens. Thanks for pointing this out!