Open undvikar opened 2 years ago
Hello @toasthorse We've met before. How are you doing?
I've heard that the send syscall does not send packets synchronously. send just puts the application data somewhere in the kernel, and the actual packet transmission is done by the kernel at some later point. So the point when the send syscall is called can be different from the point when __dev_queue_xmit is called.
This is my opinion based on my knowledge, but I didn't analyze the kernel code myself since I don't have enough expertise to follow the code path from send to __dev_queue_xmit. So my opinion might not be true.
I checked this idea by running a simple busy-loop program which consumes 100% CPU.
The code is very simple:
fn main() {
loop {}
}
I compiled it in debug mode so the empty loop would not be optimized out, and ran it as root to set the CPU affinity.
For reference, my shell is fish, so its syntax is different from bash:
root@fedora /h/e/s/rust# for cpu in (seq 0 (math (nproc) - 1)); taskset -c $cpu anything/target/debug/examples/busy & end
root@fedora /h/e/s/rust# jobs
Job Group CPU State Command
12 28123 96% running taskset -c $cpu anything/target/debug/examples/busy &
11 28122 98% running taskset -c $cpu anything/target/debug/examples/busy &
10 28121 74% running taskset -c $cpu anything/target/debug/examples/busy &
9 28120 96% running taskset -c $cpu anything/target/debug/examples/busy &
8 28119 98% running taskset -c $cpu anything/target/debug/examples/busy &
7 28118 99% running taskset -c $cpu anything/target/debug/examples/busy &
6 28117 96% running taskset -c $cpu anything/target/debug/examples/busy &
5 28116 99% running taskset -c $cpu anything/target/debug/examples/busy &
4 28115 99% running taskset -c $cpu anything/target/debug/examples/busy &
3 28114 98% running taskset -c $cpu anything/target/debug/examples/busy &
2 28113 99% running taskset -c $cpu anything/target/debug/examples/busy &
1 28112 99% running taskset -c $cpu anything/target/debug/examples/busy &
root@fedora /h/e/s/rust#
root@fedora /h/e/s/rust# LANG=C mpstat -P ALL 1
Linux 5.15.13-200.fc35.x86_64 (fedora) 01/14/22 _x86_64_ (12 CPU)
20:58:20 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
20:58:21 all 99.58 0.00 0.17 0.00 0.25 0.00 0.00 0.00 0.00 0.00
20:58:21 0 99.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
20:58:21 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:58:21 2 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:58:21 3 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:58:21 4 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:58:21 5 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:58:21 6 99.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:58:21 7 99.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00
20:58:21 8 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:58:21 9 98.02 0.00 0.99 0.00 0.99 0.00 0.00 0.00 0.00 0.00
20:58:21 10 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
20:58:21 11 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
All CPUs are running at 100%.
At this point, I executed your program and observed that it reports the name of my simple busy-loop program:
root@fedora /h/e/redbpf (dqx) [SIGINT]# cargo run --example dqx --no-default-features --features=llvm13 |rg busy
warning: field is never read: `flags`
--> redbpf/src/symbols.rs:78:5
|
78 | flags: i32,
| ^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: `redbpf` (lib) generated 1 warning
Finished dev [unoptimized + debuginfo] target(s) in 0.18s
Running `target/debug/examples/dqx`
got event with tgid 33071, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33061, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33061, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33063, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33063, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33060, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33062, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33063, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33060, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33061, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
got event with tgid 33061, command Ok("busy\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}")
This supports my thought that __dev_queue_xmit may be called asynchronously. I guess it may be invoked by the NIC driver, from interrupt context, or from somewhere else.
Hi @rhdxmr,
Thank you for your reply. I am doing okay; I am currently quarantining but have not had any seriously bad symptoms so far. How about you?
What you said about packets being sent asynchronously is correct. The thesis my work is based on, and this article, state that __dev_queue_xmit is called to enqueue a packet that is ready to be sent into the queue of whatever network device will transmit it.
Thank you also for your experiment. I will have to look further into how exactly this is happening. Thank you for pointing this out!
Hello,
I have a question related to my previous issue #220. I wrote a program whose purpose is to match network packets to PIDs. For this, I use a kprobe redBPF program to retrieve packet information from the kernel function __dev_queue_xmit. This information is then forwarded to the userspace program, where the matching takes place. However, I noticed that quite a few packets are matched to the PID of the matching program itself, which I found odd, because the matching program does not send any packets itself; it only analyzes them. To investigate, I ran only the redBPF part of the program and printed the information it retrieves (PID and executing command), and there are indeed packets which appear as if the probing program caused them. This is of course not the case, as the probing program only observes the outgoing packets; it does not send them. Do you have an idea what could be causing this?
This is the kernel main.rs:
This is the kernel mod.rs:
And this is the userspace main.rs:
This is an excerpt from an output file when running a Twitch stream with mpv:
I let the stream run for around 10 seconds before shutting it down. Of the 4618 lines/events captured, 185 contain the command redbpf-test. The same thing happens when accessing websites in a browser. I am asking because my implementation is based on a different program (the matching program written in Python, the eBPF part in C), which did not appear to have this issue.
I'd be very thankful for any ideas about what could be causing this. Thank you!
This is my system info: