aya-rs / aya

Aya is an eBPF library for the Rust programming language, built with a focus on developer experience and operability.
https://aya-rs.dev/book/
Apache License 2.0

Bug on system suspend #932

Open qjerome opened 3 months ago

qjerome commented 3 months ago

Linux Kernel: 6.6.25-1-lts

I came across a very strange issue, which may not be caused by Aya but may be solvable in Aya.

Issue description: I noticed that when a kretprobe is attached to __sys_recvmsg and the system is suspended (to RAM), the probe stops working when the system resumes. I cannot yet explain this behavior.

Steps to reproduce:

  1. Implement a probe such as:
    #[kretprobe(function = "__sys_recvmsg")]
    pub fn net_dns_exit_sys_recvmsg(ctx: ProbeContext) -> u32 {
        // Log every return from __sys_recvmsg to the kernel trace pipe.
        unsafe {
            bpf_printk!(b"entering kretprobe(__sys_recvmsg)");
        }
        0
    }
  2. Load it and attach it to the __sys_recvmsg kernel function (___sys_recvmsg suffers from the same issue).
  3. Run sudo bpftool prog profile tag $PROG_TAG duration 5 cycles and observe the result; you should see non-zero values.
  4. Suspend the system with systemctl suspend
  5. Resume the system
  6. Re-run sudo bpftool prog profile tag $PROG_TAG duration 5 cycles; you should see all zeros (that is what I see), even though the program is still loaded.

qjerome commented 3 months ago

Similar behavior has been observed with:

Linux Kernel: 6.6.28-lts
Attach points: ___sys_recvmsg, ____sys_recvmsg and sock_recvmsg

I spent several hours trying to learn more about this issue, but I have run out of ideas about where exactly it comes from. I really want to understand what happens here. Maybe @dave-tucker or @alessandrod have ideas?

My current thinking is that, to work around this, I could attach a probe to system resume and reload the program on resume. But that would only fix the symptom, not the root cause.
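
To make the workaround idea concrete, here is a minimal userspace sketch. It does not probe the kernel resume path as suggested above; instead it uses a different, cruder heuristic: it compares how far the wall clock and the monotonic clock advanced between two samples. It assumes that Rust's `Instant` on Linux is backed by a clock that does not advance while the system is suspended, whereas `SystemTime` does. A manual or NTP clock jump would trigger it too, so treat it as a hint only. The names `suspend_suspected` and `watch_for_resume`, and the `on_resume` callback, are hypothetical and not part of Aya.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant, SystemTime};

/// Heuristic: a suspend is suspected when the wall clock advanced by more
/// than `threshold` beyond what the monotonic clock measured.
/// Assumption: the monotonic clock stands still during suspend.
fn suspend_suspected(mono_elapsed: Duration, wall_elapsed: Duration, threshold: Duration) -> bool {
    wall_elapsed.saturating_sub(mono_elapsed) > threshold
}

/// Polls both clocks `iterations` times, sleeping `interval` between samples,
/// and calls `on_resume` each time a suspend/resume cycle is suspected.
/// Returns the number of suspected resumes.
/// `on_resume` is a hypothetical hook: a real loader would detach and
/// re-attach (or reload) its kretprobe there.
fn watch_for_resume<F: FnMut()>(
    iterations: usize,
    interval: Duration,
    threshold: Duration,
    mut on_resume: F,
) -> usize {
    let mut detections = 0;
    let mut last_mono = Instant::now();
    let mut last_wall = SystemTime::now();

    for _ in 0..iterations {
        sleep(interval);
        let now_mono = Instant::now();
        let now_wall = SystemTime::now();

        let mono_elapsed = now_mono.duration_since(last_mono);
        // `duration_since` fails if the wall clock went backwards;
        // treat that case as "no time elapsed".
        let wall_elapsed = now_wall
            .duration_since(last_wall)
            .unwrap_or(Duration::ZERO);

        if suspend_suspected(mono_elapsed, wall_elapsed, threshold) {
            detections += 1;
            on_resume();
        }

        last_mono = now_mono;
        last_wall = now_wall;
    }
    detections
}
```

A loader could call `watch_for_resume` from a background thread with an interval of about a second and a threshold of a few seconds, passing a closure that re-attaches the probe. As noted above, this only hides the symptom.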

cc: @vadorovsky

qjerome commented 3 months ago

Minimal repro code for the issue:

  1. Run sudo bpftrace -e 'kretprobe:__sys_recvmsg { printf("%s\n", comm); }'
  2. Suspend the system with systemctl suspend
  3. Resume the system
  4. Observe that nothing is printed to stdout anymore

qjerome commented 3 months ago

The issue is very likely located in the Linux kernel. A bug report has been filed and can be tracked at https://bugzilla.kernel.org/show_bug.cgi?id=218775