bpftrace / bpftrace

High-level tracing language for Linux
Apache License 2.0
8.45k stars 1.32k forks source link

join(args.argv) prints nothing #3384

Open zmitchell opened 1 month ago

zmitchell commented 1 month ago

What reproduces the bug? Provide code if possible.

I used Flox to install bpftrace (this is using Nix under the hood). Running execsnoop.bt on a sleep 1 call does not print argv. I discovered this writing my own script that was instrumenting the sys_enter_exec* family of tracepoints:

flox [bpf_scripts] zmitchell@ubuntu-vm:~/bpftrace_tools$ sudo -E env "PATH=$PATH" bpftrace $FLOX_ENV/bin/execsnoop.bt
Attaching 3 probes...
TIME            PID     PPID    ARGS
22:56:09.466102 750961  624277
^C

The call I was trying to instrument above was:

$(command -v sleep) 1

and running the same thing via strace shows the correct argv:

execve("/nix/store/zzmqcgd2vrql9656a2fkwk948r38mgai-coreutils-9.3/bin/sleep", ["sleep", "1"], 0xffffffff6738 /* 138 vars */) = 0

bpftrace --info output

System
  OS: Linux 6.5.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Tue Jan  9 22:39:36 UTC 2024
  Arch: aarch64

Build
  version: v0.21.2
  LLVM: 18.1.8
  unsafe probe: no
  bfd: yes
  liblldb (DWARF support): no
  libsystemd (systemd notify support): no

Kernel helpers
  probe_read: yes
  probe_read_str: yes
  probe_read_user: yes
  probe_read_user_str: yes
  probe_read_kernel: yes
  probe_read_kernel_str: yes
  get_current_cgroup_id: yes
  send_signal: yes
  override_return: yes
  get_boot_ns: yes
  dpath: yes
  skboutput: yes
  get_tai_ns: yes
  get_func_ip: yes
  jiffies64: yes
  for_each_map_elem: yes

Kernel features
  Instruction limit: 1000000
  Loop support: yes
  btf: yes
  module btf: yes
  map batch: yes
  uprobe refcount (depends on Build:bcc bpf_attach_uprobe refcount): yes

Map types
  hash: yes
  percpu hash: yes
  array: yes
  percpu array: yes
  stack_trace: yes
  perf_event_array: yes
  ringbuf: yes

Probe types
  kprobe: yes
  tracepoint: yes
  perf_event: yes
  kfunc: yes
  kprobe_multi: no
  uprobe_multi: no
  raw_tp_special: yes
  iter: yes
danobi commented 1 month ago

Hi, thanks for the report!

Do you mind building from source and trying again on your machine? Or trying the appimage: https://github.com/bpftrace/bpftrace/blob/master/INSTALL.md#appimage-install

I can't reproduce this on msater on my machine. So let's first try to establish if it's your host or your package.

zmitchell commented 1 month ago

I did a fresh nix build and I have the same behavior.

danobi commented 1 month ago

Interesting. Can you try running with -kk flag and share the output? I expect there should be a lot of warnings about -EFAULT errors - that's expected. But curious if there's any other ones.

zmitchell commented 1 month ago

Here's tracing another sleep 1 call

Attaching 7 probes...
FORK: ts=3706270739,parent_pid=478,child_pid=55679,parent_ppid=1,parent_pgid=478,parent_comm=systemd-journal
FORK: ts=3706474253,parent_pid=478,child_pid=55680,parent_ppid=1,parent_pgid=478,parent_comm=systemd-journal
EXIT: ts=3715865207,pid=478,ppid=1,pgid=478,comm=journal-offline
EXIT: ts=3716330895,pid=478,ppid=1,pgid=478,comm=journal-offline
FORK: ts=2908988330,parent_pid=14499,child_pid=55681,parent_ppid=1375,parent_pgid=14499,parent_comm=bash
SETPGID: ts=2909098107,pid=14499,ppid=1375,pgid=14499,new_pgid=0,comm=bash
SETPGID: ts=2909249045,pid=55681,ppid=14499,pgid=55681,new_pgid=0,comm=bash
EXEC: ts=2909440103,pid=55681,ppid=14499,pgid=55681,comm=bash,process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~
process_events.bt:8:2-18: WARNING: Bad address
Additional Info - helper: probe_read_user_str, retcode: -14
    join(args->argv);
    ~~~~~~~~~~~~~~~~

And here's the script that I'm running:


BEGIN {}

tracepoint:syscalls:sys_enter_exec*
{
    $task = (struct task_struct *)curtask;
    printf("EXEC: ts=%u,pid=%d,ppid=%d,pgid=%d,comm=%s,", nsecs, pid, $task->real_parent->pid, $task->group_leader->pid, comm);
    join(args->argv);
}

tracepoint:sched:sched_process_exit
{
    $task = (struct task_struct *)curtask;
    printf("EXIT: ts=%u,pid=%d,ppid=%d,pgid=%d,comm=%s\n", nsecs, pid, $task->real_parent->pid, $task->group_leader->pid,comm);
}

tracepoint:sched:sched_process_fork
{
    $task = (struct task_struct *)curtask;
    printf("FORK: ts=%u,parent_pid=%d,child_pid=%d,parent_ppid=%d,parent_pgid=%d,parent_comm=%s\n", nsecs, args.parent_pid, args.child_pid, $task->real_parent->pid, $task->group_leader->pid, comm);
}

uretprobe:libc:setsid
{
    $task = (struct task_struct *)curtask;
    $session = retval;
    printf("SETSID: ts=%u,pid=%d,ppid=%d,pgid=%d,sid=%d,comm=%s\n", nsecs, pid, $task->real_parent->pid, $task->group_leader->pid,$session,comm);
}

uretprobe:libc:setpgid
{
    $task = (struct task_struct *)curtask;
    $pgid = retval;
    printf("SETPGID: ts=%u,pid=%d,ppid=%d,pgid=%d,new_pgid=%d,comm=%s\n", nsecs, pid, $task->real_parent->pid, $task->group_leader->pid,$pgid,comm);
}

P.S. - sometimes I get multiple exits per process this way and sometimes I get a fork without an exit even though ps shows that a process is no longer running (and has been cleaned up). Any idea what that's about?

zmitchell commented 1 month ago

I just realized that the description talks about args.argv and the script above uses args->argv, so I double checked and I get the same error.

danobi commented 1 month ago

Can you try with this diff?

diff --git a/tools/execsnoop.bt b/tools/execsnoop.bt
index d79027d3..1deaed26 100755
--- a/tools/execsnoop.bt
+++ b/tools/execsnoop.bt
@@ -28,5 +28,6 @@ tracepoint:syscalls:sys_enter_exec*
 {
        $task = (struct task_struct *)curtask;
        printf("%15s %-7d %-7d ", strftime("%H:%M:%S.%f", nsecs), pid, $task->real_parent->pid);
-       join(args.argv);
+       //join(args.argv);
+       print(str(args.argv[0]));
 }

Keep the -kk flag there. Now no errors are expected. If you see any -14 (-EFAULT) errors, it means the args probably aren't faulted in yet. See #2005.

danobi commented 1 month ago

P.S. - sometimes I get multiple exits per process this way and sometimes I get a fork without an exit even though ps shows that a process is no longer running (and has been cleaned up). Any idea what that's about?

The kernel treats each thread (from userland PoV) as a task. The main thread (process as you might call it) is not super special. Check that you're not looking at events on threads. See https://github.com/bpftrace/bpftrace/issues/2195#issuecomment-2079956311

zmitchell commented 1 month ago

Can you try with this diff?

Here we go:

Attaching 7 probes...
FORK: ts=2448559448,parent_pid=14499,child_pid=56731,parent_ppid=1375,parent_pgid=14499,parent_comm=bash
SETPGID: ts=2448675952,pid=14499,ppid=1375,pgid=14499,new_pgid=0,comm=bash
SETPGID: ts=2449043175,pid=56731,ppid=14499,pgid=56731,new_pgid=0,comm=bash
EXEC: ts=2449119220,pid=56731,ppid=14499,pgid=56731,comm=bash,sleep
EXIT: ts=3451848566,pid=56731,ppid=14499,pgid=56731,comm=sleep

This now prints the sleep at the end of the EXEC line.

Check that you're not looking at events on threads.

That made a big difference, thanks. I'm still seeing a bunch of forks with no exit though. I see that when I'm using either t:sched:sched_process_exit or t:syscalls:sys_enter_exit* as the source for those EXIT lines. I also realized I was using t:sched:sched_process_fork to detect forks, but I'm suspecting that isn't correct because fork is a libc function, and the scheduler is in the kernel.

danobi commented 1 month ago

(Traveling today so on phone)

On Wed, Aug 7, 2024, at 7:21 PM, Zach Mitchell wrote:

Can you try with this diff?

Here we go:

Attaching 7 probes... FORK: ts=2448559448,parent_pid=14499,child_pid=56731,parent_ppid=1375,parent_pgid=14499,parent_comm=bash SETPGID: ts=2448675952,pid=14499,ppid=1375,pgid=14499,new_pgid=0,comm=bash SETPGID: ts=2449043175,pid=56731,ppid=14499,pgid=56731,new_pgid=0,comm=bash EXEC: ts=2449119220,pid=56731,ppid=14499,pgid=56731,comm=bash,sleep EXIT: ts=3451848566,pid=56731,ppid=14499,pgid=56731,comm=sleep This now prints the sleep at the end of the EXEC line.

No errors from -kk anymore?

Check that you're not looking at events on threads.

That made a big difference, thanks. I'm still seeing a bunch of forks with no exit though. I see that when I'm using either t:sched:sched_process_exit or t:syscalls:sys_enter_exit* as the source for those EXIT lines. I also realized I was using t:sched:sched_process_fork to detect forks, but I'm suspecting that isn't correct because fork is a libc function, and the scheduler is in the kernel.

Yeah theres a lot going on for process management. I’m not familiar enough to say for sure you’re missing something, but I’d more likely suspect there’s some feature/syscall or code path we’re not considering

— Reply to this email directly, view it on GitHub https://github.com/bpftrace/bpftrace/issues/3384#issuecomment-2274506419, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGJKJ5TQFN3SRWU4RIMK7DZQKTVZAVCNFSM6AAAAABMDHUWWSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENZUGUYDMNBRHE. You are receiving this because you commented.Message ID: @.***>

-- Daniel Xu @.***

zmitchell commented 1 month ago

No errors from -kk.

danobi commented 1 month ago

Interesting. I suppose you at least have a workaround for now. But it's odd join() doesn't work. TBH I don't think join() is even that useful - we have enough language primitives for users to just reimplement it.

zmitchell commented 1 month ago

Is there a way to get the number of args? I haven't found that yet.

zmitchell commented 1 month ago

And what's the real difference between args and args.argv?

danobi commented 1 month ago

Is there a way to get the number of args? I haven't found that yet.

My guess is the first NULL in args.argv means there are no more arguments. So you'd have to create a loop (which is supported in bpftrace now) and just go until there's a NULL. join() is kinda bad cuz it doesn't do any null checks. It just eats errors and hopes things work out.

And what's the real difference between args and args.argv?

args is a bpftrace construct. See https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#builtins

args.argv references the argv tracepoint field. For example there are other fields as well.

$ sudo bpftrace -lv t:syscalls:sys_enter_execve
tracepoint:syscalls:sys_enter_execve
    int __syscall_nr
    const char * filename
    const char *const * argv
    const char *const * envp
zmitchell commented 4 weeks ago

Just to close the loop on the fork/exit thing, it turned out that the forks that never exited were threads, as I suspected. I ended up needing to instrument clone and clone3 instead, filtering on the CLONE_THREAD flag (which says to put the new task into the same thread group).