eunomia-bpf / bpftime

Userspace eBPF runtime for Observability, Network & General Extensions Framework
https://eunomia.dev/bpftime/
MIT License

[FEATURE] USDT support #135

Open caizixian opened 9 months ago

caizixian commented 9 months ago

In addition to current u(ret)probe support, it would be useful to support USDT.

Describe the solution you'd like

Support libbpf and bpftrace's usdt

Describe alternatives you've considered

Not quite sure. Maybe the current uprobe support can support usdt as well?

Provide usage examples

Trace usdt (such as the DTrace probes in OpenJDK https://github.com/openjdk/jdk/blob/e44276989fc6358065412be7567d0141c84f1282/src/hotspot/os/posix/dtrace/hotspot.d#L4)

Officeyutong commented 9 months ago

USDT tracepoints are defined in ELF executables, so an implementation similar to uprobe might work, using Frida or an inline hook. I'll try to put together a proof of concept in the next few days.

Officeyutong commented 9 months ago

On x86_64, a USDT tracepoint corresponds to a nop instruction in the code, so we can hook this instruction to implement userspace USDT. I've tested this with frida-gum's GumInvocationListener, and it works well when attached to the address of the nop instruction.

So the things we need in order to add USDT support are:

The rest of the logic would be similar to uprobe and could even be reused.

caizixian commented 9 months ago

Yes. You can find the USDTs in an ELF file via readelf -n.
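For readers who want to see what those notes contain: each USDT probe is recorded as a note named "stapsdt" in the .note.stapsdt section, and its descriptor holds three addresses followed by three NUL-terminated strings. The following is a minimal sketch of decoding one such descriptor, assuming a 64-bit little-endian ELF. The function name parse_stapsdt_desc and the sample bytes are hypothetical and only for illustration; locating the note inside the ELF file is left out.

```python
import struct

def parse_stapsdt_desc(desc: bytes) -> dict:
    """Decode the descriptor of one 64-bit little-endian stapsdt note."""
    # Three 8-byte addresses: probe site, .stapsdt.base, semaphore (0 if unused).
    pc, base, semaphore = struct.unpack_from("<QQQ", desc, 0)
    # Then three NUL-terminated strings: provider, probe name, argument spec.
    provider, name, args = desc[24:].split(b"\x00")[:3]
    return {
        "pc": pc, "base": base, "semaphore": semaphore,
        "provider": provider.decode(), "name": name.decode(), "args": args.decode(),
    }

# Hand-built descriptor for illustration only (the addresses are made up).
example = struct.pack("<QQQ", 0x401136, 0x402000, 0) + b"mmtk\0work\0-4@%edi\0"
probe = parse_stapsdt_desc(example)
print(f"{probe['provider']}:{probe['name']} at {probe['pc']:#x} args={probe['args']!r}")
```

The pc field is the address of the nop instruction at the probe site, which is the address a hook would be attached to.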

The insertion of tracepoints is language/compiler toolchain dependent. For Rust, I use probe, which implements USDT using inline asm https://github.com/cuviper/probe-rs/blob/884250f0013de5de56b6252d5bcb5dbb7919d049/src/platform/systemtap.rs#L84

caizixian commented 9 months ago

I think if we use a multi-byte nop, we can patch it to a jmp (using techniques similar to the Global Offset Table or something), and perhaps we can perform userspace tracing entirely in-process.

The workflow is something like:

  1. A process with an embedded bpftime starts.
  2. Call into bpftime to load some BPF programs (can be BPF object files, or be compiled from bpftrace-style scripts). The BPF programs can be JITted or interpreted.
  3. Call into bpftime to parse the ELF file and patch all the nop tracepoints into JMPs (to interpreter shims or to JITted code).
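To make step 3 concrete, here is a rough sketch of the byte-level patch on x86-64, assuming the probe site was emitted as a 5-byte nop (the standard sys/sdt.h macros emit a single-byte nop, so this is an assumption taken from the idea above). The names encode_jmp_rel32 and patch_nop are hypothetical, the patch is applied to an in-memory buffer rather than a live process, and the jump back to the instruction after the probe site is not shown.

```python
import struct

NOP5 = bytes.fromhex("0f1f440000")  # 5-byte x86-64 nop: nopl 0x0(%rax,%rax,1)

def encode_jmp_rel32(site: int, target: int) -> bytes:
    """Return the 5-byte `jmp rel32` that, placed at `site`, jumps to `target`."""
    # The displacement is relative to the address of the next instruction.
    rel = target - (site + 5)
    if not -2**31 <= rel < 2**31:
        raise ValueError("target is out of rel32 range; a trampoline would be needed")
    return b"\xe9" + struct.pack("<i", rel)

def patch_nop(code: bytearray, offset: int, site: int, target: int) -> None:
    """Overwrite the 5-byte nop at code[offset:offset+5] with a jump to target."""
    if bytes(code[offset:offset + 5]) != NOP5:
        raise ValueError("expected a 5-byte nop at the probe site")
    code[offset:offset + 5] = encode_jmp_rel32(site, target)

# Illustration on a toy buffer: push %rbp, 5-byte nop, ret.
buf = bytearray(b"\x55" + NOP5 + b"\xc3")
patch_nop(buf, offset=1, site=0x1001, target=0x2000)
print(buf.hex())
```

In a real process the page holding the probe site would also have to be made writable before patching, and the write would have to be safe against threads executing that code at the same time.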

Officeyutong commented 9 months ago

Things have turned out to be simpler. The implementation of USDT in eBPF is identical to uprobe. See https://github.com/eunomia-bpf/bpftime/pull/139 for more details and to track the development progress.

caizixian commented 9 months ago

Thanks a lot for implementing this feature!

yunwei37 commented 9 months ago

Could you please help us test whether bpftrace is working with USDT? @caizixian

caizixian commented 8 months ago

Will do when I have time.

caizixian commented 8 months ago

Yep. I did some simple testing and it seems to work fine. It just needs a minor workaround (#182).

sudo ./build/tools/cli-cpp/bpftime load /opt/bpftrace/bpftrace-0.19.1-static -e "usdt:/tmp/jdk-11.0.19/lib/server/libmmtk_openjdk.so:mmtk:work {printf(\"Test\");}"
[2024-01-28 03:48:55.984] [info] [syscall_context.hpp:84] manager constructed
[2024-01-28 03:48:56.439] [info] [syscall_server_utils.cpp:24] Initialize syscall server
[2024-01-28 03:48:56][info][1577978] Global shm constructed. shm_open_type 0 for bpftime_maps_shm
[2024-01-28 03:48:56][info][1577978] Enabling helper groups ffi, kernel, shm_map by default
[2024-01-28 03:48:56][info][1577978] bpftime-syscall-server started
Attaching 2 probes...
[2024-01-28 03:48:56][error][1577978] bpftime only supports attach type BPF_PERF_EVENT
[2024-01-28 03:48:56][info][1577978] Created uprobe/uretprobe perf event handler, module name /tmp/jdk-11.0.19/lib/server/libmmtk_openjdk.so, offset 1b4f11
[2024-01-28 03:48:56][info][1577978] Created uprobe/uretprobe perf event handler, module name /tmp/jdk-11.0.19/lib/server/libmmtk_openjdk.so, offset 1b35f1

...

INFO [1577978]: Global shm destructed

Is the error message benign?

Officeyutong commented 8 months ago

The message "bpftime only supports attach type BPF_PERF_EVENT" is printed when BPF_LINK_CREATE is called with an attach type other than BPF_PERF_EVENT. Currently, we only support links to perf events (all links for uprobe, uretprobe, USDT, and syscall tracing are attached to perf events), so this error might indicate that bpftrace was creating other BPF links. If everything else works fine, the error might be benign for this use case. Still, it would be better if you could provide the bpftrace program you used and any other test assets so that I can check what caused the error.

yunwei37 commented 8 months ago

I think that's a bug. I found the same problem when trying to attach bpftime to XDP programs today. It should be fixed soon.

We should make sure the bpftime load command records all the BPF links so that we can attach to multiple different userspace targets.