eunomia-bpf / bpftime

Userspace eBPF runtime for fast Uprobe & Syscall hook & Extensions with LLVM JIT
https://eunomia.dev/bpftime/
MIT License
699 stars, 70 forks

[FEATURE] Refactor bpftime for Enhanced Extensibility and Maintainability #202

Open yunwei37 opened 6 months ago

yunwei37 commented 6 months ago

Issue Summary:

The current bpftime architecture intermingles code for different eBPF program types and backends, such as uprobe and syscall tracing, within the syscall-server.so. For example:

  1. The attach_ctx class contains both the uprobe and the syscall code, and also maintains the state of the eBPF virtual machines. https://github.com/eunomia-bpf/bpftime/blob/204f1154b8a0689a91307b87a9f4a41435077765/runtime/include/bpf_attach_ctx.hpp#L67-L77
  2. The shared memory also contains uprobe- and syscall-related attach information, which is calculated at bpftime load time by syscall-server.so. For instance: https://github.com/eunomia-bpf/bpftime/blob/d850554af7418f66991fff9be2363df22e2d450b/runtime/syscall-server/syscall_context.cpp#L459-L470 . It may be better to have an API that allows the backend (attach target) to control which kinds of perf events should be mocked in userspace, and how to attach certain eBPF progs to these events (e.g. XDP in userspace DPDK, nginx modules, plugins).

This design limits the addition of new attach backends and eBPF program types, and also complicates the codebase. A refactor is proposed to address these issues and set the stage for future enhancements.

Proposed Changes:

  1. Decouple Syscall Server Responsibilities:

    • Restrict syscall-server.so and the daemon to only handle recording syscall traces and states, such as the creation of progs, maps, and links within the kernel.
    • Remove mixed responsibilities and allow for the loading of other eBPF program types.
  2. Split Attach Context:

    • Separate the attach context class into two distinct targets: runtime and attach_events.
    • The runtime should offer two types of APIs:
      • An API for initializing and managing all progs and maps.
      • An API for customizing how attach-related information is processed and acted on.
    • Developers can inherit from the attach_events class to implement their own event sources.
  3. Temporary Feature Development Freeze:

    • Before the completion of the refactor, pause the addition of new features to avoid further complicating the current state.
  4. Future-Proofing and Extensibility:

    • The refactor will make it easier to replace components like Frida, to embed the runtime statically in other applications, and to expand to new domains such as GPU tracing, XDP, and https://github.com/eunomia-bpf/bpftime/issues/158
    • Future modifications will be isolated to the attach context, simplifying updates and maintenance.
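
To make the proposed split in item 2 concrete, here is a minimal C++ sketch. It is only an illustration under stated assumptions: the `runtime` and `attach_events` names come from the proposal, but every method (`load_prog`, `run_prog`, `attach`, `fire`) and the `manual_events` subclass are hypothetical and do not exist in bpftime. A loaded prog is reduced to a plain callable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a loaded eBPF prog, reduced to a callable for this sketch.
using prog_fn = std::function<std::uint64_t(void *ctx)>;

// Target 1: the runtime. It only manages progs (and, in reality, maps).
class runtime {
public:
    // Register a prog and return its id.
    int load_prog(prog_fn fn) {
        assert(fn && "prog must be callable");
        int id = next_id_++;
        progs_[id] = std::move(fn);
        return id;
    }
    // Run a previously loaded prog; throws std::out_of_range for unknown ids.
    std::uint64_t run_prog(int id, void *ctx) const { return progs_.at(id)(ctx); }

private:
    int next_id_ = 0;
    std::map<int, prog_fn> progs_;
};

// Target 2: an event source. Developers inherit from it.
class attach_events {
public:
    virtual ~attach_events() = default;
    // Attach a prog to an event described by an opaque target string.
    virtual bool attach(const std::string &target, int prog_id) = 0;
};

// Hypothetical event source that fires when the host application calls fire().
class manual_events : public attach_events {
public:
    explicit manual_events(const runtime &rt) : rt_(rt) {}

    bool attach(const std::string &target, int prog_id) override {
        attached_[target].push_back(prog_id);
        return true;
    }
    // Run every prog attached to `target`; return how many ran.
    std::size_t fire(const std::string &target, void *ctx) const {
        auto it = attached_.find(target);
        if (it == attached_.end()) return 0;
        for (int id : it->second) rt_.run_prog(id, ctx);
        return it->second.size();
    }

private:
    const runtime &rt_;
    std::map<std::string, std::vector<int>> attached_;
};
```

In this sketch the event source only calls into the runtime and never the other way around, which is the direction of dependency the proposal aims for.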

Rationale:

This refactor addresses fundamental design flaws that were not evident in the initial conception of bpftime. It aims to simplify the current codebase and prepare for more stable and scalable expansion. Although this entails a significant overhaul, it is manageable given the current code volume.

Next Steps:

Call for Input:

We welcome input from the community on this proposed refactor. Any insights or suggestions, especially regarding the decoupling of components and API design, would be highly appreciated.

Officeyutong commented 6 months ago

Replacing Frida is already possible through the attach manager (by subclassing).

Officeyutong commented 6 months ago

Split attach manager into a separate target

205

The attach manager is relatively independent of the runtime: it doesn't invoke any runtime API. It only provides the uprobe/uretprobe implementation to the runtime (bpf_attach_ctx, for now). Besides, the attach manager is the only part of bpftime that has a strong dependency on Frida.

So if we split the attach manager into a separate target, we might gain:

yunwei37 commented 6 months ago

What about attach_ctx class?

Is it possible to implement all uprobe/syscall-related code outside of the runtime target and the runtime dir? If we can do that, is it a better solution?

Also, there is some attach-related code in the syscall transformer and syscall-server.so.

Officeyutong commented 6 months ago

What about attach_ctx class?

Is it possible to implement all uprobe/syscall-related code outside of the runtime target and the runtime dir? If we can do that, is it a better solution?

Also, there is some attach-related code in the syscall transformer and syscall-server.so.

The implementation of syscall trace is not compatible with the attach manager (or let's call it the uprobe attach manager).

So I think we might split the syscall trace attach implementation into another target; just call it the syscall trace attach manager. Syscall trace callbacks should be registered with this target, and this target should provide a dispatch entry (the function that the text transformer calls when a syscall is captured). We should also rename the current attach manager to uprobe attach manager, since it's only responsible for uprobes.
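
A rough C++ sketch of what such a target could look like, to show the register/dispatch shape. Everything here is an assumption for illustration: the class name `syscall_trace_attach_manager`, its methods, and the callback signature are hypothetical and are not taken from the bpftime code base.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch of a standalone syscall trace attach target.
class syscall_trace_attach_manager {
public:
    using syscall_args = std::array<std::int64_t, 6>;
    // A callback sees the syscall number and its raw arguments.
    using callback = std::function<void(std::int64_t nr, const syscall_args &args)>;

    // Register a callback for one syscall number.
    void register_callback(std::int64_t nr, callback cb) {
        assert(cb && "callback must be callable");
        callbacks_[nr].push_back(std::move(cb));
    }

    // Dispatch entry: the function the text transformer would call when a
    // syscall is captured. Returns the number of callbacks that ran.
    std::size_t dispatch(std::int64_t nr, const syscall_args &args) const {
        auto it = callbacks_.find(nr);
        if (it == callbacks_.end()) return 0;
        for (const auto &cb : it->second) cb(nr, args);
        return it->second.size();
    }

private:
    std::unordered_map<std::int64_t, std::vector<callback>> callbacks_;
};
```

Note that nothing in this sketch refers to eBPF or to the runtime, so it could be built and tested as its own target.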

yunwei37 commented 6 months ago

So maybe we can create two targets?

  • /attach/uprobe
  • /attach/syscalls

Here, attach is a new directory under the project root.

And the user can specify which one to compile with the runtime.

yunwei37 commented 6 months ago

Shall we come up with a better name for the uprobe attach manager? Maybe back to attach_ctx, or attach_target, attach_events?

Some background:

  • The eBPF runtime can be embedded in shared memory, or compiled and linked with other applications as extensions. The runtime is responsible for loading and managing the eBPF programs in the process.
  • There can be multiple eBPF attach methods at the same time; for example, uprobe and syscall tracepoints should be able to work together.
  • One eBPF program can be attached to multiple targets or events, and one event can have multiple eBPF programs attached to it.

So maybe we can have a design like this:

  1. One attach_manager manages all attach_ctx or attach_targets. The attach manager can hold unique_ptr ownership of the runtime.
  2. attach_ctx has a base class and some subclasses, for example uprobe_attach_ctx and syscalls_attach_ctx. The attach_ctx should be able to access the shared memory through the runtime API, and also configure which kinds of perf events or attach targets are mocked in userspace, while the others are passed to the kernel.

This code all lives in the attach dir in the project root.

Another problem: how can we let the attach_targets configure which kinds of perf events or attach targets are mocked in userspace, while the others are passed to the kernel, without hardcoding it in the -server.so as we do now?
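
As a sketch of the design above (not an existing API): the class names `attach_manager`, `attach_ctx`, `uprobe_attach_ctx` and `syscalls_attach_ctx` are the ones proposed here, while the methods `name`, `mocks_in_userspace`, `add` and `find_handler`, the event-kind strings, and the empty `runtime` placeholder are assumptions made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Placeholder for the real runtime (progs, maps, shared-memory access).
struct runtime {};

// Base class: each attach_ctx declares which event kinds it mocks in userspace.
class attach_ctx {
public:
    virtual ~attach_ctx() = default;
    virtual std::string name() const = 0;
    virtual bool mocks_in_userspace(const std::string &event_kind) const = 0;
};

class uprobe_attach_ctx : public attach_ctx {
public:
    std::string name() const override { return "uprobe"; }
    bool mocks_in_userspace(const std::string &k) const override {
        return k == "uprobe" || k == "uretprobe";
    }
};

class syscalls_attach_ctx : public attach_ctx {
public:
    std::string name() const override { return "syscalls"; }
    bool mocks_in_userspace(const std::string &k) const override {
        return k == "syscall_tracepoint";
    }
};

// Owns the runtime and every attach_ctx through unique_ptr.
class attach_manager {
public:
    explicit attach_manager(std::unique_ptr<runtime> rt) : rt_(std::move(rt)) {
        assert(rt_ && "attach_manager needs a runtime");
    }
    void add(std::unique_ptr<attach_ctx> ctx) { ctxs_.push_back(std::move(ctx)); }

    // Returns the ctx that mocks this event kind in userspace, or nullptr
    // when the event should be passed to the kernel instead.
    const attach_ctx *find_handler(const std::string &event_kind) const {
        for (const auto &c : ctxs_)
            if (c->mocks_in_userspace(event_kind)) return c.get();
        return nullptr;
    }

private:
    std::unique_ptr<runtime> rt_;
    std::vector<std::unique_ptr<attach_ctx>> ctxs_;
};
```

With this shape, the "mock or pass to the kernel" decision is answered by asking the registered contexts rather than by a hardcoded list.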

Officeyutong commented 5 months ago

So maybe we can create two targets?

  • /attach/uprobe
  • /attach/syscalls

Here, attach is a new directory under the project root.

And the user can specify which one to compile with the runtime.

Sounds good. More attach implementations could be added in the future.

Officeyutong commented 5 months ago

Shall we come up with a better name for the uprobe attach manager? Maybe back to attach_ctx, or attach_target, attach_events?

Some background:

  • The eBPF runtime can be embedded in shared memory, or compiled and linked with other applications as extensions. The runtime is responsible for loading and managing the eBPF programs in the process.
  • There can be multiple eBPF attach methods at the same time; for example, uprobe and syscall tracepoints should be able to work together.
  • One eBPF program can be attached to multiple targets or events, and one event can have multiple eBPF programs attached to it.

So maybe we can have a design like this:

  1. One attach_manager manages all attach_ctx or attach_targets. The attach manager can hold unique_ptr ownership of the runtime.
  2. attach_ctx has a base class and some subclasses, for example uprobe_attach_ctx and syscalls_attach_ctx. The attach_ctx should be able to access the shared memory through the runtime API, and also configure which kinds of perf events or attach targets are mocked in userspace, while the others are passed to the kernel.

This code all lives in the attach dir in the project root.

Another problem: how can we let the attach_targets configure which kinds of perf events or attach targets are mocked in userspace, while the others are passed to the kernel, without hardcoding it in the -server.so as we do now?

This also sounds good. But the uprobe attach manager I mentioned above is only a set of classes that provide an API to register a callback at a certain function. It has nothing to do with eBPF. Maybe a name like uprobe_attach_impl is more suitable for this part of the code?

Officeyutong commented 5 months ago

And the attach manager you mentioned seems to be something that is responsible for "resolving perf events (or other equivalents), and allowing a certain event to call a certain eBPF program". Did I misunderstand what you said? If not, I think the name attach manager suits this thing better, and it should be split into individual targets.

But from another perspective, I still think uprobe_attach_impl should be split into an individual target. It has little dependency on other parts of bpftime. Splitting it out makes the code base clearer, and is more convenient for users who only want to use our uprobe implementation.
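
To illustrate why this part can stand alone, here is a minimal C++ sketch of a callback registry keyed by function address. It is an assumption-laden illustration, not the actual implementation: only the name `uprobe_attach_impl` comes from this thread, while `attach_at`, `detach` and `on_function_entry` are hypothetical, and the real hooking step (done through Frida today) is replaced by a manual call to `on_function_entry`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical sketch: registers callbacks at function addresses.
// Nothing here knows about eBPF, progs, or maps.
class uprobe_attach_impl {
public:
    using callback = std::function<void(const void *func_addr)>;

    // Register a callback for a function address; returns an attach id.
    int attach_at(const void *func_addr, callback cb) {
        assert(cb && "callback must be callable");
        int id = next_id_++;
        attaches_.emplace(id, entry{func_addr, std::move(cb)});
        return id;
    }

    // Remove one attach by id; returns false if the id is unknown.
    bool detach(int id) { return attaches_.erase(id) == 1; }

    // Stand-in for the hook trampoline: runs every callback registered
    // at `func_addr` and returns how many ran.
    std::size_t on_function_entry(const void *func_addr) const {
        std::size_t ran = 0;
        for (const auto &kv : attaches_) {
            if (kv.second.addr == func_addr) {
                kv.second.cb(func_addr);
                ++ran;
            }
        }
        return ran;
    }

private:
    struct entry {
        const void *addr;
        callback cb;
    };
    int next_id_ = 0;
    std::map<int, entry> attaches_;
};
```

A user who only wants the uprobe mechanism could link against a target like this without pulling in the runtime.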

yunwei37 commented 5 months ago

Yes, uprobe_attach_impl should be split into an individual target.

Can you describe the full dependency graph and class inheritance of all the modules/CMake targets as you think they should be?

I think it could be something like:

  1. attach_manager owns all the attach_impl instances and owns the runtime. It's built as an 'object' target in CMake, with the runtime as a dependency.
  2. The attach_impl base class lives in a header. uprobe_attach_impl inherits from it and is also built as a standalone target.
  3. The agent.so depends on these attach_impl targets.

yunwei37 commented 5 months ago

Also, can we add new attach events at load time, so that they are not statically compiled in?

For example, we could have three kinds of agent.so:

  1. One is compiled with uprobe and syscall tracepoints enabled.
  2. The second is compiled with only uprobe support.
  3. The third is used statically in the application, like the nginx module or XDP in DPDK.

We can let the agents or the user configure which syscall functionality to mock in the syscall-server.so. For example, allow some perf event syscalls and bpf link types to be mocked or answered in the syscall-server.so, but not others.

The config can be stored in the shared memory, so that new attach targets can register with it.
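
One possible shape for such a config, as a hedged C++ sketch: a fixed-layout struct that could be placed in the shared memory, where each attach target sets a bit for the event kinds it wants mocked. The `event_kind` enum, the `mock_config` struct and its methods are all hypothetical, and the sketch assumes `std::atomic<std::uint64_t>` is lock-free on the platform so that it can be shared across processes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical event kinds; the real set would come from the attach targets.
enum class event_kind : unsigned {
    uprobe = 0,
    uretprobe = 1,
    syscall_tracepoint = 2,
    xdp = 3,
};

// Fixed-layout config intended to live in shared memory.
struct mock_config {
    std::atomic<std::uint64_t> mocked{0};

    // Called by an attach target at load time to claim an event kind.
    void register_kind(event_kind k) {
        mocked.fetch_or(bit(k), std::memory_order_relaxed);
    }

    // Called by the syscall server: mock in userspace, or pass to the kernel?
    bool should_mock(event_kind k) const {
        return (mocked.load(std::memory_order_relaxed) & bit(k)) != 0;
    }

private:
    static constexpr std::uint64_t bit(event_kind k) {
        return std::uint64_t{1} << static_cast<unsigned>(k);
    }
};
```

An agent built with only uprobe support would register only the uprobe kinds, and the server would forward everything else to the kernel.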

Officeyutong commented 5 months ago

Yes, uprobe_attach_impl should be split into an individual target.

Can you describe the full dependency graph and class inheritance of all the modules/CMake targets as you think they should be?

I think it could be something like:

  1. attach_manager owns all the attach_impl instances and owns the runtime. It's built as an 'object' target in CMake, with the runtime as a dependency.
  2. The attach_impl base class lives in a header. uprobe_attach_impl inherits from it and is also built as a standalone target.
  3. The agent.so depends on these attach_impl targets.

What does ownership of the runtime mean? Is it something that holds ownership of all compiled eBPF programs? (The ownership of maps remains in the shm and isn't held by anything, I think.)

Officeyutong commented 5 months ago

Also, can we add new attach events at load time, so that they are not statically compiled in?

For example, we could have three kinds of agent.so:

  1. One is compiled with uprobe and syscall tracepoints enabled.
  2. The second is compiled with only uprobe support.
  3. The third is used statically in the application, like the nginx module or XDP in DPDK.

We can let the agents or the user configure which syscall functionality to mock in the syscall-server.so. For example, allow some perf event syscalls and bpf link types to be mocked or answered in the syscall-server.so, but not others.

The config can be stored in the shared memory, so that new attach targets can register with it.

This sounds good

yunwei37 commented 5 months ago

What does ownership of the runtime mean? Is it something that holds ownership of all compiled eBPF programs? (The ownership of maps remains in the shm and isn't held by anything, I think.)

It means having a unique_ptr in the code that is responsible for managing the opening and closing of the maps, and for compiling and loading the progs.

Officeyutong commented 5 months ago

What does ownership of the runtime mean? Is it something that holds ownership of all compiled eBPF programs? (The ownership of maps remains in the shm and isn't held by anything, I think.)

It means having a unique_ptr in the code that is responsible for managing the opening and closing of the maps, and for compiling and loading the progs.

Maps and programs are held in the shared memory, and may live longer than the agent or the syscall server. So maybe their ownership should not be limited by the bpftime runtime?

Officeyutong commented 5 months ago

"ownership" here means "stuff in the heap memory of a certain process that is required to operate shared memory". For example, the class bpftime_shm itself

Officeyutong commented 5 months ago

Targets

yunwei37 commented 5 months ago

Would it be better if you first came up with a small example of how to use the new API to implement a new eBPF attach type (e.g. nginx module eBPF)?

Officeyutong commented 5 months ago

Would it be better if you first came up with a small example of how to use the new API to implement a new eBPF attach type (e.g. nginx module eBPF)?

OK, I'll take it

Officeyutong commented 5 months ago

Refer to https://github.com/eunomia-bpf/bpftime-new-api-poc for the detailed POC.

Other notes:

Officeyutong commented 4 months ago

Instantiation of handler