aquasecurity / tracee

Linux Runtime Security and Forensics using eBPF
https://aquasecurity.github.io/tracee/latest
Apache License 2.0
3.64k stars 419 forks source link

refactor tracee-ebpf (Go) #960

Closed itaysk closed 2 years ago

itaysk commented 3 years ago

today, most of tracee-ebpf is two huge packages: main and tracee. split tracee-ebpf's Go code into multiple packages. each package is self-contained and testable. think hard about the names and APIs of those packages.

Some ideas:

  1. create new package tracing instead of the current tracee which will contain the core tracee-ebpf engine
  2. rename the struct tracee to tracer

main package:

  1. flags handling - stay in package main for now. but extract from main.go - #1248
  2. ebpf compilations - remove #1239
  3. check capabilities utility function - create package capabilities -
  4. printer.go - create new package eventprinter. Factory function moves to main.go - #1266

tracee package:

  1. tracee package needs to eventually go away, broken down into other packages
  2. external package needs to eventually go away, the structs will be moved to respective packages
  3. buckets cache - create new package bucketscache - #1249
  4. stats - create new package stats. external.Stats will move to this package - #1250
  5. configurations - create package config that includes filter, captureConfig ,outputConfig and related functions. (also create an interface for config and implement validators) - #1264
  6. argprinters.go - move to eventprinter package (see above). remove dependency on tracee (can be a static function). replace parse misnomer with print (as the file name suggests). - #1265
  7. containers - create new package containers. The code is already independent of tracee -
  8. create package events - #1268
    1. events_decoder.go - create new package events. rename to events_decoding.go
    2. events_processor.go - create new package events. rename to event_processing.go. CopyFileByPath function should move to some utlis package or duplicated.
    3. events.go - create new package events processEventsshould be minimal as possible, there is some code here that should be inprocessEvent`.
  9. filters.go - create new package tracingfilters (consider the usage of pointers vs values in Filter) - #1269

TBD - tarcee.go and consts.go

related: #934, #789, #1131, #1098

danielpacak commented 2 years ago

This is also a good opportunity to move main.go of tracee-ebpf to ./cmd/tracee-ebpf/main.go as described in #1217

itaysk commented 2 years ago

@mtcherni95 perhaps a good start is to review the existing code and discuss here which new packages you would like to create?

simar7 commented 2 years ago

@mtcherni95 - I briefly looked into this, I think we'll need to add some tests in place before we can safely consider refactoring this part of the code. As of today, there is very little coverage for tracee.go and main.go.

My proposed plan of action is to break some of the hard to test pieces of the above mentioned files into logical units (functions) that are easier to test on their own. Furthermore, there is a sanity level of tests that exists today to verify some pieces https://github.com/aquasecurity/tracee/blob/main/tests/tracee_test.go - we definitely need more of such cases to verify all the business logic that will get refactored will be functionally correct.

Let us know what are your thoughts on this. Happy to discuss and chart a plan forward.

mtcherni95 commented 2 years ago

I identified these main semantic components concerning the tracee-ebpf engine, which from now on I will just be calling it "Tracer":

My plan was to create under pkg/tracer a package for each of the previously stated items. The actual implementation of the engine would then be under pkg/tracer/tracer.go

Moreover, I believe would be convenient to create under pkg the following packages:

The idea would be that eventually also tracee-rules could use the flags and (if needed) the stats packages. Every single package will be then easily "unit-testable". Of course, things might slightly change with the development, however I believe this is a pretty solid base line.

Let me know what you guys think.

itaysk commented 2 years ago

where do you think makeBPFObject and all of the accompanying code should reside (some parts of the makefile are re-implemented in Go)? I think a another package like makebpf

itaysk commented 2 years ago

what are we doing with package external? and also I think the issue #1098 should also be included in this (moving the eventId tables to package external)

itaysk commented 2 years ago

also, can you please elaborate on what is within each package from the list you shared (except Buckets Cache) which is self-describing?

mtcherni95 commented 2 years ago

also, can you please elaborate on what is within each package from the list you shared (except Buckets Cache) which is self-describing?

itaysk commented 2 years ago

@mtcherni95 and I have spent some time to review the current code and make notes on packages. We didn't have time to review the entire code, so I'm pasting our incomplete notes. The goal is to create a list of actionable tasks that can be completed individually (Work on this cannot start before #1217 closes and we have the basic project structure).

main package: main.go contains these parts:

  1. cli interface - obviously stays untouched.
  2. flags handling - stay in package main for now. but split in separate files: flags-filter.go, flags-output.go, flags-capture.go - #1248
  3. ebpf compilations - create new package bpfbuilder - #1239
  4. check capabilities utility function - create package capabilities -

printer.go - create new package eventprinter. Factory function moves to main.go

tracee package:

  1. need to eventually go away, broken down into packages
  2. external package will eventually go away, the structs will be moved to respective packages
  3. buckets cache - create new package bucketscache - #1249
  4. stats - create new package stats. external.Stats will move to this package - #1250
  5. configurations - create package config that includes filter, captureConfig ,outputConfig and related functions. (in the future maybe create an interface for config and implement validators)
  6. argprinter - move to eventprinter package (see above). remove dependency on tracee (can be a static function). replace parse misnomer with print (as the file name suggests).
rafaeldtinoco commented 2 years ago
  • ebpf compilations - create new package bpfbuilder

@yanivagman had some comments about removing the builder logic and letting users to use "builder container images" to have non-CORE built.

Here it is: https://github.com/aquasecurity/tracee/issues/1239

simar7 commented 2 years ago

@mtcherni95 and I have spent some time to review the current code and make notes on packages. We didn't have time to review the entire code, so I'm pasting our incomplete notes. The goal is to create a list of actionable tasks that can be completed individually (Work on this cannot start before #1217 closes and we have the basic project structure).

main package: main.go contains these parts:

  1. cli interface - obviously stays untouched.
  2. flags handling - stay in package main for now. but split in separate files: flags-filter.go, flags-output.go, flags-capture.go
  3. ebpf compilations - create new package bpfbuilder
  4. check capabilities utility function - create package capabilities

The split up into different go source files seems reasonable to me but splitting up into a plethora of packages is something I'm trying to wrap my head around. I believe there is a use case to decouple the above mentioned point 3 and point 4 into their own respective packages because the logic they contain is re-used somewhere else? If so is the case, where are those call sites?

Not necessarily saying I disagree with this but would like to better understand how we can leverage splitting up into new packages the best.

  1. ebpf compilations - create new package bpfbuilder
  2. check capabilities utility function - create package capabilities

These two are the ones that can certainly benefit from the accept interfaces, return structs paradigm. This would help us test those cases were a real elevated system call is necessary to exercise certain behaviours of the function.

printer.go - create new package eventprinter. Factory function moves to main.go

tracee package:

  1. need to eventually go away, broken down into packages
  2. external package will eventually go away, the structs will be moved to respective packages

+1. I've always seen external as another utils, which IMO is an anti-pattern as anything that gets put in there doesn't -really belong anywhere else.

itaysk commented 2 years ago

I'll start to create issues for the obvious packages, and we can discuss the other ones in a call

mtcherni95 commented 2 years ago

The mindset we had in creating this proposal is that not only the project itself will use the previously stated new packages, but also a third party which is adopting Tracee as an engine and use it as a library. For instance, the capabilities package could be a classic example for it: Tracee will provide the relevant exported function to check if the hypothetical application of a user has enough capabilities to load and run the Tracee BPF program. Same thing could be said about the bpfbuilder and such.

itaysk commented 2 years ago

Continuing from https://github.com/aquasecurity/tracee/issues/960#issuecomment-992654757 we continued to review the tracee package.

2 high level recommendations:

  1. create new package tracing instead of the current tracee which will contain the core tracee-ebpf engine
  2. rename the struct tracee to tracer

@yanivagman already did most of the work in https://github.com/aquasecurity/tracee/pull/934, where he created files for the different pieces of functionality. there are just minor improvements to make:

  1. containers - create new package containers. The code is already independent of tracee
  2. events_decoder.go - rename to events_decoding.go
  3. events_processor.go - rename to event_processing.go. CopyFileByPath function should move to some utlis package or duplicated.
  4. events.go - processEvents should be minimal as possible, there is some code here that should be in processEvent.
  5. filters.go - create new package tracingfilters (consider the usage of pointers vs values in Filter)

We still haven't looked into tarcee.go and consts.go which is probably the heavier work

yanivagman commented 2 years ago

2 high level recommendations:

create new package tracing instead of the current tracee which will contain the core tracee-ebpf engine rename the struct tracee to tracer

Keeping in mind that we will want to add enforcement capabilities in the future, I think we should wait with using a name such as tracing for the package and tracee for the struct. Having an additional "enforcing" package will have a lot in common with the suggested "tracing" package, and I'm not sure it will be the best choice to (try to) decouple the logic of these two. We can reconsider this after extracting the "obvious" packages out of tracee.go.

yanivagman commented 2 years ago

Rethinking about some of the packages suggested here, I agree with @simar7 that

The split up into different go source files seems reasonable to me but splitting up into a plethora of packages is something I'm trying to wrap my head around.

If not for code reuse, why do we want to create packages if we already have the logic split into different files?

  1. I don't see a good reason to have "filters" package, as it has no meaning outside of the "tracee" package - The filters are tightly coupled with the bpf maps defined and initialized in the tracee packge. Having the filters.go file is already a good separation of the logic, and can probably stay in the same package with other files.
  2. Same claim for "containers"- how will we benefit from having it in a different package?

Before creating any new package, I think we should ask ourselves - can this package be used in different places? Is the logic in the package self contained? Will it be enough to just keep it in a separate file? Only after we have a good reason for creating a new package, we should start with its implementation.

So please, anyone that takes one of the new opened issues for extracting a package - please explain first why you think we should create this package

simar7 commented 2 years ago

Rethinking about some of the packages suggested here, I agree with @simar7 that

The split up into different go source files seems reasonable to me but splitting up into a plethora of packages is something I'm trying to wrap my head around.

If not for code reuse, why do we want to create packages if we already have the logic split into different files?

  1. I don't see a good reason to have "filters" package, as it has no meaning outside of the "tracee" package - The filters are tightly coupled with the bpf maps defined and initialized in the tracee packge. Having the filters.go file is already a good separation of the logic, and can probably stay in the same package with other files.

  2. Same claim for "containers"- how will we benefit from having it in a different package?

Before creating any new package, I think we should ask ourselves - can this package be used in different places? Is the logic in the package self contained? Will it be enough to just keep it in a separate file?

Only after we have a good reason for creating a new package, we should start with its implementation.

So please, anyone that takes one of the new opened issues for extracting a package - please explain first why you think we should create this package

Just to add from a call I had with @itaysk: We need to first decide if the "packages" we're talking about here will be "real packages" that is, placed under something like pkg directory and importable externally.

Or if we just want them to be logical units of code within the Tracee project that is, available under something like internal and not available to anyone outside. The Go compiler ensures that this is the case for all internal packages.

IMO, once something is declared externally available once, a public contract is established and we must be careful in honouring that contract going forwards. So unless there's a strong need for such packages to be externally available and used (hence my question about callsites) it might be worth thinking about it again.

itaysk commented 2 years ago

tracing for the package and tracee for the struct

it's tracing and tracer (not tracee)

why do we want to create packages if we already have the logic split into different files?

IMO - encapsulation (ignore the OOP-specific details, the concept is still relevant here). having all the code and all the functions and all the variables in one "namespace" is messy, and can lead to spaghetti code. If we can achieve the same goal with files, or with structs, or something else, that's fine too I think.

Another reason is to reduce the complexity surface exposed to external consumers, or to allow to consume specific components when needed instead of having to import one huge namespace of objects.

yanivagman commented 2 years ago

tracing for the package and tracee for the struct

it's tracing and tracer (not tracee)

Yes, just a typo of mine

why do we want to create packages if we already have the logic split into different files?

IMO - encapsulation (ignore the OOP-specific details, the concept is still relevant here). having all the code and all the functions and all the variables in one "namespace" is messy, and can lead to spaghetti code. If we can achieve the same goal with files, or with structs, or something else, that's fine too I think.

Another reason is to reduce the complexity surface exposed to external consumers, or to allow to consume specific components when needed instead of having to import one huge namespace of objects.

Do we want "filters" and "containers" packages to be exposed to external consumers? Or is it just the configuration of those (for which I agree we should have a config package)

itaysk commented 2 years ago

I don't know and I agree with what's been said that it should be the decided/discussed when actually working on the issue. it's ok to close the issue with just moving the code into a file in the same package. but just to clarify, package doesn't necessarily mean external. it might make sense to package some code in a package (and as simar suggests make it internal)

AsafEitani commented 2 years ago

I was having issues compiling tracee using the containers with the Makefile. While running the make -f builder/Makefile.tracee-make alpine-make ARG="clean all" I receive an error: the input device is not a TTY

I believe that the issue is related to the Makefile refactor.