golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.81k stars 17.65k forks source link

proposal: cmd/go: compile-time instrumentation and `-toolexec` #69887

Open RomainMuller opened 2 weeks ago

RomainMuller commented 2 weeks ago

Proposal Details

Following up on https://github.com/golang/go/issues/41145#issuecomment-2408949985, this issue aims to explain what we're doing at https://github.com/DataDog/orchestrion, why we're doing it this way, the challenges we face, and some of the toolchain evolution directions we'd like to explore.

Context

Users are embracing observability more and more, as the growing success of OpenTelemetry demonstrates; but onboarding Go applications to observability requires manually instrumenting the whole codebase, making it a tedious/expensive endeavor especially when onboarding existing large applications.

In go, everything is explicit. If some behavior happens, there is code describing it. No hidden magic, no hidden cost. This is a great property to have, but this means instrumentation can be distracting as it may take up a lot of space in the codebase (often multiple lines to start a span with several tags, and then to finish the span, possibly capturing error information & occasionally additional tags), which can result in obscuring the business logic. It also makes it easy for a developer to forget using instrumentation; resulting in observability blind spots (or in the case of security instrumentation, in leaving vulnerable code available to exploit).

Customers want to be able to onboard observability with as little friction as possible, and many prefer observability to stay out of sight when building application logic. The interest in approaches such as eBPF is a testament to the desire of having "no-code-change" observability; but these approaches are inherently limited to certain platforms, incur a performance toll that is not acceptable in certain situations, and limited (for good reasons!) in capabilities.

In addition to this, many enterprises haven't fully embraced the DevOps (and even fewer the DevSecOps) paradigm, and have different organizations take charge of developing, operating, and securing software products. There is often tension between the goals/KPIs of these different organizations (schematizing a little bit):

In effect this means developers don't want to be encumbered by observability (or security) if they can avoid it; operators and security teams want to be able to deliver their goals without getting in the way of developers. These give an advantage to instrumentation techniques that do not require code modification; as they minimize impact on developer productivity while allowing to maximize coverage.

Other use-cases for compile-time source code re-writing have recently gotten some visibility:

Orchestrion

We have built orchestrion as a tool targeted to Ops and SecOps personas, but that can also fit within a Dev/DevOps/DevSecOps workflow. It uses -toolexec to intercept all invocations to go tool compile, re-writing all the .go files to add instrumentation everywhere possible.

Orchestrion can in some ways be seen as a compile-time-woven Aspect-oriented Programming (AoP) framework; except it is currently fairly specialized to our instrumentation needs.

Challenges we had to address

Adding dependency edges

One of the consequences of re-writing source code to inject instrumentation calls is that we often need to introduce new dependencies to the package being built. A package that is built against github.com/gorilla/mux once instrumented as a new dependency edge to gopkg.in/DataDog/dd-trace-go/contrib/gorilla/mux to support the instrumentation. In most cases these new dependency edges support additional import clauses, which is ideal as it allows the compiler to type-check all parts of the re-written file. To support the compiler, Orchestrion modifies the importcfg file to register mappings between import paths and the corresponding export file, as resolved by go list (via packages.Load).

In some situations, Orchestrion needs to use //go:linkname (in the "handshake") form in order to avoid creating circular import dependencies. In those cases, the produced archive has an implied dependency on some other package; which must be made available at link-time. We embed a special text file within the produced archive to track those "link-time" dependencies, so that the requirement is persisted in GOCACHE. We then intercept invocations of go tool link and update the importcfg.link file, adding all link-time dependencies to it (again, resolved by go list via packages.Load).

Resolving additional dependencies

In order to correctly resolve the new dependency edges, we have to use packages.Load (go list), and forward the appropriate BuildFlags, such as -cover, -covermode and -coverpkg. Today, the toolchain does not provide any visibility on the full build arguments and Orchestrion has to crawl the process tree in search for a go build (or go test, go run, etc...) invocation, gather it's arguments list, and do its best at parsing it, so it can forward the right values to child compilations if needed.

This poses challenges because go's standard flags are fairly permissive in terms of style and syntax (e.g, it transparently allows adding an extra - ahead of a flag, which allows using the -flag or --flag styles interchangeably), and there is no way to know for sure what flags have a value or are simple boolean toggles.

We also need to re-implement some default argument behavior; specifically there is an implied -coverpkg argument value if one is not provided explicitly, that we need to make explicit when triggering child builds, as failure to do so typically leads to a fingerprint mismatch at link time.

In addition to this, our need to resolve additional packages during the build means we are going to trigger more compilation processes than desired (by the user); because we have no way to co-operate with the toolchain to honor the -p n flag provided by the user.

Since multiple packages may introduce the same synthetic dependency, we often end up needing to resolve the same package multiple times in parallel, which is a waste of resources. To address this, we had to implement a job server, which is a process that exists for the duration of the build, and which is responsible for resolving those dependencies, ensuring a given dependency is resolved exactly once. That process can also check that we are not introducing cyclic dependencies; and "cleanly" aborts the build if it detects a cycle, instead of endlessly recursing over itself.

Parsing tool arguments

Orchestrion also needs to parse arguments passed to compile and link commands, but there is no programmatic way to get these in a structured way. We have to resort to parsing the help output of these commands to infer what arguments have value (or don't) and hope for the best.

In the case of link there isn't much we care about except for accessing the -importcfg flag. For compile, we actually need to establish the correct list of positional arguments (they'll be the .go files) without risking a false positive (we had a bug where we simply took all arguments that ended in .go but ran into a fun issue with the github.com/nats-io/nats.go package).

Interactions with GOCACHE

Orchestrion tries to co-operate with the go toolchain as naturally as possible. One aspect of this is integrating with GOCACHE to allow our users to enjoy the performance benefits of incremental builds.

In order to do this, Orchestrion needs to influence the build ID of objects that are built with instrumentation. The only "hook point" available is intercepting the -V=full invocation of all tools, and modifying the string it returns to include an invalidation fragment.

Today, we intercept both the compile -V=full and link -V=full invocations, and append to it a hash that summarizes:

The drawback of being able to do this only at the "complete build level" is that it'll result in excessively frequent cache invalidation... Of course, changing orchestrion itself should be treated equivalent to changing compilers/toolchain, and result in complete invalidation... But the aspects & injected dependencies will only affect packages where they are effectively used, which isn't all of them.

For example, we don't do anything in the majority of the standard library (some aspects today target os and runtime), so the "regular" export files would be fine to use for all the rest.

Line information preserving

We want the instrumented code to retain the original code's line information, so that stack traces visible by the user still match the code they wrote. To do this, we sprinkle //line directives around every "synthetic" code.

The go/ast package does not provide much in the way of a facility to manipulate comments or directives, and we use github.com/dave/dst to simplify editing commentary within the source files. This however comes at a significant cost (as it performs several other costly operations we don't necessarily need done).

The injected code can also easily interfere with the debugging experience (via dlv), as the "real" source file including that instrumented code is often no longer present on disk when the program is being run... This becomes more problematic in cases when the original source file is not in "canonical" go format, as the line numbering we infer from the original file may be skewed (because github.com/dave/dst always produces canonical go format output) -- a "source map" (like is done in Node.JS, for example) would probably be somewhat easier to confidently manage.

Toolchain (-toolexec) evolution directions

Some of these may be moonshots, or things that don't fit well with the go design principles; but I'm including them anyway as they might hint at problems in search of a solution, and maybe someone comes up with a more acceptable solution:

Other related issues

ianlancetaylor commented 2 weeks ago

CC @matloob @samthanawalla