Users are embracing observability more and more, as the growing success of OpenTelemetry demonstrates; but onboarding Go applications to observability requires manually instrumenting the whole codebase, making it a tedious/expensive endeavor especially when onboarding existing large applications.
In go, everything is explicit. If some behavior happens, there is code describing it. No hidden magic, no hidden cost. This is a great property to have, but this means instrumentation can be distracting as it may take up a lot of space in the codebase (often multiple lines to start a span with several tags, and then to finish the span, possibly capturing error information & occasionally additional tags), which can result in obscuring the business logic. It also makes it easy for a developer to forget using instrumentation; resulting in observability blind spots (or in the case of security instrumentation, in leaving vulnerable code available to exploit).
Customers want to be able to onboard observability with as little friction as possible, and many prefer observability to stay out of sight when building application logic. The interest in approaches such as eBPF is a testament to the desire of having "no-code-change" observability; but these approaches are inherently limited to certain platforms, incur a performance toll that is not acceptable in certain situations, and limited (for good reasons!) in capabilities.
In addition to this, many enterprises haven't fully embraced the DevOps (and even fewer the DevSecOps) paradigm, and have different organizations take charge of developing, operating, and securing software products. There is often tension between the goals/KPIs of these different organizations (schematizing a little bit):
Developers are measured on how fast they deliver business logic
Operators are measured on how efficient and reliable the products operate
Security teams are measured on how secure the application is
In effect this means developers don't want to be encumbered by observability (or security) if they can avoid it; operators and security teams want to be able to deliver their goals without getting in the way of developers. These give an advantage to instrumentation techniques that do not require code modification; as they minimize impact on developer productivity while allowing to maximize coverage.
Other use-cases for compile-time source code re-writing have recently gotten some visibility:
We have built orchestrion as a tool targeted to Ops and SecOps personas, but that can also fit within a Dev/DevOps/DevSecOps workflow. It uses -toolexec to intercept all invocations to go tool compile, re-writing all the .go files to add instrumentation everywhere possible.
Orchestrion can in some ways be seen as a compile-time-woven Aspect-oriented Programming (AoP) framework; except it is currently fairly specialized to our instrumentation needs.
Challenges we had to address
Adding dependency edges
One of the consequences of re-writing source code to inject instrumentation calls is that we often need to introduce new dependencies to the package being built. A package that is built against github.com/gorilla/mux once instrumented as a new dependency edge to gopkg.in/DataDog/dd-trace-go/contrib/gorilla/mux to support the instrumentation. In most cases these new dependency edges support additional import clauses, which is ideal as it allows the compiler to type-check all parts of the re-written file. To support the compiler, Orchestrion modifies the importcfg file to register mappings between import paths and the corresponding export file, as resolved by go list (via packages.Load).
In some situations, Orchestrion needs to use //go:linkname (in the "handshake") form in order to avoid creating circular import dependencies. In those cases, the produced archive has an implied dependency on some other package; which must be made available at link-time. We embed a special text file within the produced archive to track those "link-time" dependencies, so that the requirement is persisted in GOCACHE. We then intercept invocations of go tool link and update the importcfg.link file, adding all link-time dependencies to it (again, resolved by go list via packages.Load).
Resolving additional dependencies
In order to correctly resolve the new dependency edges, we have to use packages.Load (go list), and forward the appropriate BuildFlags, such as -cover, -covermode and -coverpkg. Today, the toolchain does not provide any visibility on the full build arguments and Orchestrion has to crawl the process tree in search for a go build (or go test, go run, etc...) invocation, gather it's arguments list, and do its best at parsing it, so it can forward the right values to child compilations if needed.
This poses challenges because go's standard flags are fairly permissive in terms of style and syntax (e.g, it transparently allows adding an extra - ahead of a flag, which allows using the -flag or --flag styles interchangeably), and there is no way to know for sure what flags have a value or are simple boolean toggles.
We also need to re-implement some default argument behavior; specifically there is an implied -coverpkg argument value if one is not provided explicitly, that we need to make explicit when triggering child builds, as failure to do so typically leads to a fingerprint mismatch at link time.
In addition to this, our need to resolve additional packages during the build means we are going to trigger more compilation processes than desired (by the user); because we have no way to co-operate with the toolchain to honor the -p n flag provided by the user.
Since multiple packages may introduce the same synthetic dependency, we often end up needing to resolve the same package multiple times in parallel, which is a waste of resources. To address this, we had to implement a job server, which is a process that exists for the duration of the build, and which is responsible for resolving those dependencies, ensuring a given dependency is resolved exactly once. That process can also check that we are not introducing cyclic dependencies; and "cleanly" aborts the build if it detects a cycle, instead of endlessly recursing over itself.
Parsing tool arguments
Orchestrion also needs to parse arguments passed to compile and link commands, but there is no programmatic way to get these in a structured way. We have to resort to parsing the help output of these commands to infer what arguments have value (or don't) and hope for the best.
In the case of link there isn't much we care about except for accessing the -importcfg flag. For compile, we actually need to establish the correct list of positional arguments (they'll be the .go files) without risking a false positive (we had a bug where we simply took all arguments that ended in .go but ran into a fun issue with the github.com/nats-io/nats.go package).
Interactions with GOCACHE
Orchestrion tries to co-operate with the go toolchain as naturally as possible. One aspect of this is integrating with GOCACHE to allow our users to enjoy the performance benefits of incremental builds.
In order to do this, Orchestrion needs to influence the build ID of objects that are built with instrumentation. The only "hook point" available is intercepting the -V=full invocation of all tools, and modifying the string it returns to include an invalidation fragment.
Today, we intercept both the compile -V=full and link -V=full invocations, and append to it a hash that summarizes:
The version of orchestrion being used, as changing it may change how aspects are applied, which may produce different instrumentation
The complete list of aspects in use, as changing these will produce different instrumentation
The complete list of packages aspects might inject, together with their dependencies (another use of packages.Load), as a lot of these are typically not accounted for in the "natural" build ID of any object
The drawback of being able to do this only at the "complete build level" is that it'll result in excessively frequent cache invalidation... Of course, changing orchestrion itself should be treated equivalent to changing compilers/toolchain, and result in complete invalidation... But the aspects & injected dependencies will only affect packages where they are effectively used, which isn't all of them.
For example, we don't do anything in the majority of the standard library (some aspects today target os and runtime), so the "regular" export files would be fine to use for all the rest.
Line information preserving
We want the instrumented code to retain the original code's line information, so that stack traces visible by the user still match the code they wrote. To do this, we sprinkle //line directives around every "synthetic" code.
The go/ast package does not provide much in the way of a facility to manipulate comments or directives, and we use github.com/dave/dst to simplify editing commentary within the source files. This however comes at a significant cost (as it performs several other costly operations we don't necessarily need done).
The injected code can also easily interfere with the debugging experience (via dlv), as the "real" source file including that instrumented code is often no longer present on disk when the program is being run... This becomes more problematic in cases when the original source file is not in "canonical" go format, as the line numbering we infer from the original file may be skewed (because github.com/dave/dstalways produces canonical go format output) -- a "source map" (like is done in Node.JS, for example) would probably be somewhat easier to confidently manage.
Toolchain (-toolexec) evolution directions
Some of these may be moonshots, or things that don't fit well with the go design principles; but I'm including them anyway as they might hint at problems in search of a solution, and maybe someone comes up with a more acceptable solution:
Dedicated support for source-rewriting tools
Also, a simplified onboarding experience (pluggable transformers automatically discovered from go.mod dependencies?)
Ability to influence the build id on a per-package basis
Ability to add new edges to the build graph
Improved go/ast API, in particular w/r/t comment alterations
Improved source mapping (that'd make it easier to "stack" transforms, e.g cover + instrumentation)
Allow -toolexec to (try to) respect the -p n flag of the overarching build
Share overarching build flags with -toolexec, or a way for a -toolexec tool to request extra build artifacts from the overarching build (see also, adding new edges to the build graph)
Proposal Details
Following up on https://github.com/golang/go/issues/41145#issuecomment-2408949985, this issue aims to explain what we're doing at https://github.com/DataDog/orchestrion, why we're doing it this way, the challenges we face, and some of the toolchain evolution directions we'd like to explore.
Context
Users are embracing observability more and more, as the growing success of OpenTelemetry demonstrates; but onboarding Go applications to observability requires manually instrumenting the whole codebase, making it a tedious/expensive endeavor especially when onboarding existing large applications.
In go, everything is explicit. If some behavior happens, there is code describing it. No hidden magic, no hidden cost. This is a great property to have, but this means instrumentation can be distracting as it may take up a lot of space in the codebase (often multiple lines to start a span with several tags, and then to finish the span, possibly capturing error information & occasionally additional tags), which can result in obscuring the business logic. It also makes it easy for a developer to forget using instrumentation; resulting in observability blind spots (or in the case of security instrumentation, in leaving vulnerable code available to exploit).
Customers want to be able to onboard observability with as little friction as possible, and many prefer observability to stay out of sight when building application logic. The interest in approaches such as eBPF is a testament to the desire of having "no-code-change" observability; but these approaches are inherently limited to certain platforms, incur a performance toll that is not acceptable in certain situations, and limited (for good reasons!) in capabilities.
In addition to this, many enterprises haven't fully embraced the DevOps (and even fewer the DevSecOps) paradigm, and have different organizations take charge of developing, operating, and securing software products. There is often tension between the goals/KPIs of these different organizations (schematizing a little bit):
In effect this means developers don't want to be encumbered by observability (or security) if they can avoid it; operators and security teams want to be able to deliver their goals without getting in the way of developers. These give an advantage to instrumentation techniques that do not require code modification; as they minimize impact on developer productivity while allowing to maximize coverage.
Other use-cases for compile-time source code re-writing have recently gotten some visibility:
Orchestrion
We have built orchestrion as a tool targeted to Ops and SecOps personas, but that can also fit within a Dev/DevOps/DevSecOps workflow. It uses
-toolexec
to intercept all invocations togo tool compile
, re-writing all the.go
files to add instrumentation everywhere possible.Orchestrion can in some ways be seen as a compile-time-woven Aspect-oriented Programming (AoP) framework; except it is currently fairly specialized to our instrumentation needs.
Challenges we had to address
Adding dependency edges
One of the consequences of re-writing source code to inject instrumentation calls is that we often need to introduce new dependencies to the package being built. A package that is built against
github.com/gorilla/mux
once instrumented as a new dependency edge togopkg.in/DataDog/dd-trace-go/contrib/gorilla/mux
to support the instrumentation. In most cases these new dependency edges support additionalimport
clauses, which is ideal as it allows the compiler to type-check all parts of the re-written file. To support the compiler, Orchestrion modifies theimportcfg
file to register mappings between import paths and the corresponding export file, as resolved bygo list
(viapackages.Load
).In some situations, Orchestrion needs to use
//go:linkname
(in the "handshake") form in order to avoid creating circular import dependencies. In those cases, the produced archive has an implied dependency on some other package; which must be made available at link-time. We embed a special text file within the produced archive to track those "link-time" dependencies, so that the requirement is persisted inGOCACHE
. We then intercept invocations ofgo tool link
and update theimportcfg.link
file, adding all link-time dependencies to it (again, resolved bygo list
viapackages.Load
).Resolving additional dependencies
In order to correctly resolve the new dependency edges, we have to use
packages.Load
(go list
), and forward the appropriateBuildFlags
, such as-cover
,-covermode
and-coverpkg
. Today, the toolchain does not provide any visibility on the full build arguments and Orchestrion has to crawl the process tree in search for ago build
(orgo test
,go run
, etc...) invocation, gather it's arguments list, and do its best at parsing it, so it can forward the right values to child compilations if needed.This poses challenges because
go
's standard flags are fairly permissive in terms of style and syntax (e.g, it transparently allows adding an extra-
ahead of a flag, which allows using the-flag
or--flag
styles interchangeably), and there is no way to know for sure what flags have a value or are simple boolean toggles.We also need to re-implement some default argument behavior; specifically there is an implied
-coverpkg
argument value if one is not provided explicitly, that we need to make explicit when triggering child builds, as failure to do so typically leads to a fingerprint mismatch at link time.In addition to this, our need to resolve additional packages during the build means we are going to trigger more compilation processes than desired (by the user); because we have no way to co-operate with the toolchain to honor the
-p n
flag provided by the user.Since multiple packages may introduce the same synthetic dependency, we often end up needing to resolve the same package multiple times in parallel, which is a waste of resources. To address this, we had to implement a job server, which is a process that exists for the duration of the build, and which is responsible for resolving those dependencies, ensuring a given dependency is resolved exactly once. That process can also check that we are not introducing cyclic dependencies; and "cleanly" aborts the build if it detects a cycle, instead of endlessly recursing over itself.
Parsing tool arguments
Orchestrion also needs to parse arguments passed to
compile
andlink
commands, but there is no programmatic way to get these in a structured way. We have to resort to parsing the help output of these commands to infer what arguments have value (or don't) and hope for the best.In the case of
link
there isn't much we care about except for accessing the-importcfg
flag. Forcompile
, we actually need to establish the correct list of positional arguments (they'll be the.go
files) without risking a false positive (we had a bug where we simply took all arguments that ended in.go
but ran into a fun issue with thegithub.com/nats-io/nats.go
package).Interactions with
GOCACHE
Orchestrion tries to co-operate with the
go
toolchain as naturally as possible. One aspect of this is integrating withGOCACHE
to allow our users to enjoy the performance benefits of incremental builds.In order to do this, Orchestrion needs to influence the build ID of objects that are built with instrumentation. The only "hook point" available is intercepting the
-V=full
invocation of all tools, and modifying the string it returns to include an invalidation fragment.Today, we intercept both the
compile -V=full
andlink -V=full
invocations, and append to it a hash that summarizes:orchestrion
being used, as changing it may change how aspects are applied, which may produce different instrumentationpackages.Load
), as a lot of these are typically not accounted for in the "natural" build ID of any objectThe drawback of being able to do this only at the "complete build level" is that it'll result in excessively frequent cache invalidation... Of course, changing
orchestrion
itself should be treated equivalent to changing compilers/toolchain, and result in complete invalidation... But the aspects & injected dependencies will only affect packages where they are effectively used, which isn't all of them.For example, we don't do anything in the majority of the standard library (some aspects today target
os
andruntime
), so the "regular" export files would be fine to use for all the rest.Line information preserving
We want the instrumented code to retain the original code's line information, so that stack traces visible by the user still match the code they wrote. To do this, we sprinkle
//line
directives around every "synthetic" code.The
go/ast
package does not provide much in the way of a facility to manipulate comments or directives, and we usegithub.com/dave/dst
to simplify editing commentary within the source files. This however comes at a significant cost (as it performs several other costly operations we don't necessarily need done).The injected code can also easily interfere with the debugging experience (via
dlv
), as the "real" source file including that instrumented code is often no longer present on disk when the program is being run... This becomes more problematic in cases when the original source file is not in "canonical" go format, as the line numbering we infer from the original file may be skewed (becausegithub.com/dave/dst
always produces canonical go format output) -- a "source map" (like is done in Node.JS, for example) would probably be somewhat easier to confidently manage.Toolchain (
-toolexec
) evolution directionsSome of these may be moonshots, or things that don't fit well with the go design principles; but I'm including them anyway as they might hint at problems in search of a solution, and maybe someone comes up with a more acceptable solution:
go.mod
dependencies?)go/ast
API, in particular w/r/t comment alterationscover
+ instrumentation)-toolexec
to (try to) respect the-p n
flag of the overarching build-toolexec
, or a way for a-toolexec
tool to request extra build artifacts from the overarching build (see also, adding new edges to the build graph)Other related issues