Open danielwhite opened 3 years ago
Hello @danielwhite,
Thanks a lot for letting us know about this issue.
I guess you are using go mod vendor
and this is indeed due to AppSec being integrated into the tracer since the latest v1.34.0 release. This feature is opt-in via a build tag appsec
but go mod vendor
doesn't take them into account, so you still end up vendoring it (discussed in https://github.com/golang/go/issues/25873). But note the binary size isn't impacted unless you enable AppSec via the appsec
build tag.
This also has knock-on effects to consumers of datadog-lambda-go.
Could you please elaborate on this?
Since AppSec is in its private beta phase, this is the kind of feedback we want to gather to take decisions on our next steps and implementing our WAF in Go is one of them.
Shorter term, what I propose you is:
vendor/gopkg.in/DataDog/dd-trace-go.v1/internal/appsec/waf
unless you use or want to use AppSec.Thanks for the consideration.
I guess you are using
go mod vendor
and this is indeed due to AppSec being integrated into the tracer since the latest v1.34.0 release.
Yes. I find vendoring to be extremely useful for hermetic builds and auditing dependencies.
This feature is opt-in via a build tag
appsec
butgo mod vendor
doesn't take them into account, so you still end up vendoring it (discussed in golang/go#25873). But note the binary size isn't impacted unless you enable AppSec via theappsec
build tag.
I'm not certain that build tags are well suited for this use case. They share a global namespace, so tend to be better suited for "global" things (e.g. GOOS, GOARCH). Admittedly, appengine
seems like an outlier, but describes a runtime target for the binary.
In this case, supporting opt-in in via the package import system might play more nicely with the ecosystem. There seems to be precedent already with how contrib
provides optional features.
Having dug a bit deeper into #1007 and how the tracer fits together, I appreciate the trickiness of this particular problem, so YMMV.
This also has knock-on effects to consumers of datadog-lambda-go.
Could you please elaborate on this?
Apologies. I should have been more precise.
datadog-lambda-go
is useful in contexts without HTTP requests (i.e. event workers) so it seems a shame to impose this unnecessarily.
Since AppSec is in its private beta phase, this is the kind of feedback we want to gather to take decisions on our next steps and implementing our WAF in Go is one of them.
Shorter term, what I propose you is:
- Manually remove this vendored directory
vendor/gopkg.in/DataDog/dd-trace-go.v1/internal/appsec/waf
unless you use or want to use AppSec.
This is probably the best short-term option. We could add a .gitignore
to exclude these binaries.
It does require playing a bit of whack-a-mole, as we'd need to be aware of this everywhere that vendors this package.
- For the next release to come, we will strip everything we can from these archive files. For example, it currently includes the debugging information and when removing them, the linux archive for example ends up being 3.1MB (so ~10x smaller). Would this size be acceptable for you?
As libddwaf
is updated, this will contribute to bloat the git repository. The smaller size slows this down, but doesn't really address the problem.
Longer term, how are you considering support for other architectures/platforms (e.g. arm64)?
Longer term, how are you considering support for other architectures/platforms (e.g. arm64)?
Moving this on top as it's off topic: it's basically a matter of compiling it for the target (arm64 being already in the build artefacts of https://github.com/DataDog/libddwaf/releases) and copying the resulting static lib in this repo as you saw.
Yes. I find vendoring to be extremely useful for hermetic builds and auditing dependencies.
Which is a good reason for vendoring indeed. But note that since go 1.16, go.mod/go.sum files are now read-only and you could get this same "hermetic builds and auditing dependencies" property without vendoring anymore (since the go.mod and go.sum are read-only).
In this case, supporting opt-in in via the package import system might play more nicely with the ecosystem. There seems to be precedent already with how
contrib
provides optional features.
FTR, the reason why appsec is not a separate public package (like the profiler) comes from a product requirement: existing APM tracer users should have nothing more to do to enable it. We are also aware it means we need to pay extra attention and care to what we add to the tracer dependency graph.
datadog-lambda-go
is useful in contexts without HTTP requests (i.e. event workers) so it seems a shame to impose this unnecessarily.
go build
& co already correctly support pruning unused dependencies when compiling. Only go mod vendor
doesn't. Lambda users and their compiled programs are not affected by appsec unless they enable it.
- For the next release to come, we will strip everything we can from these archive files. For example, it currently includes the debugging information and when removing them, the linux archive for example ends up being 3.1MB (so ~10x smaller). Would this size be acceptable for you?
As
libddwaf
is updated, this will contribute to bloat the git repository. The smaller size slows this down, but doesn't really address the problem.
I start to think the only definitive answer to this problem is an improvement of go mod vendor
so that it allows you to vendor only what will be compiled in the end. You certainly have many other packages you don't need but still vendor today.
Maybe go list
can help listing what's not required so that you could prune your vendor directory automatically?
I'm keeping this issue open until we take a final decision as we currently need more feedback. Here are the options that could further improve the vendoring experience:
In the meantime, as you probably saw, the next release will now have the stripped libraries that are 10x smaller (see https://github.com/DataDog/dd-trace-go/pull/1056).
Some updates on this issue: we decided we should remove our dependency to CGO mainly to drastically simplify our installation process. It is low priority for now but I should start evaluating the alternatives in the next few weeks.
Thanks for the update @Julio-Guerra - appreciated 👍
This has recently become worse due in 1.15.0 due to go:embed
now requiring that those binaries exist at build time.
Has any further consideration been given to this?
Yeah this getting worse now. It's a real shame the build flag of datadog.no_waf
is used to remove the WAF components and the embedded binary - ideally it should be the inverse to include the WAF if a customer sees the benefit in doing so.
I only noted this build flag was possible after hitting this wall once again, as noted by @danielwhite.
I'm sure there are many users of this package/Datadog customers blissfully unaware this binary is being bundled in their binaries without warning - at the very least this should be called out in the README.md
.
FTR, the reason why appsec is not a separate public package (like the profiler) comes from a product requirement: existing APM tracer users should have nothing more to do to enable it
The problem of this embedded binary is getting really nasty. @Julio-Guerra what is driving this product requirement? We have to opt into using certain parts of Datadog, orchestrate the codebase etc? Why would appsec be different in that regard?
Digging into this more @Julio-Guerra - this PR feels like to be the current pain we're having with this: https://github.com/DataDog/dd-trace-go/pull/2354
This PR has changed the behaviour of the appsec
package - with the removal of the following build tags from this package:
//go:build appsec
// +build appsec
So with this change - it's now changed the appsec
package from being "opt in" - to "opt out" - a major behaviour change to how this package operates. And I'm sure a surprise to many consumers of the github.com/DataDog/dd-trace-go
package.
Can we please consider re-including these build tags - if people want appsec
- they can include the previous build tags - rather than expecting all others to work around this WAF binary embedding?
It's a shame Datadog aren't taking this concern - as raised by @danielwhite more seriously.
As a defeat, we've done the following (might be of use to others reading this issue):
replace github.com/DataDog/go-libddwaf/v2 => ...
directive in go.mod
it swap out use of the offending go-libddwaf
with our fork.datadog.no_waf
to be:
!datadog.no_waf
-> datadog.no_waf
.datadog.no_waf
-> !datadog.no_waf
.Would be great if Datadog could focus some attention on this issue - as of right now, using github.com/DataDog/datadog-lambda-go
means a binary must bring in:
github.com/DataDog/datadog-go/v5 (fair enough)
gopkg.in/DataDog/dd-trace-go.v1 (fair enough)
>> appsec components of above package
github.com/DataDog/go-libddwaf/v2
It would be great if the use of github.com/DataDog/datadog-lambda-go
could be more modular - allowing people to compose the parts they need - rather than Datadog just deciding to "bolt-on" more and more "features" at each release as a surprise.
From all the above - I find the fact that an embedded WAF binary in the form of go-libddwaf
is now included by default - very strange. If you're building Go binaries to be used as AWS Lambda functions - these functions, if called from HTTP endpoints will be behind either Function URLs or API Gateway endpoints - both cases where the use of a WAF internal to the invoked Lambda feels somewhat odd?
Additionally, if people did lean heavily into the Golang and AWS Lambda eco-system - I'd probably want to be aware of these design decisions if I was still at a phase of picking between Datadog vs. other options for my monitoring/logging SaaS decision - as (for me personally) I find stuff like this pretty important.
While tracking the latest release, I've noticed that https://github.com/DataDog/dd-trace-go/pull/1007/ has resulted in
libddwaf.a
is being included when vendoringdd-trace-go
.It would be nice to avoid incurring the cost of an additional ~35MB of unused binaries on downstream repositories. I assume these will continue to change over time, too, so this will continue to accrue. This also has knock-on effects to consumers of
datadog-lambda-go
.Could this feature be more optional?