anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0
6.29k stars 578 forks source link

Version parsing regression for Go binaries #3086

Closed luhring closed 3 months ago

luhring commented 3 months ago

What happened:

Coming from Syft v1.9.0 to v1.10.0, Syft changed how it parses versions for Go binary main modules: cases where where it used to surface v1.2.3 (mapping exactly to the module's version) now show as 1.2.3, which is less correct. Ideally we'd keep the correct strings in place, and avoid introducing more change than necessary to existing behavior.

Note that non-main modules appear to still parse correctly (with v if the module itself has v).

What you expected to happen:

No change from v1.9.0 — previous behavior was correct here. 🙏

Steps to reproduce the issue:

(using zsh or similar)

$ syft -q =(curl -sL https://packages.wolfi.dev/os/aarch64/thanos-0.35.1-r3.apk | tar -Oxz usr/bin/thanos)
...
github.com/thanos-io/thanos                                                          0.35.1                                     go-module
github.com/themihai/gomemcache                                                       v0.0.0-20180902122335-24332e2d58ab         go-module
github.com/tklauser/go-sysconf                                                       v0.3.10                                    go-module
github.com/tklauser/numcpus                                                          v0.4.0                                     go-module
...

Here, github.com/thanos-io/thanos is the main module. And it doesn't have any versions/tags of 0.35.1 — see https://github.com/thanos-io/thanos/tags.

Anything else we need to know?:

Some discussion in the community Slack before filing this: https://anchorecommunity.slack.com/archives/C019BUXV7R6/p1722433767717669

Environment:

Application: syft
Version:    1.10.0
BuildDate:  2024-07-30T16:13:26Z
GitCommit:  a4b5dcd0df80f6a58c8610e25104647710c1da5d
GitDescription: v1.10.0
Platform:   darwin/arm64
GoVersion:  go1.22.5
Compiler:   gc
spiffcs commented 3 months ago

👋 Thanks @luhring - I think this regression is actually a case where the previous behavior you were seeing was a happy accident. This however does lead to more questions.

Here is the build data I extracted from thanos-io

ldflags = {string} "-X github.com/prometheus/common/version.Version=0.35.1 -X
github.com/prometheus/common/version.Revision=086a698b2195adb6f3463ebbd032e780f39d2050 -X
github.com/prometheus/common/version.Branch=HEAD -X
github.com/prometheus/common/version.BuildUser=root@7cdd991973d2 -X
github.com/prometheus/common/version.BuildDate=20240703-04:53:15  -extldflags '-static'"
mainModule = {string} "github.com/thanos-io/thanos"

In a previous version, syft was keying off of github.com/prometheus/common/version.Version=0.35.1 and associating it to thanos.

If thanos is keeping their versioning inline with prometheus that is just another coincidence/convention syft was not determining itself, but instead relying on.

When I examine the previous versions execution path there was not a point where we would check if the main module matches the versioned module passed to the ldflags.

This is the PR that fixed this oversight and made sure to check we were associating versions correctly: https://github.com/anchore/syft/pull/3062

You can see in the old execution path, before the latest version, a prefix of v is required before returning: https://github.com/anchore/syft/blob/a4b5dcd0df80f6a58c8610e25104647710c1da5d/syft/pkg/cataloger/golang/parse_go_binary.go#L275-L286

In the new code path, we no longer return in the block where we checked for a prefix because of the mismatch between main module and module passed in ldflags.

Instead, it looks like we now rely on our regex capture groups checking the contents of the binary: https://github.com/anchore/syft/blob/a4b5dcd0df80f6a58c8610e25104647710c1da5d/syft/pkg/cataloger/golang/parse_go_binary.go#L217-L227

Here we return the same answer, but this time without guaranteeing a prefix of v like the other block.

Proposed Fix

I think the initial fix is to first move all the concerns of a v prefix for all matched versions to the caller. Probably here after we've determined version != "" https://github.com/anchore/syft/blob/a4b5dcd0df80f6a58c8610e25104647710c1da5d/syft/pkg/cataloger/golang/parse_go_binary.go#L180-L187

This will keep us consistent and move us back to the behavior previously seen. We should not allow syft to return a version associated with a golang module unless it's prefixed with v.

Happy to hear your thoughts though on the discovery of thanos's version association/linkage with prometheus.

Do you think PR #3062 is correct?

Is the correct behavior for this scenario syft failing to extract a version for github.com/thanos-io/thanos given that we're matching on pretty flimsy evidence with regard to the binary content extraction? https://github.com/anchore/syft/blob/a4b5dcd0df80f6a58c8610e25104647710c1da5d/syft/pkg/cataloger/golang/parse_go_binary.go#L217-L227