RedHatProductSecurity / deplist

Apache License 2.0
3 stars 5 forks source link

Stop using `go list` #32

Open kaovilai opened 1 year ago

kaovilai commented 1 year ago

Current go list command is inaccurate dep and lead to many false positives.

Since a go 1.17 go.mod file includes a require directive for every dependency needed to build any package or test in that module, the pruned module graph includes all of the dependencies needed to go build or go test the packages in any dependency explicitly required by the main module. A module that is not needed to build any package or test in a given module cannot affect the run-time behavior of its packages, so the dependencies that are pruned out of the module graph would only cause interference between otherwise-unrelated modules.

Modules whose requirements have been pruned out still appear in the module graph and are still reported by go list -m all: their selected versions are known and well-defined, and packages can be loaded from those modules (for example, as transitive dependencies of tests loaded from other modules). However, since the go command cannot easily identify which dependencies of these modules are satisfied, the arguments to go build and go test cannot include packages from modules whose requirements have been pruned out. go get promotes the module containing each named package to an explicit dependency, allowing go build or go test to be invoked on that package.

reference emphasis mine.

Sources: https://github.com/dependabot/dependabot-core/issues/4740#issuecomment-1458532057

https://github.com/anchore/syft do not inherit this behavior and is preferable that we move to it.

Please remove usage of go list from this repo.

kaovilai commented 1 year ago

Recent false positive https://issues.redhat.com/browse/OADP-2990

kaovilai commented 1 year ago

https://github.com/RedHatProductSecurity/deplist/blob/main/internal/scan/golang.go

sfowl commented 1 year ago

@kaovilai Thanks for raising an issue, but I admit I'm a bit confused by this one.

The deplist tool does not use go list -m all, as shown by the source code link you've provided, for very similar reasons as noted in the dependabot issue, i.e. traversing the module graph is prone to false positives. In my experience, traversal of the package graph (e.g. via go list -deps ./...) is generally more focused and leads to fewer false positives, which is the approach used in deplist.

In the jira link you've provided for the false positive, it's not clear to me how it was determined deplist was even used, I don't see any indication it was.

However, out of curiosity I tried running deplist on the source repo in question github.com/openshift/openshift-velero-plugin (at commit 83f5067), but it did not return any results for the affected dependency (go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp), so it seems like this tool is operating as expected:

openshift-velero-plugin ((83f5067...)) $ deplist . | grep otelgrpc
openshift-velero-plugin ((83f5067...)) $

To compare, I ran the two different methods go list -m all and go list -deps ./... on this repo (plus an -e flag as there seems to be an error resolving one dep). Only the module graph method, (with -m all) shows this false positive dependency, that seems to validate the approach used in deplist, since it doesn't use that option and hence doesn't show this false positive.

openshift-velero-plugin ((83f5067...)) $ go list -e -m all | grep otelgrpc
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.35.0
openshift-velero-plugin ((83f5067...)) $
openshift-velero-plugin ((83f5067...)) $ go list -deps ./... | grep otelgrpc
openshift-velero-plugin ((83f5067...)) $

I'm sorry if I I've misunderstood somehow. Please let me know if I've missed something.

kaovilai commented 1 year ago

Ok I have no idea which tool was used. But something outside my team's control is reporting this false positive and in the past it was mentioned that deplist was the culprit. https://mail.google.com/chat/#chat/space/AAAAFsGEBEs/89257kri3t0

kaovilai commented 1 year ago

Go list -deps still produce less results than go mod parsing would.

go list -deps -f '{{define "M"}}{{.Path}}@{{.Version}}{{end}}{{with .Module}}{{if not .Main}}{{if .Replace}}{{template "M" .Replace}}{{else}}{{template "M" .}}{{end}}{{end}}{{end}}' ./... | sort -u | wc -l
     153
cat go.mod | grep -v  -e "replace" | grep -e " v" | sed 's/\t//' | sort -u | wc -l
     187

Here's a shorter breakdown.

~/git/openshift-velero-plugin CVE-2023-45142
❯ cat go.mod | grep cloud.google
    cloud.google.com/go v0.110.2 // indirect
    cloud.google.com/go/compute v1.20.0 // indirect
    cloud.google.com/go/iam v1.1.0 // indirect
    cloud.google.com/go/storage v1.31.0 // indirect
    cloud.google.com/go/compute/metadata v0.2.3 // indirect

~/git/openshift-velero-plugin CVE-2023-45142
❯ go list -deps ./... | grep cloud.google

~/git/openshift-velero-plugin CVE-2023-45142
❯ 

Notice go list is missing dependencies.

Reference commit to checkout https://github.com/weshayutin/openshift-velero-plugin/commit/e7586a70604705367c11577293308138e92999b5

kaovilai commented 1 year ago

go list remains an inaccurate way to determine build dependency.

kaovilai commented 1 year ago
~/git/deplist main
❯ git branch -vv
* main 6109f6c [upstream/main] Hide ruby output unless using -debug

~/git/deplist main
❯ popd

~/git/openshift-velero-plugin CVE-2023-45142
❯ ../deplist/deplist . | grep cloud.google

~/git/openshift-velero-plugin CVE-2023-45142
❯ cat go.mod | grep cloud.google
    cloud.google.com/go v0.110.2 // indirect
    cloud.google.com/go/compute v1.20.0 // indirect
    cloud.google.com/go/iam v1.1.0 // indirect
    cloud.google.com/go/storage v1.31.0 // indirect
    cloud.google.com/go/compute/metadata v0.2.3 // indirect
kaovilai commented 1 year ago
❯ go mod why cloud.google.com/go/storage
# cloud.google.com/go/storage
github.com/konveyor/openshift-velero-plugin/velero-plugins/common
github.com/kaovilai/udistribution/pkg/image/udistribution
github.com/kaovilai/udistribution/pkg/client
github.com/distribution/distribution/v3/registry/storage/driver/gcs
cloud.google.com/go/storage

clearly cloud.google.com/go/storage is a dependency here and would be one of the folders that shows up and required to build when you run go mod vendor.

Deplist would be missing this.

sfowl commented 1 year ago

Hmm, that is interesting. As best I can tell there seems to be a blind spot here if blank identifiers are used in imports, e.g.

https://github.com/migtools/udistribution/blob/95f5f723a5d7d29b1e0cb003aab0f18cbce3c989/pkg/client/client.go#L19

Not quite sure yet if this is desirable or not, in the context of inspecting source repos for vulnerable packages.

kaovilai commented 1 year ago

I guess it depends on the repo. In this case importing these with blank identifiers still cause init functions such as https://github.com/openshift/docker-distribution/blob/c7be7b49aed9720742b1c03e694248bd83419348/registry/storage/driver/gcs/gcs.go#L84-L86 to run which affects ability to use other functions imported from non blank identifiers.

Definitely included in the build

sfowl commented 1 year ago

Ah, I think I mis-diagnosed above, the issue actually seems to be that go list honours build flags. So they need to be the same as the build flags used during a build to get the same result, e.g.

openshift-velero-plugin ((83f5067...)) $ go list -deps -tags include_gcs,include_oss ./... | grep cloud.google.com/go/storage
cloud.google.com/go/storage/internal
cloud.google.com/go/storage/internal/apiv2/storagepb
cloud.google.com/go/storage/internal/apiv2
cloud.google.com/go/storage

IIRC this was a vaguely known limitation during prototyping, but considered a justifiable tradeoff on balance.

It would be good to address this identifying the correct build flags from the source repo, but I'm not aware of any easy way to do that.

If you have suggestions I'm open to it but bear in mind this tool's only known usage is within a legacy tool that is being replaced by https://github.com/RedHatProductSecurity/component-registry. Component Registry (corgi) relies on syft for Go module/package identification, not deplist, so any changes here wouldn't affect the new replacement tool, corgi.

kaovilai commented 1 year ago

I'm not aware of any easy way to do that.

dependabot.yml or similar perhaps. Repo owner can configure this according to their build.

kaovilai commented 1 year ago

My preference would still be to stop using go list or anything relying on buildtags.

Implementation example https://github.com/snyk/snyk-go-plugin/blob/master/gosrc/resolve-deps.go#L38 snyk here walks through each .go files for dependency imports and are not affected by buildtags.

Or you know, parse go.mod.

kaovilai commented 1 year ago

only known usage is within a legacy tool that is being replaced by https://github.com/RedHatProductSecurity/component-registry. Component Registry (corgi) relies on syft for Go module/package identification, not deplist, so any changes here wouldn't affect the new replacement tool, corgi.

duly noted. guess I'll have to hunt down other tools the alert team is using.

sfowl commented 1 year ago

Or you know, parse go.mod.

By this, do you mean only identifying modules and not identify used packages at all? I.e. prefer coarse grained, over fine grained?

I think only comparing at the module level would lead to much larger number of CVE detections. Package level comparison often allows us to filter out large numbers of what would otherwise be false positives.

kaovilai commented 1 year ago

Ok if you're doing package level already then perhaps walking each go files would be the right improvement.