dependabot / dependabot-core

🤖 Dependabot's core logic for creating update PRs.
https://docs.github.com/en/code-security/dependabot
MIT License
4.74k stars 1.03k forks source link

An entry in go.sum does not represent a dependency #4740

Closed 08d2 closed 1 year ago

08d2 commented 2 years ago

I have just received Dependabot alerts based on entries in a go.sum file in my repository. But an entry in go.sum does not represent a dependency of my project, it represents a module that the Go toolchain had to verify when computing the dependency graph.

jeffwidman commented 2 years ago

Is this a security alert or a PR bumping an outdated package? I think this repo is just for the logic for bumping outdated deps and issues with security alerts are supposed to be reported elsewhere (I have no idea where unfortunately).

That said, we hit the problem with go.sum files listing transitive dependencies which were flagged by the Dependabot security scanner at my day job. We were able to work around it by pruning out the transitive deps from go.sum using go mod tidy -compat=1.17.

As long as you're using 1.17, that is a fast fix for the spurious security alerts. Unfortunately it'll break all your Dependabot PR's because they'll reset go.sum every time so you'll have to run go mod tidy -compat=1.17 on each of them before merging.

Additionally, the -compat=1.17 flag will likely break on go < 1.17. Although it looks like this will be the new default behavior based on how the 1.18 draft release notes are worded... I'm very curious to see how that ends up working with 1.16 code.

jurre commented 2 years ago

Yeah @jeffwidman is right that the alerting logic is not based on the code in dependabot-core here, but I will make sure to point the team that works on alerts at this issue though.

adonovan commented 2 years ago

@08d2, you are correct that that go.mod file does not represent a dependency graph, and that it includes modules that were downloaded only to compute the dependency graph. Specifically, the go.sum file contains two kinds of entries: regular entries, which indicate that the .zip file of the module is needed during the build, and /go.mod-suffixed entries, which indicate that a given module contributes only version constraints to the build. In order to compute an approximation of actual dependencies, GitHub's tools such as Dependabot ignore the second kind of edge, since code in those modules cannot affect the behavior of your program, but all other modules are downloaded during the build and are thus considered potentially relevant to the behavior of your program.

You might argue that the module dependency graph is a crude overapproximation of the set of packages used in the build, and that it gives rise to infeasible dependency paths---and you would be right. However, a package-level dependency analysis would require a totally different approach---and it too would produce infeasible paths at function granularity.

It would be an interesting project to build a Go-specific tool that computed the set of vulnerable dependencies using the finest available dependency information, perhaps based on a call graph. [Update: govulncheck now exists.]

08d2 commented 2 years ago

@jeffwidman

the problem with go.sum files listing transitive dependencies which were flagged by the Dependabot security scanner at my day job.

It's important to be clear, here:go.sum is a checksum file, not a lockfile, not a manifest file. It does not list or represent dependencies at all. A tool like Dependabot should completely ignore go.sum. The dependency graph of a Go module is better captured by the output of go list commands.

08d2 commented 2 years ago

Is this a security alert or a PR bumping an outdated package?

The alerts I saw were part of github.com/org/repo/security/dependabot

adonovan commented 2 years ago

The dependency graph of a Go module is better captured by the output of go list commands.

You're absolutely right, but the architecture of GitHub's dependency computation system assumes that the graph is expressed in a small number of manifest files, so running go list on the actual Go source is out of the question.

08d2 commented 2 years ago

@adonovan

GitHub's dependency computation system assumes that the graph is expressed in a small number of manifest files, so running go list on the actual Go source is out of the question.

So, concretely, GitHub's (apparently opt-out) Dependabot feature created false-positive "security alerts" in my repo, based on some entries in a go.sum file file in a subdirectory somewhere. I think these false positive alerts are really bad and need to be fixed! 😬 And I think it's bad enough that if the only way to fix them is to disable Dependabot for Go altogether, that's preferable to the current state.

adonovan commented 2 years ago

I think these false positive alerts are really bad

I agree that false positives are frustrating, especially if they cannot be easily dismissed.

There are several ways we could resolve this:

  1. you disable Dependabot Alerts for your repo.
  2. we extend the Dependabot Alerts UI to provide a means for users to select which manifest files they care about.
  3. we remove go.sum support from GitHub's dependency computation system (but leave go.mod).

Which approach is best depends on the answer to these questions:

I don't know the answer to either question, but perhaps our product folks do.

[Update: I'm talking about Dependabot Alerts, not the more sophisticated Dependabot core. Thanks @jurre for clarifying.]

jeffwidman commented 2 years ago

IIUC, running go mod tidy under go >= 1.17 creates a go.mod file that lists all the deps required to build the binary, both direct / indirect: https://go.dev/doc/go1.17#go-command

So in that case, it probably makes sense to disable the scanning of the go.sum file when the go.mod file has the line directive go 1.17 (or greater) in it.

08d2 commented 2 years ago

There are several ways we could resolve this:

  1. you disable Dependabot for your repo.
  2. we extend the Dependabot UI to provide a means for users to select which manifest files they care about.
  3. we remove go.sum support from GitHub's dependency computation system (but leave go.mod).

This is confusing to me. As far as I understand, go.sum is, factually, not a manifest file — it has at best a correlative relationship to its module's dependency graph. Is that not true?

08d2 commented 2 years ago

@jeffwidman

IIUC, running go mod tidy under go >= 1.17 creates a go.mod file that lists all the deps required to build the binary, both direct / indirect:

IIUC go.mod file always defines the deps required to build the module, invariant of version.

jeffwidman commented 2 years ago

Yes, but previously it didn't necessarily list all the // indirect deps which were getting pulled in transitively into the final binary.

So previously a security scanner couldn't merely scan what was listed in in go.mod, it had to walk the dep tree. And to do that, it'd have to pull up each modules go.mod file. However, those go.mod files may list other packages that are required by the direct dep, but not actually used by the program being built, for example deps used only for tests of an intermediary in the chain.

Stepping back, from the POV of a security scanner, walking the dep tree often simply wasn't feasible, especially since they'd still struggle to identify which indirect deps made it into the final binary and which were only used by intermediate modules. That's why the scanners used go.sum as a rough proxy instead, since it's a superset of the required dep tree, and about what they'd get anyway if they were to build the tree themselves.

Now with 1.17, those indirect deps that that would be discovered by walking the dep tree, but aren't actually used in the final binary, get pruned out of the list in go.mod. And then the remaining // indirect deps are listed in go.mod (previously they were optional).

So now go.mod can be treated as a definitive source of "here's all the packages that are required in the final binary, w/o having to walk the dep tree".

So this makes the job of a security scanner much easier, since they can directly use go.mod since it will list the indirect deps if-and-only-if they're actually included in the final binary. So it should reduce their workload (no need to scan the extraneous stuff in go.sum), and reduce false positives (stuff listed is guaranteed to be in the final binary).

I should caveat this with the disclaimer that I'm not part of the go team, and this stuff is complicated, ie, I could be mistaken. The best resource I've found was the design doc for the module graph pruning feature that @bcmills put together here: https://github.com/golang/go/issues/36460

jurre commented 2 years ago

Going into possible solution mode here.

So it's important to note that there are different systems in play here, that have different properties:

  1. Dependency Graph: Computes a list of dependencies that a repo depends on, by analyzing files in the repo
  2. Dependabot Alerts: Takes the list of dependencies from the Dependency Graph and compares them to our advisory db to generate alerts
  3. dependabot-core / Dependabot Updates: Gets the alert and advisory data from the dependabot alerts system and attempts to generate a PR that fixes the issue. (There are some systems that orchestrate running dependabot-core, but it's a close enough proxy for what I'm trying to say next)

1 and 2 can't reasonably do any complicated computation given our current architecture, but for 3 we run complicated computation all the time. We actually already run go list as part of our workflow and would bail out trying to generate a PR here when we can't find the given dependency.

We could look into closing the alert when Dependabot Updates attempts to bump the alert that was generated for this dependency but the dependency doesn't appear in the go list command, assuming it was either removed since the alert was generated, or is not actually used in the project at hand.

This still leaves users with some noise to deal with, as they'd get notified about the alert being opened etc, but it might end up being a better experience?

There's a risk here that if we somehow fail to detect the dependency in dependabot-core/dependabot updates, but it does actually exist in the repo, we now close out a valid alert, so we'll need to be careful of we decide to go this route.


Just wanted to amplify that the problem we're describing here is not with dependabot-core. I'll keep the thread open here because the conversation is really interesting and I don't want to kill it by asking everyone to move to a different place.

adonovan commented 2 years ago

I think @08d2 is right that treating a go.sum file as a manifest is not helpful because the set of spurious zip (non-/go.mod) entries in it grows quickly with project size. The issue is that the go.mod file doesn't distinguish production and test-only dependencies, and this problem compounds recursively, so if one of your dependencies has huge tests, your go.sum file will enumerate them (and their dependencies) even though this code isn't relevant to your build.

Until go1.17-style go.mod files are more widespread, we have a choice of basing our graph on go.mod (currently an underapproximation) or go.sum, an overapproximation. (Unfortunately the architecture of DG doesn't allow us to look at both files in conjunction--otherwise we could use go.sum for older versions and go.mod for later versions.) Users were unhappy with the underapproximation leading to missed vulnerability alerts, and now a user is unhappy with the overapproximation.

I tend to think analysis tools should be precise or be silent. Users get justly frustrated by false positives and learn to ignore them. False negatives carry a risk too, but at least users know the tools are serious and act on their advice.

I propose that we disable go.sum support in DG, or at least stop using it as a basis for alerting.

08d2 commented 2 years ago

Thanks for these comments, they're educating!

I concur with @adonovan, false negatives are far preferable to false positives.

@jurre

So it's important to note that there are different systems in play here, that have different properties:

  1. Dependency Graph: Computes a list of dependencies that a repo depends on, by analyzing files in the repo

Do you mean the GitHub Dependency Graph available under the insights tab? I see that go.sum is (incorrectly) used as a source of truth for that feature — maybe this is the underlying problem. Do you know where I can file an issue for that? I poked around a bit but couldn't find the corresponding repo or project...

adonovan commented 2 years ago

Do you mean the GitHub Dependency Graph available under the insights tab? I see that go.sum is (incorrectly) used as a source of truth for that feature — maybe this is the underlying problem. Do you know where I can file an issue for that? I poked around a bit but couldn't find the corresponding repo or project...

Exactly; as @jurre pointed out, DG is the source for Dependabot Alerts. I don't know if it's exposed as a product with its own issue tracker. I will ask internally.

lseppala commented 2 years ago

👋 from the Dependency Graph team. We're aware of this now and coming up with an acceptable short-term solution. The false positive alerts are frustrating; we'll see what we can do to address this.

jurre commented 2 years ago

Do you know where I can file an issue for that? I poked around a bit but couldn't find the corresponding repo or project...

Usually the easiest thing is to go through the support team, as the dependency graph codebase is not publicly available. But given that the team is already engaging in this issue, I think you're good for now 👍

bmhatfield commented 2 years ago

Hi there, was this issue resolved in dependabot?

Based on the open state, I assumed it was not; however, I recently resolved an instance of this advisory: https://github.com/advisories/GHSA-c3h9-896r-86jm

After resolving it, I noticed that dependabot marked it as fixed, even though my go.sum still has this line: github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=

Although this checksum exists in my go.sum, it seems that dependabot is correctly identifying that gogo/protobuf is no longer in my dependency graph.

daenney commented 2 years ago

This issue is still not resolved. I just got a slew of Dependabot alerts across tons of repositories due to a go-yaml release. However, none of the repositories require go.pkg.in/yaml(.v2|v3) directly themselves, they're just part of the go.sum because a dependency might be using it.

tonglil commented 2 years ago

What we're left with is essentially telling users to close all alerts coming from go.sum because it's just so noisy?

jeffwidman commented 2 years ago

After I last commented on this issue I joined the Dependabot team, so I’m now privy to internal discussions on this (and have a personal interest in getting it resolved).

Folks internally here at GitHub are well aware of this issue, and in agreement that it’s frustrating and we’d like to resolve it.

From a technical perspective, the quick solution is to disable go.sum parsing for security alerts. However, we’ve got a significant number of users who are still on versions of go < 1.17 (meaning their go.mod files don’t list all deps), and many would rather have false positives than false negatives. So that ties our hands a bit.

Another option is doing a version check before choosing whether to ingest go.sum, but as alluded to above there’s some technical challenges in our architecture with doing that, and it may not be viable.

We’re still researching the metrics to see how many users would be impacted, as well as the rate of change of how fast they’re moving to go >= 1.17...

At some point we’ll pull the plug on go.sum, we’re just still trying to figure out when the best time to do that will be.

08d2 commented 2 years ago

However, we’ve got a significant number of users who are still on versions of go < 1.17 (meaning their go.mod files don’t list all deps),

The release of Go 1.19 earlier this month signaled the official end-of-life of 1.17 (and below). I understand things are not always so simple, but if the core language team has dropped support for a version, I think it's entirely reasonable for anyone in the ecosystem to do the same.

hedhyw commented 2 years ago
  1. go.mod contains a version of Golang, so it might be a good idea to use it in order to disable parsing of go.sum for 1.17+
  2. Flag in dependabot.yml that can disable parsing of go.sum also works for me.
bcmills commented 2 years ago

@08d2

The release of Go 1.19 earlier this month signaled the official end-of-life of 1.17 (and below).

Note that cmd/go in Go 1.19 still supports go.mod files that declare older go versions, and will continue to do so for the foreseeable future (per Go's compatibility policy).

mvdan commented 2 years ago

Thanks for the update, @jeffwidman. I think what @adonovan wrote in https://github.com/dependabot/dependabot-core/issues/4740#issuecomment-1043169370 is pretty much what I am thinking should be done too, but ultimately I don't know which of your important users prefer false positives over false negatives.

To add to what @bcmills said: Go 1.17.x is already deprecated as a Go versions for developers to use, as Go only backports fixes to the two latest versions per the release maintenance policy. But like Bryan says, this does not mean that Go modules declaring much older versions like go 1.13 will stop working, or even show a warning to the user. In reality, Go developers today aren't rushed to bump this Go version directive in their go.mod files - they only need to if they want to use newer language features like generics, or if they want to opt into new module authorship behavior, like pruned module graphs in Go 1.17.

For the reasons above, plus the Go toolchain's backwards compatibility with older go.mod files, I think we'll continue to see these older go.mod files in numbers for a few years still. Waiting for them to become a minority (like less than 5% of active Go repositories) is one strategy for sure, but I think it may take much longer than you think to happen. So I would personally go with a different strategy to avoid the false positives sooner than later :)

jeffwidman commented 2 years ago

I appreciate everyone chiming in. At my previous company I primarily worked in go and I've made a couple of minor contributions to core go, so I'm familiar with the semantics of go.mod, the deprecation policies, etc. And some of the other folks involved are passionate gophers as well.

Yes, ideally we could only scan go.sum when the go.mod file has a go directive < 1.17, but there's some internal technical hurdles that make that non-trivial.

The good news is we actually pulled the metrics yesterday, and I was pleasantly surprised how many repos were already on 1.18/1.19... the generics release may have encouraged folks to upgrade quicker.

I can't share a whole lot more at this time, but know that we do see this as annoying, and we are looking toward fixing it.

For those looking to leave more comments about how annoying this is, the best thing you can do is 👍 the initial comment. Adding "me too" just spams everyone watching this issue for updates.

tonglil commented 2 years ago

they only need to if they want to use newer language features like generics, or if they want to opt into new module authorship behavior, like pruned module graphs in Go 1.17.

I think this could apply to Go ecosystem changes as well - update your go.mod to get dependabot support.


What is surprising to me is that dependabot can't/doesn't parse the actual dependency files, when it does parse the dependabot.yml file.


Probably the most flexible method is described by @hedhyw:

Flag in dependabot.yml that can disable parsing of go.sum also works for me.

GitHub is entirely in control of this configuration schema. I can already see fields that can be used, including:

08d2 commented 2 years ago

@mvdan @bcmills

This does not mean that Go modules declaring much older versions like go 1.13 will stop working, or even show a warning to the user. In reality, Go developers today aren't rushed to bump this Go version directive in their go.mod files - they only need to if they want to use newer language features like generics, or if they want to opt into new module authorship behavior, like pruned module graphs in Go 1.17.

This is a good point. The version specified in the go.mod is more of a upper bound for language features than it is a toolchain requirement.

dghubble commented 1 year ago

I think where we've arrived is that go.sum does not represent the dependencies of a Go project, it just enumerates a bunch of package checksums. When a CVE is published, it's typically a specific Go module (e.g. golang.org/x/crypto) and the mitigation is to update to a newer module version. Today, you can do that in your go.mod,

# go.mod
require (
-  golang.org/x/crypto v0.5.0
+  golang.org/x/crypto v0.6.0
)

but your go.sum will of course continue to list the checksum of every prior version of a module.

# go.sum
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/crypto v0.0.0-20210817164053-32db794688a5 h1:HWj/xjIHfjYU5nVXpTM0s39J9CbLn7Cc5a7IC5rwsMQ=
golang.org/x/crypto v0.0.0-20210817164053-32db794688a5/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.6.0 h1:qfktjS5LUO+fFKeJXZ+ikTRijMmljikvG68fpMMruSc=
golang.org/x/crypto v0.6.0/go.mod h1:OFC/31mSvZgRz0V1QTNCzfAI1aIRzbiufJtkMIlEp58=

Discovering dependencies (incorrectly) by using go.sum will continue to consider such a project vulnerable. Without the proposals above (e.g. stop looking at go.sum, opt-in via package ecosystem gomod vs gosum, opt-in via go version detection), we're left with a few workarounds:

1) Artifically force go mod tidy to remove old checksums from go.sum, to please GitHub tooling. This later makes bumping those dependencies more difficult (is it pinned for an important reason or just for dependabot?). Over time, projects end up having replace directives for any module that's ever had a CVE.

replace golang.org/x/crypto => golang.org/x/crypto v0.6.0

2) Manually dismiss alerts originating from go.sum files 3) Disable dependabot alerts for Go repos or org-wide

courtneycl commented 1 year ago

👋 Hi folks! Dependency graph no longer ingests go.sum files, and any associated Dependabot alerts in your repos have been removed (some may be replaced with alerts on your go.mod file). See the changelog for more details, and do let us know if you have any ongoing issues. Thank you for your participation and continued patience here. ✨

katexochen commented 1 year ago

Could anyone link the PR that fixed this issue?

courtneycl commented 1 year ago

@katexochen Hi! This was resolved in the dependency graph code base, which is not public.