cmd/go: allow package authors to retract older package versions as insecure, incompatible, or broken #24031

Closed michael-schaller closed 4 years ago

In order to achieve reproducible builds vgo keeps using specific package versions until an explicit upgrade is done. IMHO this is an excellent default but I'm worried about insecure package versions as currently vgo can't detect if the build contains an insecure package version.

Can vgo be changed so that a package author is able to specify that every version below X is deemed insecure and if an insecure package version is used during a build that the build will fail (with a flag to override)?

@michael-schaller I'm not sure what new functionality you are asking for.

Right now vgo will not choose a version "below" a version you specify. So if there is an insecure package version, put the minimum version selector in your package or another package and it will not choose it. For modules that build main packages, you can also specify version ranges to exclude. Maybe I'm missing something?

So if there is an insecure package version, put the minimum version selector in your package or another package and it will not choose it.

This only works if you know that a certain version is insecure. I think what he's asking for is a mechanism for package authors to broadcast to the world that a certain version is insecure; so that every time a user pulls it, they'll be warned that that version is deprecated and they'll know to update their mod file.

@ALTree That makes sense. Thanks for clarifying for me. I think that is a fine question to ask and may go hand-in-hand with the question of how to distribute the content / function of go.modverify.

@ALTree correct. :-)

One naive idea would be that 'vgo build' could check the 'go.mod' (or another machine readable file) of the latest package versions for security information. This would also be great for Continuous Integration as then a package author could notify of security issues via CI build failures that are (hopefully) monitored.

@rsc mentioned Deprecated Versions (as part of the Defining Go Modules article) which is similar to this issue. He proposed to append +deprecated to a version tag which would also be a viable solution for this issue if +insecure would be honored by vgo.

IMHO that would be a pretty bare-bones solution though, as I presume that people would soon want to extend it further. For instance, I could see that someone would also want +buggy for a version with a serious bug (for example a serious memory leak) or +broken for a version that is broken under certain circumstances (for example the Windows build is broken). Furthermore, this solution lacks a way to add more context: for deprecated versions one might be interested in the deprecation announcement or timeline, and for insecure versions one might be interested in the CVE, severity, and so on.

That said I think signaling via tags if a version is deprecated, insecure, ... is not adequate. Maybe even the proposal from my previous comment isn't. Maybe the discussion should rather go into the direction of a machine readable changelog which could be managed via vgo release.

I'm not sure about the utility of this feature. If I release version 1.2.3, surely it must be fixing some bugs over 1.2.2, and I would probably mark 1.2.2 as insecure. In other words, on almost every (point) release, I would mark all older versions as insecure. You might say this is only for bugs with a CVE, but I think the point still stands.

I don't see this providing much over just reporting that newer versions have been published.

@uluyol there are actually multiple points for this feature (which I sadly failed to point out so far):

1) It enables machines to automatically react: It's a chore to check for new versions and humans are sadly error prone, especially in burdensome cases of chore like this one. Having a machine readable form to notify of insecure, deprecated, buggy, ... versions enables machines to react so that humans can spend their time on better things. In this case vgo would be able to detect an unhealthy build and could let the build fail.

2) Cut down reaction time: Once machines are able to react to such issues you can cut down reaction time drastically. For instance, Continuous Integration would fail and ideally you would automatically get a bug report for the build failure with the details of why the build failed. This is especially valuable for projects that are no longer in the active development phase but rather receive infrequent updates on demand.

3) Indicator for healthiness of new dependencies: If you add a new dependency to your project you immediately get a sense of the healthiness of that new dependency. If the new dependency is unhealthy, vgo should fail to build, and with that you know that you either need to change your project, file a bug report against the new dependency, or fork and fix the new dependency (if you still want to use it). For instance, this would catch the error of a human trying to add the new dependency with the major version v1 although that version is deprecated and a newer major version should be used.

4) Indicator for healthiness of indirect dependencies: In the end you should have a cascading effect in case a package version is marked insecure as every other package that directly and most importantly indirectly depends on it should fail to build.

I see. I agree that tags lack content, and that it would be useful to also have a reason (or changelog) listed for why a version is insecure. Then on build you could get messages like

build failed: insecure packages
    oauth2: CVE-X-Y vulnerable to DoS
    zstd: crash on invalid input

I can see why this would be useful. I do think that we'd want to let people override this behavior though.

A definite +1 on the letting people override this behavior part. :-)

I agree that additional in-band signalling would be a useful way to let the maintainer know that action may need to be taken, but I disagree that every situation requires automatic, mechanical action.

Security fixes are important but automatically applying them can be as fraught as always ignoring them.

If the module in question is for a non-open-source essential service using foo v1.0.5 and there's a foo v1.1.4+security, that needs to be immediately investigated by a person. However, if it's fixing a vulnerability introduced in foo v1.1.0, it may not necessarily be worth the effort and risk to drop everything and upgrade right now.

I would prefer if vgo would continue to work the way it does today, with another tool, perhaps vgo audit that could check online if the versions used have problems.

The idea is that this tool is run on-demand by the programmer, rather than automatically. If this tool is easy to use, vgo audit could become as natural as running go fmt.

The purpose of vgo is to add versions to the vocabulary of the toolchain, so that users and tools can talk to each other sensibly about versions. As I mentioned in the https://research.swtch.com/vgo-module article, I think it would make sense to have a v1.2.3+deprecated tag, using an annotated tag so that there's a commit message. The commit message can say anything it wants about why the release is deprecated, and we can show that to users. We could easily add a notation in the text for identifying security problems. What happens next is up to tools. Probably vgo list -m -u (tell me about pending module updates) would do well to show information about currently-used modules that have deprecation notices.

I've been thinking a bit about where to write down this information. The magic extra tag is clearly too limited in what it can record. I looked briefly into finding a way to write more information, such as using an annotated tag's commit message in Git, using svn propset to record a special per-revision property in Subversion, and so on. But something that must be reinvented for every different version control system is a bad idea.

Of course, we can't write the information in the original module version's go.mod, since we didn't know it was insecure when we tagged it, and the file tree is by convention (and enforcement via go.sum) immutable after tagging.

But maybe we can record it in a go.mod in a later release of the same module. Specifically, we could say that to look for updated post-release metadata about a particular module we grab the latest version's go.mod and look there. So for example, suppose v1.1.2 has a security problem, it was fixed in a rewrite for v1.2.0, and we're up to v1.2.4 when we discover the problem. Then we'd issue a v1.2.5 that is just v1.2.4 with an updated go.mod that adds something like:

bug example.com/mymodule/rpc v1.1.2-v1.2.0 https://example.com/issue1234 "RPC client bug - can get stuck if too many servers restart"

The fields are "bug", the affected package (if you don't use this package you don't have the bug), the half-open version range when the bug existed, a URL with more information, and a short description. Maybe a security bug would conventionally begin with a "security: " prefix in the description.
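As a rough illustration of the five fields just listed, here is a minimal Go sketch that splits one such line into a struct. The `bug` line is only a proposal in this thread, so `BugNote`, `ParseBugLine` and `Security` are hypothetical names, and the parsing rules (whitespace-separated fields followed by a Go-style quoted description) are an assumption rather than anything cmd/go implements.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BugNote is a hypothetical in-memory form of one proposed "bug" line.
type BugNote struct {
	Package     string // affected package path
	Range       string // half-open version range, kept as written
	URL         string // link with more information
	Description string // short summary, with the quotes removed
}

// Security reports whether the description uses the suggested
// "security: " prefix convention.
func (n BugNote) Security() bool {
	return strings.HasPrefix(n.Description, "security: ")
}

// ParseBugLine splits a line into the five fields described above.
func ParseBugLine(line string) (BugNote, error) {
	// Everything from the first double quote on is treated as the description.
	i := strings.IndexByte(line, '"')
	if i < 0 {
		return BugNote{}, fmt.Errorf("missing quoted description in %q", line)
	}
	head := strings.Fields(line[:i])
	if len(head) != 4 || head[0] != "bug" {
		return BugNote{}, fmt.Errorf("want: bug <package> <range> <url> <description>, got %q", line)
	}
	desc, err := strconv.Unquote(strings.TrimSpace(line[i:]))
	if err != nil {
		return BugNote{}, fmt.Errorf("bad description: %v", err)
	}
	return BugNote{Package: head[1], Range: head[2], URL: head[3], Description: desc}, nil
}

func main() {
	n, err := ParseBugLine(`bug example.com/mymodule/rpc v1.1.2-v1.2.0 https://example.com/issue1234 "RPC client bug - can get stuck if too many servers restart"`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v security=%v\n", n, n.Security())
}
```

The range is kept as an opaque string here; interpreting it is a separate step.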

Then any future "go get", even one not asked about that module, would look up the latest version, find v1.2.5, learn about the bug in v1.1.2, and print a warning. Also, we could make this information available to running programs, which could inspect their own binaries for the package and version and then self-diagnose on a server status page, automatically report to local monitoring systems, and so on. (We've done something like this inside Google since early 2013 and it works really well.)

If we later decided to issue a v1.1.3 with that fix, we could issue a v1.2.6 that only updates go.mod:

bug example.com/mymodule/rpc v1.1.2-v1.1.3 https://example.com/issue1234 "RPC client bug - can get stuck if too many servers restart"

If we wanted to warn people about the bug but didn't have time to fix it yet, or the bug has been there from the beginning, the half-open interval can drop either side:

bug example.com/mymodule/rpc v1.1.2- https://example.com/issue1234 "RPC client bug - can get stuck if too many servers restart"
bug example.com/mymodule/rpc -v1.1.3 https://example.com/issue1234 "RPC client bug - can get stuck if too many servers restart"
bug example.com/mymodule/rpc - https://example.com/issue1234 "RPC client bug - can get stuck if too many servers restart"
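The four range forms above (`lo-hi`, `lo-`, `-hi`, `-`) could be checked along the following lines. This is a sketch under stated assumptions: it handles only plain `vMAJOR.MINOR.PATCH` versions, because a pre-release version such as `v1.2.0-beta` contains a `-` and would make the proposed syntax ambiguous. `InRange`, `parseVersion` and `less` are hypothetical helpers, and `strings.Cut` needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v1.2.3" into [3]int{1, 2, 3}.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if !strings.HasPrefix(v, "v") || len(parts) != 3 {
		return out, fmt.Errorf("unsupported version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unsupported version %q: %v", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether a sorts before b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// InRange reports whether v falls in the half-open range rng:
// the lower bound is inclusive, the upper bound is exclusive,
// and an empty side means "unbounded".
func InRange(v, rng string) (bool, error) {
	lo, hi, ok := strings.Cut(rng, "-")
	if !ok {
		return false, fmt.Errorf("range %q has no '-'", rng)
	}
	ver, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	if lo != "" {
		l, err := parseVersion(lo)
		if err != nil {
			return false, err
		}
		if less(ver, l) {
			return false, nil
		}
	}
	if hi != "" {
		h, err := parseVersion(hi)
		if err != nil {
			return false, err
		}
		if !less(ver, h) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	for _, rng := range []string{"v1.1.2-v1.2.0", "v1.1.2-", "-v1.1.3", "-"} {
		in, err := InRange("v1.1.2", rng)
		fmt.Println(rng, in, err)
	}
}
```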

The same general idea could apply to marking earlier versions deprecated or to reporting known conflicts with other dependencies.

It's slightly awkward to be issuing go.mod-update-only patch releases, but doing so creates a history of the annotations and makes them available via module proxies without special arrangement.

All of this is still sketchy but the above seems like it's on a better path than just the +deprecated tags.

@rsc for some clarification, when should the bug directive be used? Only when there are security issues? Security issues plus deprecation? If deprecation, is there a common test people should use for marking something as such (e.g., every patch release)? Every bug?

Note, regarding "every bug": I poked at a couple of repos I've had to deal with:

the go-grpc repo has 76 closed bugs
kubernetes/kubernetes has 4,109 closed bugs
gin-gonic/gin has 28 closed bugs

I tried to look at a sampling of typical and worst-case scenarios.

@mattfarina If I have followed, I believe @rsc has referred (e.g., here and elsewhere) to this particular issue #24031 as potentially also being part of the solution for recording pair-wise incompatibility post publishing:

as I noted in #24301 on 5/25, I do want to find a place to record incompatibilities, but one that allows recording them after the fact instead of requiring accurate prediction of the future.

and:

More generally we need a place to record other known problems with already-released versions, like security bugs. That's #24031. Maybe the answer there applies to incompatibilities too.

And in https://github.com/golang/go/issues/24031#issuecomment-407798552, towards the end of that more recent comment (which was mostly using security or a general bug as an example), Russ also added:

The same general idea could apply to marking earlier versions deprecated or to reporting known conflicts with other dependencies.

That said, the one-liner here is currently:

"cmd/go: allow package authors to mark older package versions as insecure"

If the intent of this particular issue #24031 is broader than security, it might make sense to update the one-liner to help people know where to discuss which topic (vs. maybe #26829 is the better place to discuss recording incompatibilities, or ___).

@mattfarina Said another way, using the start of your example from #26829:

Consider the following case where you have an app with a dependency on a module (modA) that has a dependency on another module (modB).

App --> modA --> modB

modB is released with a bug.

My understanding of the discussion of #24031 (both in the issue here and outside of github) is that #24031 might be the answer for:

the author of modA declares certain versions of modA do not want to use the buggy v.1.2.3 of modB
the author of "App" becoming aware of the actions of the author of modA
all of that happening without requiring any action on the part of the author of modB

Mechanism still TBD, but perhaps the mechanism (if I've followed the discussion) might be something like:

the author of modA produces a new go.mod in a later release of modA that declares that certain versions of modA are incompatible with the buggy v1.2.3 of modB
the author of 'App' issues a 'go get' (where perhaps that 'go get' is not directly related to modA or modB, or perhaps a different go command), and the author of 'App' is warned about the pair-wise incompatibility between the versions of modA and modB now in use in 'App' (or perhaps it is something other than a warning).
and all of that is done without needing to predict the future (by declaring future incompatibilities that don't exist yet at the moment of publishing a release) and also without needing to update an immutable published release.

But sorry if this is off base or just noise... and I agree some clarity on what this particular issue is intended to cover would be helpful, because it is an important set of topics...

@myitcv mentioned this issue. I put forward #29537, which I think would go some way to solving this issue.

Since my proposal was closed to merge into this thread, I'll put a slightly cut down version here:

Abstract

My proposal is to introduce two git tag forms, which look like: vulnerability CVE-2018-0001 in v1.0.1 for findings and fix vulnerability CVE-2018-0001 in v1.0.2 for fixes. Packages that do not have versions work via go.mod versions: vulnerability CVE-2018-0001 in v0.0.0-20180517173623-c85619274f5d.

I think this would be a huge boon to Go security overall. I expect my proposal won't be perfect and I welcome feedback.

Proposal

See also: go mod version definition. NB: as go.mod versions do not require annotated semver version tags, this system also works with repos that only have refs.

tag: `vulnerability [identifier] in [go mod version]`

Used to mark a go mod version as having some security bug. The identifier is expected to be unique -- it could, for example be a CVE ID or a JIRA ticket ID.

The tag is expected to be on the first ref known to be vulnerable.

tag: `fix vulnerability [identifier] in [go mod version]`

Used to mark a go mod version as not vulnerable to a security bug. The unique identifier is said to be fixed.

The tag is expected to be on the first ref known to introduce a fix.

Determining if a version is vulnerable

Enumerate all fix vulnerability tags that are before the go mod version, collect their identifiers.
Enumerate all vulnerability tags that are before the go mod version and whose identifiers are not in the fix vulnerability set.
If the resulting set is not empty, it is the set of vulnerabilities affecting this version.
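The three steps above can be sketched as follows. The sketch assumes tags have already been parsed and that each tagged ref has a position in a linear history; both the `Tag` type and that linear-history simplification are assumptions for illustration. "Before" is treated as inclusive, since the tag is expected to sit on the first vulnerable (or first fixed) ref.

```go
package main

import (
	"fmt"
	"sort"
)

// Tag is a hypothetical, already-parsed vulnerability tag.
type Tag struct {
	Fix bool   // true for "fix vulnerability ...", false for "vulnerability ..."
	ID  string // unique identifier, for example a CVE ID
	Pos int    // position of the tagged ref in a linear history (assumption)
}

// VulnerabilitiesAt returns the sorted identifiers affecting the ref at position pos.
func VulnerabilitiesAt(tags []Tag, pos int) []string {
	// Step 1: collect identifiers fixed at or before pos.
	fixed := make(map[string]bool)
	for _, tg := range tags {
		if tg.Fix && tg.Pos <= pos {
			fixed[tg.ID] = true
		}
	}
	// Step 2: collect identifiers reported at or before pos that are not fixed.
	open := make(map[string]bool)
	for _, tg := range tags {
		if !tg.Fix && tg.Pos <= pos && !fixed[tg.ID] {
			open[tg.ID] = true
		}
	}
	// Step 3: whatever remains is the set affecting this version.
	ids := make([]string, 0, len(open))
	for id := range open {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	tags := []Tag{
		{Fix: false, ID: "CVE-2018-0001", Pos: 1},
		{Fix: true, ID: "CVE-2018-0001", Pos: 2},
		{Fix: false, ID: "CVE-2018-0002", Pos: 3},
	}
	for pos := 0; pos <= 3; pos++ {
		fmt.Println(pos, VulnerabilitiesAt(tags, pos))
	}
}
```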

Compatibility with Bazaar

Since Bazaar doesn't support whitespace in tags, instead opting for dashes, I propose that dashes - are used interchangeably with the space delimiter " ".

EBNF sketch

space = " " | "-"

nonspacing = ? unicode categories: Letter, Number, Mark, Punctuation, Symbol ?

identifier = { nonspacing }

vulnerability_proposal_tag = vulnerability_tag | vulnerability_fix_tag

vulnerability_tag = "vulnerability", space, identifier, space, "in", space, go_mod_version
vulnerability_fix_tag = "fix vulnerability", space, identifier, space, "in", space, go_mod_version
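A tag name in this shape could be recognized with a regular expression, as in the sketch below. `VulnTag` and `ParseVulnTag` are hypothetical names. Because identifiers and versions may themselves contain dashes, the sketch takes the first ` in ` (or `-in-`) as the separator, which is an assumption; an identifier that itself contains `-in-` would be split in the wrong place.

```go
package main

import (
	"fmt"
	"regexp"
)

// VulnTag is a hypothetical parsed form of a proposed vulnerability tag.
type VulnTag struct {
	Fix     bool   // true when the tag starts with "fix"
	ID      string // unique identifier, for example a CVE ID
	Version string // go mod version the tag refers to
}

// tagRE accepts a space or a dash wherever the grammar says "space".
// The identifier is matched non-greedily so the first " in " wins.
var tagRE = regexp.MustCompile(`^(fix[ -])?vulnerability[ -](\S+?)[ -]in[ -](\S+)$`)

// ParseVulnTag reports whether name is a vulnerability tag and, if so, its parts.
func ParseVulnTag(name string) (VulnTag, bool) {
	m := tagRE.FindStringSubmatch(name)
	if m == nil {
		return VulnTag{}, false
	}
	return VulnTag{Fix: m[1] != "", ID: m[2], Version: m[3]}, true
}

func main() {
	names := []string{
		"vulnerability CVE-2018-0001 in v1.0.1",
		"fix vulnerability CVE-2018-0001 in v1.0.2",
		"vulnerability-CVE-2018-0001-in-v0.0.0-20180517173623-c85619274f5d",
		"v1.0.1",
	}
	for _, name := range names {
		tag, ok := ParseVulnTag(name)
		fmt.Printf("%q -> %+v %v\n", name, tag, ok)
	}
}
```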

Other ideas I'd like feedback on

using tag annotations instead of whole tags, as per @v3n's comment. This potentially means old tags can be 'marked' as vulnerable.
optional [HIGH] [LOW] etc CVSS score annotations?

see also: reasons this is a good idea vs go.mod

rsc's comment above suggests putting this information in the go.mod file, what advantage do you see in putting this in VCS tags?

The go.mod file seems to have the advantage that it doesn't need to be reimplemented per-VCS, and it'll work when the client is retrieving a package over HTTPS (aka a "mirror" in the diagram on https://blog.golang.org/modules2019).

The two major points against using repo metadata like tags that rsc mentions are (1) the tag is too limited and (2) it has to be re-implemented across VCSs. Here are some answers to that and some extra points:

1. I think it is significantly more expected for tags to contain metadata, rather than files in the 'latest' revision

I do like how the go.mod approach naturally has an audit trail and history. That said, I think this approach falls further out of line with what is expected of version control systems than the tag approach. I did a little research, and there appear to be few restrictions on what can be tagged across the VCSs Go supports.

In terms of meeting expectations of technology, I think it's quite odd to have a file that contains cross-version metadata on the repo that is itself a revision of the repo versus tags which are created for the express purpose of expressing such metadata.

2. The noted issues on the limitations of tags don't seem to be as bad as thought

Bazaar is the one VCS that has issues – with white space – because canonically whitespace is replaced with "-" for ease of manipulation. As I propose, I think this is simply a case of allowing "-" to be interchangeable with " ".

3. I can't find any evidence that tracking tag history is a problem in any major or supported VCS

When it comes to the history of tags, mercurial commits these to the history via a file, git has these via tag annotations, bazaar collects this information by default and SVN considers them the same as branches. I don't see any issues here with collecting version information, and I think the precedent here is that VCSs will support this.

4. a special committed file is significantly more difficult to implement in pre-existing systems that would make use of this information

Where I work, there's a third-party import tool that works – as I think many others would also – by acquiring tag metadata and cloning the repo at a set of given tags. Using tags to indicate vulnerability makes acquiring this data a case of reading the information already on hand.

Using a file committed to the 'latest version' requires the tool to first determine what the 'latest version' might be, download its contents, load the module file and parse the module format.

If the third-party import tool is language agnostic, it's not necessarily going to clone the latest version of the repo. In this case, downstream services looking at the repo and its tags won't be able to see any information on whether their specific version is vulnerable.

5. with tags, it's feasible to determine the vulnerability of forks or other derivative works

Version information is repo specific, while tags are universally pinned to a specific part of the repo's history, regardless of where that code ends up.

In cases where a repo is forked, merged or otherwise ends up in a state where previous version indicators no longer apply, a system based purely on module versioning would make it extremely difficult to determine if the security bug affects this fork. Using the module file system, a mapping between the versions of repo A and repo B, its fork, would need to be maintained in order to determine if the fork is affected by security bugs in the repo it descends from.

In the case of using tags, it can be determined if a version of a fork contains its parent's security bug simply by checking if the history tagged with the bug exists in the fork. Correspondingly, it's possible to detect if the issue was already addressed by checking if the fix for the bug also exists in the history of the fork.
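The ancestry check described here can be sketched over an in-memory commit graph. `History`, `Contains` and `ForkAffected` are hypothetical names and the graph is made-up data; against a real Git repository the same question is usually answered with `git merge-base --is-ancestor`.

```go
package main

import "fmt"

// History maps a commit ID to the IDs of its parents (hypothetical in-memory form).
type History map[string][]string

// Contains reports whether target is head itself or one of its ancestors.
func (h History) Contains(head, target string) bool {
	seen := make(map[string]bool)
	stack := []string{head}
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if c == target {
			return true
		}
		if seen[c] {
			continue
		}
		seen[c] = true
		stack = append(stack, h[c]...)
	}
	return false
}

// ForkAffected reports whether head includes the commit tagged as vulnerable
// but not the commit tagged as the fix. An empty fixCommit means no fix is tagged.
func ForkAffected(h History, head, vulnCommit, fixCommit string) bool {
	if !h.Contains(head, vulnCommit) {
		return false
	}
	return fixCommit == "" || !h.Contains(head, fixCommit)
}

func main() {
	// Upstream history: a <- b (tagged vulnerable) <- c <- d (tagged fix).
	// fork1 branches from c, fork2 branches from d.
	h := History{
		"b":     {"a"},
		"c":     {"b"},
		"d":     {"c"},
		"fork1": {"c"},
		"fork2": {"d"},
	}
	fmt.Println("fork1 affected:", ForkAffected(h, "fork1", "b", "d"))
	fmt.Println("fork2 affected:", ForkAffected(h, "fork2", "b", "d"))
}
```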

It's important to note the system I propose doesn't solely rely on this kind of data, it's simply a side-effect of using version control systems' existing tag systems.

6. Creating new histories to mark versions as vulnerable seems like a nightmare when it comes to derivative repositories

It's common for software companies to discover security issues and attempt to remediate them before going public with the information. This information might also be shared under embargo with third parties.

In this case, adding or modifying the go.mod file would create many separate histories that might be incompatible, especially if internally a non-'latest' revision is being used. If v1.1.2 is being used internally, and v1.1.3 is created to mark a revision as vulnerable, it's going to be really quite difficult to resolve differences with the public upstream repo. It might even require architectural changes to modify frozen repos to mark them as vulnerable.

Ideally, it should be possible for internal maintainers of packages to mark revisions of the package as vulnerable without modifying the history in a way that causes potential incompatibility of the upstream or requires changes to code that would normally require an unfrozen repo.

7. Older packages or systems may not have support for go modules

It's common for very old systems to have security bugs found through better, more modern techniques. Introducing go.mod in these cases necessitates moving all its dependencies to the Go Module format, which may be nontrivial. In some cases, semver versions might need to be introduced for the first time so that systems can now detect whatever the 'latest' revision is. Many systems, like x/tools/go/loader, don't support Go modules and as such, pipelines may be broken by making this change.

I think it's very important that we allow package maintainers to be able to tag vulnerabilities without potentially breaking the build.

8. The module proxy protocol already includes a metadata structure containing similar information

It's come up a few times that this system would need to be compatible with the module proxy protocol. The proxy protocol already includes an 'Info' struct containing repo metadata such as the semver version tag my proposal is derived from.

I'd argue that this is a much better and more convenient place to contain vulnerability metadata than in the 'latest version' of the repo. Otherwise, as mentioned before you expect the downstream systems to acquire the latest version of the repo regardless of which version they're interested in, download its associated file and parse the metadata out of go.mod.
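To make the idea concrete, the sketch below decodes the JSON a module proxy serves for a version's `.info` file, which carries `Version` and `Time`, into a struct with one extra field. The `Vulnerabilities` field is purely hypothetical and is not part of the proxy protocol; `ParseInfo` and the sample JSON are likewise made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Info mirrors the version metadata served by a module proxy,
// plus one hypothetical field.
type Info struct {
	Version string    // version string
	Time    time.Time // commit time
	// Vulnerabilities is NOT part of the real protocol; it is an assumption
	// showing where per-version vulnerability identifiers could live.
	Vulnerabilities []string `json:",omitempty"`
}

// ParseInfo decodes one .info document.
func ParseInfo(data []byte) (Info, error) {
	var info Info
	err := json.Unmarshal(data, &info)
	return info, err
}

func main() {
	data := []byte(`{"Version":"v1.0.1","Time":"2018-05-17T17:36:23Z","Vulnerabilities":["CVE-2018-0001"]}`)
	info, err := ParseInfo(data)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(info.Version, info.Time.Format(time.RFC3339), info.Vulnerabilities)
}
```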

I think from a Go perspective it makes the most sense to put this in go.mod. I always had in mind that this information should be used by humans and bots alike, so on commit a bot can use the latest go.mod to update tags on the respective VCS. This is especially important as tags look and are handled quite differently across projects, so I don't think that all projects would be happy if their VCS tags were polluted with this information. IMHO project owners should make a conscious decision about this. On the other hand, go.mod is just another file and shouldn't pollute any more than, let's say, adding .gitignore. Furthermore, most of the time this information will be updated by developers, and developers typically know by heart how to edit files for their respective VCS. This isn't necessarily true for tags, as some developers don't use them at all.

I think @Zemnmez has valid concerns but I don't know how much the Go team and community should care. Go modules are a major shift and that is bound to break some existing custom solutions. Trying to ease that shift for public and commonly used alternatives is IMHO a must, but breaking non-public company-internal solutions is IMHO fair game: companies should be able to make the resources to adapt if needed, and they chose to be non-public with their custom solution, so it is their responsibility to catch up and stay in the game. As I'm working for a large company I know this pain all too well, but as with any open source work this first needs to be solved sufficiently in public so that it doesn't block company-internal adoption, and I don't think that using go.mod blocks company-internal adoption.

Last but not least this feature request was solely intended to make this kind of information available in a public, consistent and human and machine readable format and to let it be used by the Go tooling so that humans can be notified in case something is less than desirable during a Go build. How the whole security ecosystem will react to the availability of this information is IMHO not in scope for this discussion as first of all the Go community/ecosystem needs to be happy with it. That said I think it is a must that this information is easily accessible by other tools (like security audit tools) and IMHO go.mod is easy enough.

@michael-schaller I've been thinking about how I can respond to this comment for a while, and at its core I can't in good faith take you up on any of your points without you at least attempting to address any of my seven points beyond your saying:

I think @Zemnmez has valid concerns but I don't know how much the Go team and community should care. [...]

I'm sure you can see how important it is to have this problem solved in a meaningful way, and how handwaving away real comments on the specifics of the solution because it wasn't the intention of your request to support the wider Go community doesn't help us get there.

If my comments really are so far-flung from the needs of the Go community, please address how individually. I am sure there's at least one point of my seven which is aligned with other members of the Go community, considering the highly positive reaction on my original proposal.

@Zemnmez I think you misunderstood me. IMHO the (abstract) ideas, needs and concerns behind your seven points are important. You didn't make that the primary focus of your seven points though and instead it feels like you presented an implementation proposal that fits your needs and added justifications why this implementation makes sense to you.

As an example I wholeheartedly agree with you that there is the need for an audit trail. I disagree that this information needs to be stored in VCS tags but I might be wrong on that and could be convinced otherwise. What I would like to see is that people clearly communicate their (abstract) ideas, needs and concerns so that abstract goals and non-goals can be formed. Based on that information brainstorming can happen on how these goals can be solved with respective pros/cons. Then a conscious decision can be made on what fits/works best for Go...

If anything then I think that our comments show that it is too early for brainstorming on specific problems and that we first of all need a set of goals/non-goals. Once we all agree on that we can start brainstorming with the goals/non-goals as common ground...

@michael-schaller

it feels like you presented an implementation proposal that fits your needs and added justifications why this implementation makes sense to you

It goes without saying that any implementation proposal is going to fit the proposer's needs. I don't see how / why the justifications for my proposal wouldn't as a result directly be those that make sense to the proposer.

I disagree that this information needs to be stored in VCS tags but I might be wrong on that and could be convinced otherwise

This is what I have difficulty with. I explicitly presented those points as a summary of why VCS tags are a better approach than the other approach presented and the position continues to be held that they're not a good approach without any response on those points. All I'm asking is that if you think my points aren't compelling, you present reasons why so I can understand your point of view and dispel any misunderstanding.

What I would like to see is that people clearly communicate their (abstract) ideas, needs and concerns so that abstract goals and non-goals can be formed. Based on that information brainstorming can happen on how these goals can be solved with respective pros/cons. Then a conscious decision can be made on what fits/works best for Go...

I might be wrong here, but I feel like between your thread and my thread there's a clear communication of intent in the subtext. We'd like an (1) auditable and (2) functional way of marking package revisions as vulnerable.

The arguments I'm making for VCS tags are in two groups: (1) problems with the alternative approach and (2) benefits of the VCS approach.

I think it goes without saying that the best approach will be chosen on the basis of benefits it brings with it beyond minimum outcome requirements, and I don't think that means that such auxiliary benefits have to be litigated as 'requirements' for such a solution. If that were not the case, it would be impossible to decide between valid approaches because by definition they'd all have exactly the same value as measured against the requirements. This is something you state implicitly yourself:

Based on that information brainstorming can happen on how these goals can be solved with respective pros/cons. Then a conscious decision can be made on what fits/works best for Go...

If anything then I think that our comments show that it is too early for brainstorming on specific problems and that we first of all need a set of goals/non-goals. Once we all agree on that we can start brainstorming with the goals/non-goals as common ground...

I think requiring consensus on every primary and secondary goal without a solution in mind is going to take us to bikeshed town and I don't like it there very much anymore. It's my opinion that based on some basic, easy to agree on criteria as previously proposed we should be making proposals and judging them on their relative merits instead of trying to theorise on what success looks like in a completely abstract space.

When it comes to the history of tags, mercurial commits these to the history via a file, git has these via tag annotations,

@Zemnmez Git annotated tags (not "tag annotations") are comparable to commit messages, and have nothing to do with tracking history. Git tags have no history, and are not intended to ever change.

Git annotated tags (not "tag annotations") are comparable to commit messages, and have nothing to do with tracking history

are you saying commit messages have nothing to do with tracking history

are you saying commit messages have nothing to do with tracking history

Individual git commits have no history (they can point to their parents, but that's something else). So in that sense, yes.

@bcmills @jayconrod I think I saw it stated elsewhere that the core Go team's thinking was that prior to tackling this issue, it was more important to first design and deliver the initial version of what ultimately became https://sum.golang.org (given the security benefits that delivers, as well as due to possible interplay between the GOSUM and GOPROXY designs with the design for this issue here).

Two related questions:

Q1. Now that https://sum.golang.org is in beta and slated for Go 1.13, is this issue exiting its "wait" state?

Q2. The design sketched here roughly a year ago by Russ above in https://github.com/golang/go/issues/24031#issuecomment-407798552 I think makes the point that the solution outlined there would also tackle incompatible dependencies and deprecated versions, in addition to security bugs.

(An example benefit of handling incompatibility as outlined there: given a sample dependency chain like A -> B -> C, if C has a bug that impacts B, then B's author could declare incompatibilities with certain versions of C, and the top-level consumer A would be notified when doing an unrelated command, and that could all take place without any action by C's author... which certainly has some nice properties).

The question then is -- setting aside for the moment the exact mechanism, if this issue is tackling more than just security, does it make sense to update the issue title here to be broader than just security?

In any event, it would make sense if more design is needed here, but wanted to briefly check on this issue given the progress on https://sum.golang.org and friends.

Q1. Now that https://sum.golang.org is in beta and slated for Go 1.13, is this issue exiting its "wait" state?

I'm not sure if this was blocked by sum.golang.org specifically. I don't think this is blocked for 1.14. However, we need to agree on a design with respect to any go.mod changes and the proxy protocol.

Q2. The design sketched here roughly a year ago by Russ above in #24031 (comment) I think makes the point that the solution outlined there would also tackle incompatible dependencies and deprecated versions, in addition to security bugs.

The question then is -- setting aside for the moment the exact mechanism, if this issue is tackling more than just security, does it make sense to update the issue title here to be broader than just security?

Yes, I think this is about more than security, and I'll update the title to reflect this. In general, I think this is about a way for module authors to indicate that a version is "bad" and should not be used for whatever reason. Could be security, incompatibility, accidental errors, etc.

The things we need to decide are at least 1) how do authors mark those versions, 2) what does go get do with that in direct mode, 3) what should proxies do.

I'm not sure if this was blocked by sum.golang.org specifically.

I think the older comments I saw might have been partially about trade-offs between this issue here vs. the amount of engineering time needed to design and build sum.golang.org and proxy.golang.org, and then also that the exact approach for this issue might have an interplay with some of the details of GOPROXY, index.golang.org, godoc.org, and sum.golang.org, and I thought I saw a desire expressed to make progress on those other bigger pieces first... But I'll confess I'm not 100% sure, including because it was a while ago.

In any event, thanks for the reply.

One natural extension of the designs proposed here (whether VCS or go.mod based) would be the ability for a module proxy (e.g. the official one) to present a UI of all known vulnerabilities. That is to say, you could get something equivalent to https://rustsec.org/advisories/ on index.golang.org without the need for any actual centralization.

I imagine that in some cases, we may want to not mark all previous versions as bad, only specific releases or ranges of releases... (imagine a 0.9.x release train and 0.8.x release train where you want to mark a set of releases bad but not all the 0.8 releases while marking the 0.9.1 release bad...)

@lizthegrey As far as I understand, that’s part of the intent of the sketch in the comment above in https://github.com/golang/go/issues/24031#issuecomment-407798552

One of the examples from there shows a range:

bug example.com/mymodule/rpc v1.1.2-v1.1.3 https://example.com/issue1234 "RPC client bug - can get stuck if too many servers restart"

Is there any update on this issue/proposal?

@rsc what would be the next steps to continue with this feature request? (I would also be interested to work on this as a 20% project at Google.)

We were forced to work around this by renaming the module.

@jayconrod is working on a design for Go 1.15.

@lizthegrey, could you explain how renaming works around this issue? The connection is not obvious to me.

we accidentally pushed a v1.0.0 of a module, it was cached by the Go proxy, and now even deleting the tag won't prevent it from being automatically fetched when someone fetches that package name.

This is the design I've been working on. Please let me know what you think. I'd like to land this in Go 1.15.

This is a long thread with some contradictory suggestions. I think I've taken everything into account here, but I'm not going to respond to previous comments point-by-point.

Abstract

Module authors need a way to indicate that published module versions should not be used. There are a number of reasons this may be needed:

A severe security vulnerability has been identified.
A severe incompatibility or bug was discovered.
The version was published accidentally or prematurely. (#34189)

Authors can't simply delete version tags, since they remain available on module proxies. If an author were able to delete a version from all proxies, it would break downstream users that depend on that version.

Authors also can't change versions after they're published. go.sum files and the checksum database verify that published versions never change.

Instead, authors should be able to retract module versions. A retracted module version is a version with an explicit declaration from the module author that it should not be used. (The word retract is borrowed from academic literature: a retracted research paper is still available, but it has problems and should not be the basis of future work).

Retracted versions should remain available in module proxies and origin repositories. Builds that depend on retracted versions should continue to work. However, users should be notified when they depend on a retracted version (either directly or indirectly). It should also be difficult to unintentionally upgrade to a retracted version.

Mechanism for retracting versions

(This is largely based on @rsc's earlier comment).

A module author could retract a version of a module by adding a retract directive to the go.mod file. The retract directive simply lists retracted versions. A retract directive should have a comment documenting why the retraction was necessary. Multiple retract directives can be grouped together in a block.

Example:

// Doesn't always encrypt data.
retract v1.2.3

retract (
    v1.4.5-beta          // accidentally required private module
    [v1.5.0, v1.5.9]     // broken go.mod file
)

A range of versions could be expressed using an open, closed, or half-open range:

retract (v1.0.0, v1.1.0)
retract [v1.2.0, v1.3.0]
retract [v1.4.0, v1.5.0)

NOTE: As of Go 1.14, the go.mod parser will interpret parens in the middle of a line as the beginning or end of a block, even for unknown directives in remote go.mod files. Consequently, open and half-open intervals like those above won't be initially supported. The Go 1.15 go.mod parser will be forward compatible with this syntax though.
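The interval notation above reads like ordinary mathematical intervals: a square bracket includes the endpoint and a parenthesis excludes it. A minimal Go sketch of that membership test follows. The `interval` type and the `parse`, `mustParse`, and `cmp` helpers are hypothetical names for illustration, not part of the go command, and the parser assumes plain vMAJOR.MINOR.PATCH versions with no pre-release suffix.

```go
package main

import "fmt"

// version is a simplified release version (no pre-release or build metadata).
type version struct{ major, minor, patch int }

// parse reads a "vMAJOR.MINOR.PATCH" string.
func parse(s string) (version, error) {
	var v version
	_, err := fmt.Sscanf(s, "v%d.%d.%d", &v.major, &v.minor, &v.patch)
	return v, err
}

// mustParse is a convenience wrapper for literals in examples.
func mustParse(s string) version {
	v, err := parse(s)
	if err != nil {
		panic(err)
	}
	return v
}

// cmp returns -1, 0, or +1 as a is lower than, equal to, or higher than b.
func cmp(a, b version) int {
	for _, d := range []int{a.major - b.major, a.minor - b.minor, a.patch - b.patch} {
		if d < 0 {
			return -1
		}
		if d > 0 {
			return 1
		}
	}
	return 0
}

// interval is a hypothetical model of one retract range.
type interval struct {
	low, high         version
	lowOpen, highOpen bool // true means the endpoint itself is excluded
}

// contains reports whether v falls inside the interval.
func (i interval) contains(v version) bool {
	lo, hi := cmp(v, i.low), cmp(v, i.high)
	if lo < 0 || hi > 0 {
		return false
	}
	if lo == 0 && i.lowOpen {
		return false
	}
	if hi == 0 && i.highOpen {
		return false
	}
	return true
}

func main() {
	// Models: retract [v1.4.0, v1.5.0)
	r := interval{low: mustParse("v1.4.0"), high: mustParse("v1.5.0"), highOpen: true}
	for _, s := range []string{"v1.4.0", "v1.4.9", "v1.5.0"} {
		fmt.Println(s, r.contains(mustParse(s)))
	}
}
```

A real implementation would need full semantic version ordering, including pre-release versions, which this sketch leaves out.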

retract directives in the go.mod file from the latest version of a module (according to the current semantics of @latest) are the only directives that apply. So in order to retract a module version, the module's author must publish a new, higher version. If the new version should not be considered for upgrades, it may also retract its own version.

Changes in go command behavior

Build commands

No change would be made to build commands that don't resolve new module versions. A core promise of modules is that builds are reproducible: if a go.mod file lists all the required modules to build a set of packages, the go command will continue to build those packages with the same set of module versions, even if versions are retracted or new versions are published.

Commands that resolve the "latest" version

Commands that resolve the "latest" version of a module would ignore retracted versions. For example, if go build is run on a package P that imports a package Q not covered by any module in the build list, go build will find the latest version of a module that provides Q. The resolved module version will be added to the go.mod file.

Currently, the go command lists all versions of a module, then picks the highest release version. If no release version is available, the go command picks the highest pre-release version. If no pre-release version is available, the go command either requests a version from the proxy's $module/@latest endpoint or derives a pseudo-version from the module repository's default branch (in direct mode).

With this change, the go command would load retract directives from the go.mod file for the latest version (according to the current logic). It would then remove retracted versions from the version list and choose a new "latest" version. Retractions listed in the new "latest" version are ignored — only the retractions in the actual latest version apply.

For example, suppose a module has versions v1.2.0, v1.2.1, and v1.3.0-pre. The go command would load the go.mod file for v1.2.1. Suppose that file retracts versions v1.2.1 and v1.3.0-pre. v1.2.0 would now be considered "latest" for the purpose of upgrades.
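The selection rule described above can be sketched in Go as follows. This is only an illustration under simplifying assumptions, not the go command's implementation: `latest`, `latestUnretracted`, and the `retractedBy` callback are hypothetical names, versions are assumed to look like vMAJOR.MINOR.PATCH with an optional "-suffix", and pre-release suffixes are compared as plain strings rather than by full semantic version rules.

```go
package main

import (
	"fmt"
	"strings"
)

// isPrerelease reports whether v carries a pre-release suffix such as "-pre".
func isPrerelease(v string) bool { return strings.Contains(v, "-") }

// core extracts the numeric MAJOR.MINOR.PATCH part of a version.
func core(v string) [3]int {
	var c [3]int
	base, _, _ := strings.Cut(v, "-")
	fmt.Sscanf(base, "v%d.%d.%d", &c[0], &c[1], &c[2])
	return c
}

// less reports whether a orders below b (simplified ordering, see lead-in).
func less(a, b string) bool {
	ca, cb := core(a), core(b)
	for i := range ca {
		if ca[i] != cb[i] {
			return ca[i] < cb[i]
		}
	}
	// Same core: a pre-release sorts below the corresponding release.
	if pa, pb := isPrerelease(a), isPrerelease(b); pa != pb {
		return pa
	}
	return a < b
}

// latest picks the highest release version, or the highest pre-release
// version if no release exists. It returns "" for an empty list.
func latest(versions []string) string {
	best := ""
	for _, v := range versions {
		switch {
		case best == "":
			best = v
		case isPrerelease(best) != isPrerelease(v):
			if isPrerelease(best) {
				best = v // any release beats any pre-release
			}
		case less(best, v):
			best = v
		}
	}
	return best
}

// latestUnretracted applies the retractions declared by the actual latest
// version, then picks a new "latest" from what remains. Retractions declared
// by that new "latest" version are deliberately not consulted.
func latestUnretracted(versions []string, retractedBy func(v string) map[string]bool) string {
	retracted := retractedBy(latest(versions))
	var kept []string
	for _, v := range versions {
		if !retracted[v] {
			kept = append(kept, v)
		}
	}
	return latest(kept)
}

func main() {
	versions := []string{"v1.2.0", "v1.2.1", "v1.3.0-pre"}
	// Stand-in for reading retract directives from a version's go.mod file.
	retractedBy := func(v string) map[string]bool {
		if v == "v1.2.1" {
			return map[string]bool{"v1.2.1": true, "v1.3.0-pre": true}
		}
		return nil
	}
	fmt.Println(latestUnretracted(versions, retractedBy))
}
```

The inputs in `main` mirror the example in the text: v1.2.1 is the actual latest version, its go.mod retracts v1.2.1 and v1.3.0-pre, and v1.2.0 is what remains.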

go list -m

go list -m -u will find retracted versions and will print whether the currently required version of a module is retracted. go list -m -u all will be a convenient way to check if any module version in the build list is retracted.

Example:

$ go list -m -u example.com/m
example.com/m v1.2.3 (retracted) [v1.2.5]

go list -m -versions will find retracted versions and will omit them from each module's version list, as described above.

A new flag, -retracted, will tell go list -m to include retracted versions in the list printed by go list -m -versions and to consider retracted versions when resolving version queries like @latest.

go list -m -json with the -u or -versions flag will show a Retracted field for retracted module versions. The field will contain a list of strings with the rationale comments from the corresponding retract directives if there were any comments. The list will typically have one element, but if a version is mentioned multiple times in retract directives, the list will have multiple elements. The Retracted field will not be shown in go list -m -json output unless either the -retracted or -u flag is used. This avoids the need to load the version list for common cases, which requires a network fetch.

Example:

$ go list -m -u -json example.com/m
{
  "Path": "example.com/m",
  "Version": "v1.2.3",
  "Retracted": ["accidentally required private module"],
  "Update": {
    "Path": "example.com/m",
    "Version": "v1.2.5",
    "Time": "2020-03-10T12:34:56Z"
  },
  ...
}

go get

If an argument to go get has the @upgrade, @latest, or @patch suffix or has no suffix (defaulting to @upgrade), go get will find retracted versions and will not consider them as possible upgrades. go get will also not consider retracted versions when upgrading modules covered by -u.

After go get has finished modifying the build list and go.mod, it will warn if any module versions in the build list are retracted. This warning will only be shown if go get is run within a module; when go get is run outside a module (usually when installing a tool), there's no simple way for users to avoid retracted versions.

Users can upgrade a specific module away from a retracted version by running go get example.com/mod or go get example.com/mod@patch. These commands will not downgrade a module if the latest non-retracted version is lower than the currently required version. If a downgrade is safe, go get example.com/mod@latest may be a better command to use.

Compatibility with old versions

The go command rejects unrecognized directives in the main module's go.mod file. This means that if a module has a retract directive, it must be developed using Go 1.15 or higher (assuming this functionality lands in Go 1.15).

The go command ignores unrecognized directives in other modules' go.mod files. This means it's safe to depend on a module that has a retract directive.

Alternative considered: +retract tags

Instead of using a retract directive as described above, an author could publish an additional tag to indicate that a version should be retracted. The tag would consist of a version to retract, followed by a +retract suffix. For example, if an author wanted to retract v1.2.3 of the module in the foo/ subdirectory, they would create the tag foo/v1.2.3+retract.

This mechanism has some advantages over the retract directive.

Pseudo-versions and +incompatible versions could be retracted.
It would be possible to retract versions without migrating to modules.
It would not be necessary to publish a new version to retract a previous version.

There are some drawbacks:

There's no history about when versions were retracted (or un-retracted) and who made the change.
It's not clear how to present retracted tags in the GOPROXY protocol. The most logical place is the $module/@v/$version.info endpoint, which exposes JSON metadata about the commit that a version corresponds to. However:
- The set of version control tags, the version list (the $module/@v/list endpoint in the GOPROXY protocol), and the version info ($module/@v/$version.info) are not authenticated by go.sum files or the checksum database. Their contents may change over time, and they can be manipulated by a malicious proxy.
- Some proxy implementations cache .info files for canonical versions indefinitely, since they don't currently contain volatile information. Defining a new endpoint might be feasible, but that would be an even more invasive change.
It's not clear how to express retraction rationale as part of a tag. It could be part of the commit message the tag points to, or it could be part of an annotated tag message (git only). Each version control system has a different idea of what a tag actually is, so we might need a slightly different mechanism for each system, which would be confusing and hard to maintain.
It's difficult for a user to suggest a retraction. With go.mod, a user (or perhaps a bot that detects upstream vulnerabilities) could send a pull request adding a retraction.

Since Go 1.14, the ability to retract +incompatible versions is less important. When the go command searches for the "latest" version of a module, it will not consider +incompatible versions if a go.mod file is present at the highest non-incompatible version of a module.

The ability to retract pseudo-versions and to retract versions without migrating to modules are only temporary advantages. We expect that in a few years, nearly all Go projects will be using modules and will be tagging semantic version releases.

The ability to retract a previous version without publishing a new version does seem attractive. However, given the disadvantages above, it doesn't seem decisive. A new version containing retractions could itself be retracted. The retractions would still be effective, but the new version would not be considered the "latest" version for the purpose of upgrades.

NOTE: This proposal does not add any mechanism to retroactively un-publish a version. This may be useful in an organization where authors control code and everything that depends on it. There are a few ways to accomplish this outside of go.mod already: removing a version from the corporate proxy; running a test in CI that fails if a bad version is used; checking versions used by binaries running in production and alerting operators.
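As one illustration of the CI option mentioned in the note, an organization could compare its build list against its own deny list. The Go sketch below assumes the build list is available as text with one "path version" pair per line (the usual shape of go list -m all output, where the main module appears without a version). The `findBanned` function, the deny list format, and the sample data are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findBanned scans a build list with one "path version" pair per line and
// returns every "path@version" that appears in the banned set.
func findBanned(buildList string, banned map[string]bool) []string {
	var hits []string
	sc := bufio.NewScanner(strings.NewReader(buildList))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 2 {
			continue // e.g. the main module, which has no version
		}
		if id := f[0] + "@" + f[1]; banned[id] {
			hits = append(hits, id)
		}
	}
	return hits
}

func main() {
	// Made-up build list and deny list for illustration.
	buildList := `example.com/app
example.com/m v1.2.3
example.com/ok v0.4.0
`
	banned := map[string]bool{"example.com/m@v1.2.3": true}

	hits := findBanned(buildList, banned)
	for _, h := range hits {
		fmt.Println("banned module version in build list:", h)
	}
	// A CI job would exit with a non-zero status here when len(hits) > 0.
}
```

Because the deny list lives inside the organization, a check like this can fail builds there without affecting anyone else who depends on the same module.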

We should be extremely cautious about adding a mechanism like this for the whole ecosystem though. We'd open ourselves up to a "leftpad" scenario, where an author (perhaps someone who's taken over an abandoned project) could break builds for everyone by retracting all versions. That attack still exists here, but it's annoying rather than fatal.

Edits:

2020-03-09: added note about why this won't break builds.
2020-03-10: added JSON output example, version ranges, and mentioned that go.mod retractions could be suggested via PR.
2020-05-01: updated version interval syntax, added -retracted flag.

Hi Jay,

Thanks for running with this! I'm very excited to see forward movement here.

Can you speak to why you went with a comment for explaining the cause of the retraction? Specifically as opposed to some retraction-specific syntax.

The thrust behind my question is that for security purposes, it'd be very useful to have an official syntax for saying "retracted for security reasons", so tools can properly notify you if you depend on something with a security issue. Using a comment allows greater variation in what people do, for better and for ill. I think having a "retraction type" field, which was required, would go a long way to ensuring these things are clear.

Loving the self-retract mechanism of publishing a retraction of an accidentally pushed new major version.

I like the simplicity of the retraction mechanism. My immediate thought was that by not having metadata on the nature of the retraction like other security tagging systems it'd make it more difficult to make decisions based on the metadata, but I think the result of this is more nuanced: with e.g. npm audit, the extra presented metadata gives the end user, sure, the ability to decide if the security advisory applies to them, but it also, I think encourages the ideology in development that 'retractions' (copying the language here) by the developers should be inherently optional, except where they're considered serious by the users.

Though the retraction mechanism doesn't discuss it (as far as I have read) it seems like some more thought needs to be put into the specifics of how a package vulnerable via a critically vulnerable dependency has its dependencies upgraded. If no changes are made to build, how will the developer building the package know there is a critical dependency vulnerability? It is not so common for Go developers in my experience to unreservedly issue a command to upgrade all dependencies.

Lastly, though I'm not sure whether I think this is good or bad, this particular form doesn't keep information on serious but non-critical vulnerabilities that might affect users (such as an API form that commonly results in vulnerability) and doesn't appear to allow specifying a range of versions to which one particular issue applies (which is useful, for example for upstream patching on forks). It's important to ensure that critically vulnerable issues get mitigated in API users, but it's also important, I think that we at least have the ability to surface actionable security guidance where we can.

A few comments:

General

I very much like the overall approach, the name retract and the location in go.mod. Well done! :-D
I miss a section on how to ensure extensibility. How could this be extended without breaking old Go versions?

Abstract

Can we add 'dead/abandoned/end-of-life' as possible reason as well?

Mechanism for retracting versions

Can we add a range of versions, including open start or open end? It can be bothersome for a frequently released package to list all the affected versions and the resulting retractions could be hard to read by humans.
Can we have something slightly better than comments to denote the retraction reason? It would be particularly good for vulnerability scanners to be able to determine how urgent a vulnerability is. Furthermore a grave bug, like a rarely occurring data corruption issue, could be urgent and should be reflected. So I'd like to see at least some machine readable reason enum and urgency/severity enum as part of the retraction definition.

go vet

I would like to see that go vet checks for retracted versions. Many people run go vet before commit and so this would IMHO be a good place to check for retracted versions.

No change would be made to build commands that don't resolve new module versions. A core promise of modules is that builds are reproducible: if a go.mod file lists all the required modules to build a set of packages, the go command will continue to build those packages with the same set of module versions, even if versions are retracted or new versions are published.

This proposal places build reproducibility over safety. Even the go1 compatibility guarantee allows for breaking APIs in the event of security issues. This is laudable, but I think there exist cases where you want to break builds, due to the severity of an issue, especially within a private organisation. This would lead to additional keywords (e.g. salt, burn, blacklist) and extra complexity, but I think it is worth calling out that this proposal does not provide such a mechanism.

Can you speak to why you went with a comment for explaining the cause of the retraction? Specifically as opposed to some retraction-specific syntax.

@alex I went through a few different iterations of this, but I couldn't come up with a structure that works for all cases. For the purpose of the go command, a flat string satisfies all the requirements, and the simplest solution is usually the best.

That said, I think it makes sense to have some conventional format for security issues, the way we have ^// Code generated .* DO NOT EDIT\.$ for generated source files. I'll wait until we have a firmer plan on how to report and track security issues before proposing anything though.

One thing to point out: I don't think retracting module versions completely solves the problem of reporting security vulnerabilities. You can't retract individual packages or functions (perhaps a job for a separate static analysis tool later on). And only authors are allowed to retract versions, so there's no way for users of an abandoned or uncooperative module to be notified of a problem.

I know @FiloSottile and @katiehockman have been thinking about this. Any early thoughts or open issues?

Loving the self-retract mechanism of publishing a retraction of an accidentally pushed new major version.

@lizthegrey I meant to apologize that this wasn't in place earlier. Having to rename your module sucks.

For new major versions beyond v2.0.0, since Go 1.14, I believe you can now add a go.mod file to the latest version (say v2.0.1), and that will no longer be resolved as the +incompatible latest version for the base module. Buuut that doesn't help for v1.0.0 though.

Though the retraction mechanism doesn't discuss it (as far as I have read) it seems like some more thought needs to be put into the specifics of how a package vulnerable via a critically vulnerable dependency has its dependencies upgraded. If no changes are made to build, how will the developer building the package know there is a critical dependency vulnerability? It is not so common for Go developers in my experience to unreservedly issue a command to upgrade all dependencies.

@Zemnmez I think checking for retracted versions during a build would be too costly. We have to load the version list for each module that provides packages imported by the thing being built. If there are uncached versions, we have to fetch the go.mod file for those. That's why I'm limiting the check to go list -m -u and go get: those commands almost always have to do that work anyway.

I agree that it would still be easy to miss a notification about this though. I just opened #37781, suggesting that gorelease report an error if the module being tested requires a retracted module version.

I think there's an opportunity here for another tool as well: given a list of module versions (perhaps produced by running go version -m on a binary running in production), report whether any version is retracted.

@michael-schaller

I very much like the overall approach, the name retract and the location in go.mod. Well done! :-D

Thanks! I went through a lot of words; it feels a little bit like jargon, but it precisely describes what we're trying to accomplish.

I miss a section on how to ensure extensibility. How could this be extended without breaking old Go versions?

I got into this a bit in the section Compatibility with old versions. In short, if you use retract in your own module (the main module), you must use Go 1.15 or later. retract may be used in modules you depend on with any Go version that supports modules. Old Go versions will ignore retract outside the main module.

Can we add 'dead/abandoned/end-of-life' as possible reason as well?

I'd generally advise against using retract for that reason: if you want to encourage users to upgrade to a new version, it's better to do it with a carrot (new features, better performance) instead of a stick. Upgrades (especially to incompatible versions) are costly, and not always worth it from the user's perspective.

That said, if you have a module that provides an API for a service that's being turned down, the module will no longer work, and users should be told. retract would make sense in that situation.

Can we have something slightly better than comments to denote the retraction reason? It would be particularly good for vulnerability scanners to be able to determine how urgent a vulnerability is. Furthermore a grave bug, like a rarely occurring data corruption issue, could be urgent and should be reflected. So I'd like to see at least some machine readable reason enum and urgency/severity enum as part of the retraction definition.

Maybe. It's not clear at all to me what the format should be. We'd need something extensible that works for all use cases. Before committing to anything, I think we should have a firmer plan for vulnerability tracking in general.

I would like to see that go vet checks for retracted versions. Many people run go vet before commit and so this would IMHO be a good place to check for retracted versions.

This feels a bit out of go vet's wheelhouse to me. You could probably build an analyzer (using golang.org/x/tools/go/analysis, the framework go vet is built on) that does this. But analyzers typically just do static analysis on packages, and they don't usually go out to the network.

I've just opened #37781 to do something like this in gorelease though. That seems like a better fit to me.

This proposal places build reproducibility over safety. Even the go1 compatibility guarantee allows for breaking APIs in the event of security issues. This is laudable, but I think there exist cases where you want to break builds, due to the severity of an issue, especially within a private organisation. This would lead to additional keywords (e.g. salt, burn, blacklist) and extra complexity, but I think it is worth calling out that this proposal does not provide such a mechanism.

@carnott-snap I could see this being the case within an organization where you have control of the code and all its uses. There are a number of ways to accomplish this outside of go.mod already: removing a version from the corporate proxy; running a test in CI that fails if a bad version is used; checking versions used by binaries running in production and alerting operators.

I'd be extremely cautious about adding a mechanism like this for the whole ecosystem though. We'd open ourselves up to a "leftpad" scenario, where an author (perhaps someone who's taken over an abandoned project) could break builds for everyone by retracting all versions. That attack still exists here, but it's annoying rather than fatal.
