golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: cmd/go: automatic and partial vendoring in module mode #30240

Closed: bcmills closed this issue 4 years ago

bcmills commented 5 years ago

This proposal overlaps with (and hopefully unifies) several existing issues, linked in the text below.

I'd like to implement it soon, in the 1.14 cycle (originally planned for 1.13), so if you have feedback please do respond quickly. 🙂

Problem summary

Users want a durable, local view of their source code that works with existing diff tools and does not require per-user configuration in cloned repositories.

Proposal

Under this proposal, the source code for the packages listed in vendor/modules.txt — and the go.mod files for the modules listed in vendor/modules.txt, if any — will be drawn from the vendor directory automatically (#27227).
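To make the role of vendor/modules.txt concrete, here is a minimal sketch of reading it. It assumes the informal layout written by go mod vendor at the time of this discussion: a `# <module path> <version>` line followed by one line per vendored package. No formal specification of that layout exists, so the parser, the `vendoredModule` type, and the function name are illustrative assumptions, not the go command's implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// vendoredModule is a hypothetical record for one module in vendor/modules.txt.
type vendoredModule struct {
	Path, Version string
	Packages      []string
}

// parseModulesTxt reads the assumed layout: "# path version" starts a module,
// and each following non-comment line names one vendored package.
func parseModulesTxt(text string) []vendoredModule {
	var mods []vendoredModule
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		if strings.HasPrefix(line, "# ") {
			f := strings.Fields(line)
			if len(f) >= 3 { // "#", module path, version
				mods = append(mods, vendoredModule{Path: f[1], Version: f[2]})
			}
			continue
		}
		if strings.HasPrefix(line, "#") {
			continue // other comment lines are ignored in this sketch
		}
		if len(mods) > 0 {
			last := &mods[len(mods)-1]
			last.Packages = append(last.Packages, line)
		}
	}
	return mods
}

func main() {
	const sample = "# example.com/a v1.2.0\nexample.com/a\nexample.com/a/sub\n# example.com/b v0.3.1\nexample.com/b\n"
	for _, m := range parseModulesTxt(sample) {
		fmt.Println(m.Path, m.Version, m.Packages)
	}
}
```

The module paths in the sample are placeholders. Tools should still prefer go list over parsing this file directly, since the layout may change.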

If a replace directive in the main module specifies a module path, the module source code will be vendored under the path that provides the replacement, not the path being replaced. That preserves the 1:1 correspondence between import paths and filesystem directories, while allowing replacement targets to alias other modules (#26904). If a replace directive specifies a file path, then either that path must be outside the vendor directory or the vendor/modules.txt file must not exist (#29169).

Package patterns such as all and example.com/... will match only the packages that are present in the vendor directory, not unvendored packages from the same module. During the build, if additional packages from the vendored modules are needed in order to satisfy an import, the source for those packages will be fetched (from the module cache, if available) and added to the vendor directory. (Packages from outside the already-vendored modules will not be vendored automatically.)

Any time the go.mod file is written, if a module path found in vendor/modules.txt has a different version than that found in the build list, the already-vendored packages and go.mod file from the previous version will be deleted, and updated versions of those packages will be written in their place (#29058). Transitive imports of those packages will be resolved, and may populate additional packages in other already-vendored modules.
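The version check described above can be sketched as a comparison of two maps from module path to version: one read from vendor/modules.txt, one taken from the build list. The function name `staleModules` and the map representation are assumptions made for illustration; the proposal does not say how the go command would implement this.

```go
package main

import (
	"fmt"
	"sort"
)

// staleModules returns the module paths whose vendored version differs from
// the version selected in the build list. Modules absent from the build list
// are not reported here; the proposal handles those separately.
func staleModules(vendored, buildList map[string]string) []string {
	var stale []string
	for path, have := range vendored {
		if want, ok := buildList[path]; ok && want != have {
			stale = append(stale, path)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	vendored := map[string]string{
		"example.com/a": "v1.2.0",
		"example.com/b": "v0.3.1",
	}
	buildList := map[string]string{
		"example.com/a": "v1.3.0", // upgraded, so the vendored copy is stale
		"example.com/b": "v0.3.1",
	}
	fmt.Println(staleModules(vendored, buildList))
}
```

Under the proposal, each module reported this way would have its vendored packages and go.mod file deleted and rewritten from the new version.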

If go get removes a module from the build list entirely, its package source and go.mod file will be removed, but an entry for the module (with version none) will remain in vendor/modules.txt. That way, if a future operation (such as a go get or go build) adds the module to the build list again, it will remain vendored as before.

When go mod tidy is run, it will add or remove packages from the vendor directory so that it continues to contain only the subset of packages found in the transitive import graph. It will also remove go.mod files and entries in vendor/modules.txt for modules that are no longer present in the build list.
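The pruning rule for go mod tidy amounts to a reachability computation followed by a set difference. The sketch below assumes a precomputed import graph that contains only non-standard-library packages; the function names and the example package paths are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// transitiveDeps returns every package reachable from the main module's
// packages (roots) in the import graph, excluding the roots themselves.
func transitiveDeps(imports map[string][]string, roots []string) map[string]bool {
	seen := map[string]bool{}
	var visit func(string)
	visit = func(p string) {
		if seen[p] {
			return
		}
		seen[p] = true
		for _, dep := range imports[p] {
			visit(dep)
		}
	}
	for _, r := range roots {
		visit(r)
	}
	for _, r := range roots {
		delete(seen, r) // main-module packages are never vendored
	}
	return seen
}

// vendorDiff reports which packages tidy would add to and remove from the
// vendor directory so that it holds exactly the needed set.
func vendorDiff(vendored []string, needed map[string]bool) (add, remove []string) {
	have := map[string]bool{}
	for _, p := range vendored {
		have[p] = true
		if !needed[p] {
			remove = append(remove, p)
		}
	}
	for p := range needed {
		if !have[p] {
			add = append(add, p)
		}
	}
	sort.Strings(add)
	sort.Strings(remove)
	return add, remove
}

func main() {
	imports := map[string][]string{
		"example.com/app":   {"example.com/lib/a"},
		"example.com/lib/a": {"example.com/lib/b"},
	}
	needed := transitiveDeps(imports, []string{"example.com/app"})
	add, remove := vendorDiff([]string{"example.com/lib/a", "example.com/lib/unused"}, needed)
	fmt.Println("add:", add, "remove:", remove)
}
```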

To encourage the minimal use of vendor directories, the go mod vendor subcommand will accept an optional list of packages or modules. go mod vendor <module> will update the vendor directory to contain the go.mod file for <module> and source code for its packages that appear in the transitive import graph of the main module. (Note that, since the criterion for inclusion of a package is its existence in the import graph, vendoring in an additional module should not affect the contents of any previously-vendored modules.)

go mod vendor <pattern> for an arbitrary module pattern will add # <pattern> to vendor/modules.txt, and vendor in the go.mod files (and any packages found in the import graph) for modules matching <pattern>, adding individual comments to vendor/modules.txt for those modules.

Note in particular that go mod vendor all will copy in go.mod files for all of the module dependencies in the module graph (and add entries in vendor/modules.txt for those modules). That ensures that after go mod vendor all, go list can produce accurate results without making any further network requests (see also #19234 and #29772).

The go mod vendor subcommand will accept a new flag, -d. go mod vendor -d <pattern> will remove all previously-vendored modules matching <pattern> from the vendor directory (and from vendor/modules.txt), as well as any previously-stored patterns matching those modules (including <pattern> itself, if present).

go mod vendor, without further arguments, is equivalent to go mod vendor all. go mod vendor -d is equivalent to go mod vendor -d all. If go mod vendor -d causes vendor/modules.txt to become empty, it will also remove the entire vendor directory.



gopherbot commented 5 years ago

Change https://golang.org/cl/165378 mentions this issue: all: add -mod=vendor to GOFLAGS in tests that execute 'go' commands within std or cmd

selslack commented 5 years ago

@bcmills My experience report: I vendor all the dependencies, I do not modify the vendor folder (I do not edit or patch sources there). My build is offline and fails if I have a missing dependency. No magic, I'm happy.

My expectations from new tools provided by Go itself:

  1. Keep the old behavior of the vendor folder (which we had for years, literally);
  2. Work outside of GOPATH (which simplifies development).

Please provide a simple and reliable solution which satisfies the community's expectations.

I've read both big threads and I see that there are basically two use cases:

  1. Vendor all the dependencies.
  2. Vendor none.

And one potential use case:

  1. Vendor some.

I haven't seen experience reports with this one (maybe I missed one, these threads are super big). But this potential case smells like nobody likes complexity like setting up a proxy with access to private repositories.

Thanks!

bcmills commented 5 years ago

@selslack, thanks for the report, but it would help even more if you could share the why of your current workflow instead of the what. (In particular: why, if at all, would you prefer to vendor packages rather than modules? Why, if at all, would you prefer to vendor source files rather than zipfiles?)

We know how things work today, but we also know that for at least some use-cases there will be better alternatives in module mode. I want to optimize the vendor functionality for the subset of use-cases that don't have a better alternative.

JeremyLoy commented 5 years ago

@bcmills sorry for responding a month late, and thank you for writing this proposal!

So let's walk through a scenario, and please correct me if I get anything wrong:

  1. I have a preexisting go project.
  2. I need to add a dependency to the project, and I wish to vendor it.
  3. I run go mod vendor _pkgname_, which downloads the source code into the vendor directory
  4. I update my project to use the new dependency
  5. go build _mainpkgname_ works

This is a slightly different scenario than what I was originally envisioning, which was making go get vendor aware.

It's not bad, still just one command: go mod vendor _pkgname_.

Because partial vendor support is something mentioned above, I think this is an adequate compromise. It does feel a bit odd though. It definitely doesn't feel as nice as go get effortlessly switching between GOPATH mode and module mode.

It's also not quite what the title describes: "automatic vendoring in module mode". It's not really automatic if it's using a different command.

nomad-software commented 5 years ago

Is there a specification documented for vendor/modules.txt?

I want to update my own vendor tool because we seem to be going round in circles here and making vendor'ing way more complicated than it needs to be.

thepudds commented 5 years ago

Hi @JeremyLoy

This is a slightly different scenario than what I was originally envisioning, which was making go get vendor aware.

It's not bad, still just one command: go mod vendor pkgname.

As far as I understand the current proposal, the common case is that you would do a one-time operation of go mod vendor (which is a synonym for go mod vendor all).

Once you've done that, you've signaled your desire to use vendoring, and at that point the vendor directory will automatically track any subsequent go get foo@v1.2.3 and go get bar and also even if you add a new import path to your code for a previously unused module and do something like go build. In other words, once you do the one-time operation of go mod vendor for one of your projects, you would not need to separately do go mod vendor foo and go mod vendor bar. Was that part of your concern? Sorry if I have misunderstood the concern.

JeremyLoy commented 5 years ago

@thepudds that may in fact be the case. My initial impression from reading the proposal was as you described, but all of the follow up discussion in this thread regarding support for partial vendoring is where I am confused.

I personally don't see a need to support partial vendoring, at least for initial release. It just complicates the issue. Simply making go get vendor aware for this first pass doesn't exclude partial vendoring from a future release.

theckman commented 5 years ago

@bcmills what happens if I have a Module I depend on named all defined in my go.mod, and I only want to vendor that? How do I only vendor all and not github.com/theckman/example too?

thepudds commented 5 years ago

@JeremyLoy

@thepudds that may in fact be the case. My initial impression from reading the proposal was as you described, but all of the follow up discussion in this thread regarding support for partial vendoring is where I am confused.

The history of the conversation here is slightly confusing. The initial proposal was a bit different, so the first 18 or so comments above were reacting to that initial proposal. The proposal was then updated at the time of https://github.com/golang/go/issues/30240#issuecomment-464071411 in a way that largely addressed many of the initial concerns in those first 18 or so comments. I think the change in the proposal at that time also addressed what I think was the primary concern you were expressing in https://github.com/golang/go/issues/30240#issuecomment-474541877. I understand that you also expressed concern that you might not need partial vendoring, but I think what might have been your primary concern around the common case of go mod vendor automatically tracking future go get foo and similar commands is part of the current proposal (without the need to also do go mod vendor foo).

In other words, if you read the proposal as it stands now in the first comment https://github.com/golang/go/issues/30240#issue-410509265 and mostly like what it currently describes, then that is a good sign. As far as I am aware, the proposal as it stands now in the first comment is a complete description of the current proposal.

bcmills commented 5 years ago

@nomad-software, there is no current formal specification for modules.txt, and I don't intend to provide one: as demonstrated in this proposal, the format may be subject to change (although it should remain broadly compatible in the face of any changes).

The programmatic entry points for tools to interact with vendor in module mode are go list and go mod vendor. Anything beyond that should follow the proposal process.

bcmills commented 5 years ago

@theckman

what happens if I have a Module I depend on named all defined in my go.mod, and I only want to vendor that?

Module paths without a dot in the first component are in general reserved for the standard library: it seems exceedingly unlikely that there will ever be a module with a literal module path of all that you can go get or go mod vendor.
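The rule of thumb above (a dot in the first path component) is easy to express as a check. This is only a sketch of that convention; the function name is made up here, and it is not claimed to be the validation the go command performs.

```go
package main

import (
	"fmt"
	"strings"
)

// firstElemHasDot reports whether the first slash-separated element of a
// module path contains a dot, as in "github.com/..." or "example.com/...".
func firstElemHasDot(path string) bool {
	first := path
	if i := strings.IndexByte(path, '/'); i >= 0 {
		first = path[:i]
	}
	return strings.Contains(first, ".")
}

func main() {
	for _, p := range []string{"all", "cmd/go", "github.com/theckman/example"} {
		fmt.Printf("%-30s dot in first element: %v\n", p, firstElemHasDot(p))
	}
}
```

By this convention a path such as all falls in the reserved space, while github.com/theckman/example does not.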

bcmills commented 5 years ago

That ensures that after go mod vendor all, go list can produce accurate results without making any further network requests (see also #19234 and #29772).

It occurs to me that this might not actually remove the need for network access — at least not without significant changes to module-mode loading. It is possible that some dependency found in the module graph (go list -m) is only reached through an earlier-than-selected version of some other dependency, and since the vendor directory would contain only the most recent version of the go.mod file, that part of the module graph could be missed if we only consult the vendor directory.

I suspect that go list without -m would be fine, but go list -m in particular might still need network access.
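A toy module graph shows the gap. In the sketch below, module c is required only by a v1.0.0 of module a, while the selected version of a is v1.1.0. Walking every reachable go.mod finds c; walking only the go.mod files of the selected versions (all that the vendor directory would hold) does not. The types, function names, and module paths are invented for illustration, and versions are compared as plain strings, which happens to work for these values but is not real semantic-version ordering.

```go
package main

import "fmt"

// modVer identifies one version of one module.
type modVer struct{ Path, Version string }

// graph maps a module version to the requirements listed in its go.mod.
type graph map[modVer][]modVer

// buildList visits every reachable module version and keeps the highest
// version seen for each module path.
func buildList(g graph, root modVer) map[string]string {
	selected := map[string]string{}
	seen := map[modVer]bool{root: true}
	queue := []modVer{root}
	for len(queue) > 0 {
		m := queue[0]
		queue = queue[1:]
		for _, r := range g[m] {
			if r.Version > selected[r.Path] { // string compare: sketch only
				selected[r.Path] = r.Version
			}
			if !seen[r] {
				seen[r] = true
				queue = append(queue, r)
			}
		}
	}
	return selected
}

// reachableFromSelected walks the graph while reading only the go.mod file
// of each module's selected version.
func reachableFromSelected(g graph, root modVer, selected map[string]string) map[string]bool {
	reached := map[string]bool{}
	queue := []modVer{root}
	for len(queue) > 0 {
		m := queue[0]
		queue = queue[1:]
		for _, r := range g[m] {
			if reached[r.Path] {
				continue
			}
			reached[r.Path] = true
			queue = append(queue, modVer{r.Path, selected[r.Path]})
		}
	}
	return reached
}

// exampleGraph builds the scenario: main requires a@v1.0.0 and b@v1.0.0,
// a@v1.0.0 requires c, b requires a@v1.1.0, and a@v1.1.0 requires nothing.
func exampleGraph() (graph, modVer) {
	root := modVer{"example.com/main", ""}
	g := graph{
		root:                        {{"example.com/a", "v1.0.0"}, {"example.com/b", "v1.0.0"}},
		{"example.com/a", "v1.0.0"}: {{"example.com/c", "v1.0.0"}},
		{"example.com/b", "v1.0.0"}: {{"example.com/a", "v1.1.0"}},
	}
	return g, root
}

func main() {
	g, root := exampleGraph()
	selected := buildList(g, root)
	reached := reachableFromSelected(g, root, selected)
	fmt.Println("c in build list:", selected["example.com/c"] != "")
	fmt.Println("c reached via selected go.mod files only:", reached["example.com/c"])
}
```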

thepudds commented 5 years ago

@bcmills would it be reasonable to populate the older go.mod files as well in the vendor directory during a go mod vendor under this proposal?

This might not be the proper analogy, but in other words, would it be reasonable to place in vendor the go.mod files you would end up with if you manually did something like GOPATH=$(mktemp -d) go mod download in 1.12 today (or similar if that is not correct in 1.12)?

bcmills commented 5 years ago

@thepudds We would need to put the go.mod files at some path that contains the complete (presumably canonical) version as a path component: otherwise, they might overlap with the go.mod files vendored for other versions of the same module. The module cache, not the vendor directory, is where we put per-version files today, and adding a similar facility to the vendor directory would be a significant overlap.

I suppose that goes to @ianthehat's broader point about the overlap between the module cache and the vendor directory.

thepudds commented 5 years ago

We would need to put the go.mod files at some path that contains the complete (presumably canonical) version as a path component

Yes, agreed.

I suppose that goes to @ianthehat's broader point about the overlap between the module cache and the vendor directory.

The glass-half-full way to look at that might be that the logic to track go.mod files in a version-aware manner would not need to be invented from scratch for vendor ;-)

nim-nim commented 5 years ago

Or you could add a directory with the zip and mod files and use it as a file proxy (which is something it might be worth looking into as a better version of vendoring)

That's issue #31302

We've considered that, but it really doesn't work well with version control systems: the diffs are incomprehensible and the blobs can end up consuming a lot more space than they ought to (depending on the encoding).

That's only the case if you want to keep everything in a single VCS repository. In a large organization, where curating external code is done by many people/teams, you want to split the vendoring in separate repositories, that produce read-only modules, that are then consumed by all the organization projects.

The only reason all this stuff is in single huge vendor directories right now is that there was no robust way to share curating results in Go. Now we have one, that's modules + goproxy (#31304)

nim-nim commented 5 years ago

what happens if I have a Module I depend on named all defined in my go.mod, and I only want to vendor that?

Module paths without a dot in the first component are in general reserved for the standard library: it seems exceedingly unlikely that there will ever be a module with a literal module path of all that you can go get or go mod vendor.

IIRC modfile.Parse will error out if there is not at least one dot in the module name (for the branch exposed in github.com/rogpeppe/go-internal/)

virtuald commented 5 years ago

@bcmills an additional experience report for me, but it's easier to quote @selslack

I vendor all the dependencies, I do not modify the vendor folder (I do not edit or patch sources there). My build is offline and fails if I have a missing dependency. No magic, I'm happy.

This is exactly it for me, and why so far I've avoided using go modules and kept my dep+vendor workflow.

Why? The target environment for a project has no internet access, and it is a requirement for us to be able to build it in that environment. Vendoring has made accomplishing this incredibly simple and I never have to worry about whether my project will build correctly.

Sure, there are other ways of solving this problem as mentioned above (caches, etc), but vendoring is so incredibly easy to use/understand that I don't see why I would bother.

A side effect of committing all my dependencies to a vendor git repo is that it makes it really easy to audit any incoming changes to dependencies and watch for unexpected changes. I admit a filesystem cache as mentioned above could accomplish this same goal.

bcmills commented 5 years ago

I'm running out of time in the 1.13 cycle. I still want to make this happen, but unfortunately it's going to slip to 1.14.

roblillack commented 5 years ago

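@bcmills: What do you think about an alternative solution, where you'd flip a switch in go.mod to turn on "auto vendor" mode. When a module is in "auto vendor" mode, the following things would happen:

  • All changes to the dependencies (go get ...) would automatically be vendored to /vendor/...
  • All respective commands (go build/go run/go test) would always run with -mod=vendor
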
I feel like this would be the most sane solution for me, and pretty much comparable to dep or glide workflow which worked a treat for us for a long time.

Edit: To be more specific about my comment above: I'd prefer it, if having a /vendor directory would be sufficient enough to signal the Go tools that I want 100% of my dependencies vendored all the time and that all tools should run in -mod=vendor mode. But I understand, that this approach is not really something the Go team considers, so maybe having a setting in go.mod is.

selslack commented 5 years ago

@bcmills I want to add a recent story of how proper vendoring saved us a lot of time: https://success.docker.com/article/docker-hub-user-notification.

As a part of security review after receiving this notification -- we performed an audit of all the dependencies in Java, NPM, etc.

Auditing our Go code took exactly 0 seconds, because we have all the dependencies committed and we don't go online during build process at all.

MOZGIII commented 5 years ago

@bcmills: What do you think about an alternative solution, where you'd flip a switch in go.mod to turn on "auto vendor" mode. When a module is in "auto vendor" mode, the following things would happen:

  • All changes to the dependencies (go get ...) would automatically be vendored to /vendor/...
  • All respective commands (go build/go run/go test) would always run with -mod=vendor

I feel like this would be the most sane solution for me, and pretty much comparable to dep or glide workflow which worked a treat for us for a long time.

Edit: To be more specific about my comment above: I'd prefer it, if having a /vendor directory would be sufficient enough to signal the Go tools that I want 100% of my dependencies vendored all the time and that all tools should run in -mod=vendor mode. But I understand, that this approach is not really something the Go team considers, so maybe having a setting in go.mod is.

I agree.

I think we should consider actually removing the -mod=vendor flag and moving it to a per-project configuration of some sort. With that flag as it currently is, we're not going to solve the problem outlined in this issue's original posting: per-user configuration via GOFLAGS. That configuration would still be required for some projects as long as the meaning of -mod=vendor depends on what you are trying to do in a project (current task) rather than on which project you are doing it in (global task). To be more specific, I'd still want to have -mod=vendor enabled all the time, to force go to never try to load anything from the network without an explicit go mod vendor invocation, in some of the projects I'm working on.

myitcv commented 5 years ago

Following a good discussion at GopherCon with @ChrisHines, we concluded that a key reason for needing vendor today (in his situation at least) is touched on by the second bullet point in @bcmills' description:

Saved module caches do not interoperate well with version-control and code-review tools.

Put another way: vendor is used because there isn't a better alternative to reviewing dependency changes alongside (and as part of the same process as) changes to one's own code.

We further concluded that all other aspects of his requirements (including reproducible builds, self-contained CI runs etc) could be satisfied by alternative means, not least, for example, an approach similar to https://github.com/golang/go/issues/27618. None of these alternatives are currently as polished/easy as the vendor workflow, but they would do the job (and could become more polished).

Back to the point on reviewing dependency changes. This point is obviously critical. Not only for those people who prefer the vendor flow because they can easily solve this problem as part of their existing work flow, but for everyone who uses Go modules. We can't, today, point to a tool that helps us achieve this.

That said, to avoid the problems of only having parts of modules "vendored", I think this points towards a solution where entire modules are "vendored" with a directory structure similar (identical?) to that found under $GOPATH/pkg/mod. Whether it's all modules or some I defer to others. This keeps modules intact (important in keeping the solution simple for tools etc) and retains the current benefits of being able to review dependency changes alongside one's own changes. Whether this is achieved by implicit replace directives or any other means, I again defer.

Apologies if I'm late to the party on all of this: I just wanted to stress/highlight that this point on code review has a life well beyond this issue that is of much wider interest.

myitcv commented 4 years ago

On the back of my last comment I've just raised https://github.com/golang/go/issues/33466

myitcv commented 4 years ago

I think this points towards a solution where entire modules are "vendored" with a directory structure similar (identical?) to that found under $GOPATH/pkg/mod

Just to slightly row back on this point: whether we need the entire module to be "vendored" is actually something I defer to Bryan (and Ian) on. If cmd/go can make things work in a partial way then I don't actually see a reason to "vendor" the entire module (indeed, given the code review point there is good reason not to), because go/packages et al. will just "work" because cmd/go "works".

I previously wrote "entire module" because I read (perhaps incorrectly) that this had become necessary. But Bryan/Ian are the authorities on that point, so restating that point for clarity.

bcmills commented 4 years ago

I am withdrawing this proposal in favor of #33848. My reasoning is as follows:

Should vendored dependencies be updated automatically?

Here I have proposed that go commands should update and/or add to the contents of the vendor directory automatically.

However, in the time since then, we have observed that users are often confused by the implicitness of updates to the go.mod file. Given that, we probably should not overwrite existing contents in the vendor directory without explicit user intervention — that style of automatic vendoring would add yet another layer of substantial changes driven by the same implicit mechanism, and while diffing and reverting changes in the go.mod file is relatively easy, diffing and reverting unexpected changes in the vendor directory is not.

I now believe that we should not make such updates automatically.

Should we allow vendoring of only a subset of packages?

Here I have proposed that go mod vendor should accept patterns to allow users to vendor only a subset of modules.

I still think that's a good idea in concept, particularly for repositories that contain multiple interdependent modules and replace directives, but it adds enough complexity that it should be considered separately from — and presumably after — changes to automatically use and/or maintain the vendor directory.

artemgavrilov commented 4 years ago

It would be great to vendor a single module. We have a library that has a directory with .yaml files (OpenAPI types common to multiple services). These files are used by other projects, which include them in their own API specifications. Right now we vendor all dependencies, and can invoke a command from a makefile that generates Go code from the service API spec and the types from the library (which can be referenced as ./vendor/someRepo/file.yaml).