golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

cmd/go: add modvendor sub-command #27618

Open myitcv opened 5 years ago

myitcv commented 5 years ago

Creating this issue as a follow up to https://github.com/golang/go/issues/26366 (and others).

go mod vendor is documented as follows:

Vendor resets the main module's vendor directory to include all packages
needed to build and test all the main module's packages.
It does not include test code for vendored packages.

Much of the surprise in https://github.com/golang/go/issues/26366 comes about because people are expecting "other" files to also be included in vendor.

An alternative to the Go 1.5 vendor is to instead "vendor" the module download cache. A proof of concept of this approach is presented here:

https://github.com/myitcv/go-modules-by-example/blob/master/012_modvendor/README.md

Hence I propose go mod modvendor, which would be documented as follows:

Modvendor resets the main module's modvendor directory to include a 
copy of the module download cache required for the main module and its 
transitive dependencies.

The name and the documentation are clearly not final.

Benefits (WIP)

Costs (WIP)

Related discussion

Somewhat related to discussion in https://github.com/golang/go/issues/27227 (cc @rasky) where it is suggested the existence of vendor should imply the -mod=vendor flag. The same argument could be applied here, namely the existence of modvendor implying the setting of GOPROXY=/path/to/modvendor. This presupposes, however, that the idea of modvendor makes sense in the first place.

Background discussion:

https://twitter.com/_myitcv/status/1038885458950934528

cc @StabbyCutyou @fatih

cc @bcmills

bcmills commented 5 years ago

I don't think the proposed "resets the main module's modvendor directory" behavior is quite the right workflow.

One of the benefits of versioned modules over vendoring is that they can reduce redundancy globally: instead of N copies of the same code spread across N repos, we can have a single canonical copy shared by all builds of those repos. A per-module modvendor cache would revert that advantage.

bcmills commented 5 years ago

Instead, perhaps we should make it easier to maintain per-user or per-organization module proxies.

For example, we could add an optional argument to go mod download to tell it where to save the downloaded modules.

go mod download $path could copy all active modules to $path, and go mod verify $path could verify that the modules already stored in $path match the go.sum of the current module. Then, the modvendor operation would essentially be:

go mod download $GOPROXY
go mod verify $GOPROXY

Then the user could commit the contents of $GOPROXY to a separate (personal or org-wide) repository.
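As a rough illustration of what checking stored modules against go.sum involves: the "h1:" hashes recorded in go.sum are a SHA-256 over a sorted list of per-file SHA-256 hashes. The Go sketch below mirrors that scheme for in-memory files. The name hash1 and the map-based input are assumptions made for illustration; the real implementation lives in golang.org/x/mod/sumdb/dirhash. The go mod verify $path form itself is only a proposal in this thread.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hash1 is a hypothetical helper that mirrors the "h1:" scheme used in go.sum.
// files maps each file name as stored in a module zip
// (for example "example.com/m@v1.0.0/go.mod") to its content.
func hash1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // the summary must list files in sorted order

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		// One summary line per file: hex hash, two spaces, file name.
		fmt.Fprintf(summary, "%x  %s\n", sum, name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	// Illustrative module contents, not taken from any real module.
	files := map[string][]byte{
		"example.com/m@v1.0.0/go.mod": []byte("module example.com/m\n"),
		"example.com/m@v1.0.0/m.go":   []byte("package m\n"),
	}
	fmt.Println(hash1(files))
}
```

A verifier along these lines would compute the hash for each stored zip and compare it with the matching line in go.sum.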

flibustenet commented 5 years ago

We should also do the opposite, to fill the cache from a downloaded directory.

$ go mod download -export $path 

Then, somewhere else, maybe on another machine:
$ go mod download -import $path
That would fill the cache.

bcmills commented 5 years ago

@flibustenet GOPROXY already does the opposite: GOPROXY=$path go mod download populates the active modules into the user's module cache from an arbitrary directory.

We don't currently have a command that populates more than the active modules, but that seems like a job for rsync or git rather than go itself.
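For readers unfamiliar with the layout involved: a directory used as a GOPROXY has the same structure as $GOPATH/pkg/mod/cache/download, with each module version stored as <module>/@v/<version>.info, .mod and .zip, and each upper-case letter in the module path escaped as "!" followed by the lower-case letter. The Go sketch below shows that mapping. The function names are hypothetical, and it escapes only the module path; the full rules (including validation and version escaping) are in golang.org/x/mod/module. Note also that the go command expects a file:// URL when GOPROXY points at a local directory.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// escapePath is a hypothetical helper: it replaces each upper-case ASCII
// letter with '!' plus its lower-case form, so that paths stay distinct
// on case-insensitive file systems.
func escapePath(modPath string) string {
	var b strings.Builder
	for _, r := range modPath {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			r += 'a' - 'A'
		}
		b.WriteRune(r)
	}
	return b.String()
}

// proxyFile returns where a proxy-style directory rooted at root would
// hold the given file ("info", "mod" or "zip") for a module version.
func proxyFile(root, modPath, version, ext string) string {
	return path.Join(root, escapePath(modPath), "@v", version+"."+ext)
}

func main() {
	// The root directory and module version here are illustrative only.
	fmt.Println(proxyFile("/srv/goproxy", "github.com/BurntSushi/toml", "v0.3.1", "zip"))
}
```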

myitcv commented 5 years ago

@bcmills

I don't think the proposed "resets the main module's modvendor directory" behavior is quite the right workflow.

I think there are actually two use cases here:

  1. "vendoring" all dependencies within the same repo as the module(s) that depend on them
  2. a per-user/organisation module proxy repo, separate from the repo(s) that use it

I should update the description to make clear that this issue is trying to address point 1. Hence why I think the logic to "reset the main module's modvendor directory" is correct; because I don't want this directory to grow like a cache.

Point 2 is the approach I've taken with https://github.com/myitcv/cachex, which is the "organisation repo" for https://github.com/myitcv/x, my mono repo. In this case, https://github.com/myitcv/cachex is an append-only repo that is a cache, and hence grows over time. It's separate from (and a subset of) $GOPATH/pkg/mod/cache/download because that can (and does) include downloads of private repos that I don't want made public. As you say, this approach reduces redundancy. Your proposal of go mod download $path is effectively what I do via bash with a GOPROXY+GOPATH+rsync dance; in this situation, I agree, I don't want the reset semantics.

But I can see use cases (i.e. deploying code or similar) where there is real benefit in point 1, for everything to be "bundled" (in the same repo).

Assuming we want to address/support both use cases (and it seems sensible to my mind to do so), they could be solved by the same sub-command; I'm certainly not precious about that 😄. But I think there are separate use cases to cover here.

bcmills commented 5 years ago

I can see use cases (i.e. deploying code or similar) where there is real benefit in point 1, for everything to be "bundled" (in the same repo).

I'm not certain about those cases one way or the other. Given versioning, it seems like you can address all of the same use-cases — and more! — using a separate repository. If folks are doing the cost/benefit analysis and coming to a different conclusion, I'd like to see more of the details of the costs and benefits involved (beyond just “that's the way we've done things without versioning”).

myitcv commented 5 years ago

If folks are doing the cost/benefit analysis and coming to a different conclusion, I'd like to see more of the details of the costs and benefits involved (beyond just “that's the way we've done things without versioning”).

I'd second this request because, in case it wasn't clear already, I'm a fan of point 2.

I'm only putting up point 1 as a "better" alternative to go mod vendor (better in the sense that it doesn't suffer from the pitfalls associated with https://github.com/golang/go/issues/26366 amongst other things). But, and I totally grant you this, I haven't articulated all (any?) of the costs associated with keeping workflows oriented around a single repo, a la vendor.

bcmills commented 5 years ago

Hmm. With the go mod download $path approach, it's still possible to put $path in the same repository (cutting it off from the modules in that repo using an explicit go.mod file, or perhaps with a well-known subdirectory such as vendor/mod/ or vendormod/), and you can even unpack it easily with a single command (GOPROXY=$path go mod vendor).

myitcv commented 5 years ago

Yes absolutely; I think the only difference between these two use cases is the use of "reset" semantics or not.

sanguohot commented 5 years ago

Sharing modules is very important, but there would still be some cases where sharing is not wanted. A little like npm without the -g flag.

rsc commented 5 years ago

Replying to the original benefits:

  • Eliminates any potential confusion around what is in/not in vendor

Having two ways to populate vendor does not seem like it would eliminate confusion.

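  • Easier to contribute patches/fixes to upstream module authors (via something like gohack (https://github.com/rogpeppe/gohack)), because the entire module is available

We should address gohack, but modvendor does not seem like the right way to do it.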

  • The modules included in modvendor are an exact copy of the original modules. This makes it easier to check their fidelity at any point in time, with either the source or some other reference (e.g. Athens)

It would be better to make go mod verify work with the pruned vendor directories, if that's a concern.

  • Makes clear the source of modules, via the use of GOPROXY=/path/to/modvendor. No potential for confusion like "will the modvendor of my dependencies be used?"

This is doubling down on vendor. We want to move in the opposite direction.

  • A single deliverable

I don't know what this means.

  • Fully reproducible and high fidelity builds (modules in general gives us this, so just re-emphasising the point)

No actual benefit here, right?

I don't see what the problem is here, really, and I think it's very important not to pull in the entire module just to get one package. Because you're not just pulling in that one module, you're pulling in (at least references to) its dependencies.

myitcv commented 5 years ago

Thanks for the reply @rsc. Taking your responses slightly out of order:

Easier to contribute patches/fixes to upstream module authors (via something like gohack (https://github.com/rogpeppe/gohack)), because the entire module is available

We should address gohack, but modvendor does not seem like the right way to do it.

Agreed, this doesn't make sense to solve with modvendor; not sure what I was thinking here. gohack get has a -vcs flag for just this purpose.

This is doubling down on vendor. We want to move in the opposite direction.

Just to be clear, I'm also trying to move away from vendor (the vendor directory as in the Go 1.5 definition) and the concept of "vendoring" more generally (and modvendor falls into this bucket), because there are better solutions to the problems that vendor/"vendoring" try to solve.

My thinking was that something like modvendor could be a useful stepping stone away from the vendor directory to proxies etc.

Eliminates any potential confusion around what is in/not in vendor

Having two ways to populate vendor does not seem like it would eliminate confusion.

modvendor uses a modvendor directory, not the vendor directory. The thinking is that a differently named directory forces the user to ask "what can I expect to be in modvendor?" as opposed to being confused about "what is in vendor?"

A single deliverable

I don't know what this means.

Poorly worded. One of the main reasons people like the vendor directory is that it removes service/network dependencies beyond the initial clone: there is nothing else to configure, no second repository to commit, etc. modvendor achieves a similar effect - there is just one thing in play.

Fully reproducible and high fidelity builds (modules in general gives us this, so just re-emphasising the point)

No actual benefit here, right?

Agreed, if we can get go mod verify to work on the contents of the vendor directory. The only minor point I was making here was that it's very easy to modify the contents of your vendor directory and then neither run go mod verify locally nor enforce it as part of CI. It's harder to modify the contents of modvendor in the first instance. A case in point is https://github.com/goware/modvendor et al., which exist to copy additional files to the vendor directory, files that are already in the module.

I don't see what the problem is here, really, and I think it's very important not to pull in the entire module just to get one package. Because you're not just pulling in that one module, you're pulling in (at least references to) its dependencies.

At least the way I intended to implement my trial of modvendor was to only pull in the modules that are required, so hopefully I only pull in references to their dependencies.

But I'm quite prepared to accept that modvendor might not be the right or even a necessary stepping stone.

thanasik commented 5 years ago

We have this issue as well: the .c and .h files from dependencies that use cgo are not vendored, and builds fail because of this. We're running modvendor right after go mod vendor to work around it, but having this built in would be convenient, and it should be standard, since builds that need non-Go files fail without it.
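The workaround described here (copying non-Go files such as .c and .h from the downloaded module into vendor after go mod vendor has run) can be sketched roughly as follows in Go. This is not how goware/modvendor or the go command is implemented; the function names, the extension list, and the "odpi" directory in the demo are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// wanted reports whether name has one of the given extensions (e.g. ".c", ".h").
func wanted(name string, exts []string) bool {
	ext := strings.ToLower(filepath.Ext(name))
	for _, e := range exts {
		if ext == e {
			return true
		}
	}
	return false
}

// copyFile copies the content of src into a new or truncated file dst.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// copyExtra copies every file under srcRoot whose extension is in exts to
// the same relative path under dstRoot, creating directories as needed.
func copyExtra(srcRoot, dstRoot string, exts []string) error {
	return filepath.WalkDir(srcRoot, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !wanted(d.Name(), exts) {
			return nil
		}
		rel, err := filepath.Rel(srcRoot, p)
		if err != nil {
			return err
		}
		dst := filepath.Join(dstRoot, rel)
		if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
			return err
		}
		return copyFile(p, dst)
	})
}

func main() {
	// Demo on throwaway directories standing in for a module directory in
	// the module cache (src) and its counterpart under vendor/ (dst).
	src, err := os.MkdirTemp("", "module")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(src)
	dst, err := os.MkdirTemp("", "vendor")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dst)

	if err := os.MkdirAll(filepath.Join(src, "odpi"), 0o755); err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile(filepath.Join(src, "odpi", "dpi.h"), []byte("/* header */\n"), 0o644); err != nil {
		log.Fatal(err)
	}
	if err := copyExtra(src, dst, []string{".c", ".h"}); err != nil {
		log.Fatal(err)
	}
	_, err = os.Stat(filepath.Join(dst, "odpi", "dpi.h"))
	fmt.Println("dpi.h copied:", err == nil)
}
```

In real use, the source would be the module's directory under the module cache and the destination the matching directory under vendor/.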

dryaf commented 5 years ago

The .c and .h files are missing, for example in gopkg.in/goracle.v2, after running go mod vendor, and then go build -mod vendor . fails.

go mod vendor should just copy the cache. In case somebody has a problem with that for some reason, a flag such as --ignore-tests could help. But some might also like to run the tests of the dependencies in CI.

nomad-software commented 5 years ago

@dryaf @karysto Try https://github.com/nomad-software/vend which will vend everything.

anjmao commented 4 years ago

@myitcv I suggest keeping it simpler and not introducing new subcommands. Maybe go mod vendor -a will do the job.

Regarding issue #26366: I tried to migrate one of our Go projects to Go modules. Since it uses some C dependencies and gomobile does not work with Go modules, I thought I would just use go mod vendor and it would do the job the same way dep does, but... well... it doesn't. My initial thinking was that go mod vendor was introduced for easier migration to Go modules, but it looks like we still need to write custom tools like vend again and again.

gunsluo commented 4 years ago

@anjmao I agree with your suggestion and hope the problem can be solved in a simple way. Personally, I think go mod vendor -a is a good choice.
@myitcv My Go project uses some C dependencies and template files from other packages in modules. I ran go mod vendor and it ignored the .c, .h and .tpl files, so I had to copy them into my project manually. If go mod vendor -a $package copied all files of $package to the vendor directory, that would be the result I expect.

rezaalavi commented 3 years ago

We are facing the same problem. For legal reasons, we need to include the license files of some packages in the vendor directory.