golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: cmd/go: subcommands to add and remove modules from the module cache #28835

Open bcmills opened 5 years ago

bcmills commented 5 years ago

For a number of use-cases, it would be helpful to be able to upload modules to the module cache from source code (not just zip files!) in a local directory or repository.

Some examples:

To support those use-cases, I propose the following subcommands:

CC @hyangah @jadekler @rsc @myitcv @thepudds @rasky @rogpeppe @FiloSottile

bcmills commented 5 years ago

A way to explicitly populate the module cache from source might also help in cases where the original source path is blocked or unavailable but the code is available from a trusted mirror (as in #28652).

rsc commented 5 years ago

We have go mod download to add to the module cache, and go clean -modcache to clear it. Do we really need more fine-grained control? I fear that will make people manage it more.
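For reference, the two existing operations look like this; the module path is illustrative, and the commands assume a modules-aware Go toolchain with network access:

```sh
# Add a specific module version to the module cache (fetches over the network).
go mod download golang.org/x/text@v0.3.0

# Remove the entire module cache; there is no finer-grained removal.
go clean -modcache
```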

rsc commented 5 years ago

Never mind, I didn't understand the problem being solved.

gopherbot commented 5 years ago

Change https://golang.org/cl/153819 mentions this issue: cmd/go/internal/modfetch: skip symlinks in (*coderepo).Zip

rsc commented 5 years ago

Ping @bcmills to summarize our discussion from 2 weeks ago about alternatives to meet the need you are trying to address here.

gopherbot commented 5 years ago

Change https://golang.org/cl/153822 mentions this issue: cmd/go/internal/modfetch: skip symlinks in (*coderepo).Zip

nim-nim commented 5 years ago

On the Linux distribution side, you need almost the same thing. The whole process is:

So you’d need almost the same thing, with one tweak: deployment and indexing need to be separated.

There is no concept of cleaning up the module cache, since all files are supposed to be associated with a single system component, so the system manager knows how to clean them up without help. I suspect this part won't go well with the proxy protocol as defined today, since some files are shared between different versions of the same module (but .so file symlinks are pretty much the same mess, so that should be manageable with a few hacks).

Lots of Linux subsystems, from Python to fontconfig, behave this way today; it's a proven deployment design pattern that is easy to integrate system-side.

bcmills commented 5 years ago

@nim-nim, there is no “indexing” step in the module cache. Either the requested version is there, or it isn't.

nim-nim commented 5 years ago

@bcmills Then how is $GOPROXY/<module>/@v/list supposed to be generated?

You could go mod pack mymodule at version x.y.z into the system component golang-mymodule-x.y.z, which would contain

$GOPROXY/mymodule/@v/x.y.z.mod
$GOPROXY/mymodule/@v/x.y.z.info
$GOPROXY/mymodule/@v/x.y.z.zip

and then you could go mod pack version a.b.c into another system component, golang-mymodule-a.b.c, which would contain

$GOPROXY/mymodule/@v/a.b.c.mod
$GOPROXY/mymodule/@v/a.b.c.info
$GOPROXY/mymodule/@v/a.b.c.zip

So far so good: every file is nicely accounted for, and the on-disk representations of the system components do not clash (even though having to manage a separate info file, just because the module file does not contain the version, is annoying).

But depending on whether the user installs only golang-mymodule-x.y.z, only golang-mymodule-a.b.c, or both, $GOPROXY/mymodule/@v/list is not supposed to have the same content, is it? So you need to reindex $GOPROXY/mymodule/@v/list on installation or uninstallation of anything in $GOPROXY/mymodule/@v/

In rpm terms, that would mean adding a %transfiletriggerin and a %transfiletriggerpostun on the $GOPROXY directory that call a go subsystem command to reindex everything inside $GOPROXY every time the system component manager adds or removes things in it (rpm documentation)

rsc commented 5 years ago

The module cache is a cache. I really do not want the module download cache to have manual maintenance. That was the big problem with $GOPATH/pkg and go install: go install was manual maintenance of $GOPATH/pkg. The new build cache has no maintenance, which simplifies everything and eliminates a lot of awful failure modes. We'd really like the same for the module cache.

The operation being created above is really "pretend this module version has been published, so I can build and test other modules that depend on it". It's not clear to me that that should be scoped to a whole machine (a whole $GOPATH). At the very least it seems like we need two commands:

  1. Fake-publish this module.
  2. Build this other module using the fake-published stuff.
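Hypothetically, the two-command workflow might look something like the following; none of these flags or subcommands exist, and the names are invented purely for illustration:

```sh
# 1. Fake-publish the local working copy as if it were a released version,
#    into a named staging configuration rather than the shared module cache.
go mod fake-publish -config=foo ./mylib@v1.2.0

# 2. Build another module, explicitly opting in to that staging configuration.
go build -modconfig=foo ./...
```

The key property is the explicit opt-in in step 2: a plain `go build` would still see only the real, published world.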

A build should never default to using the fake-published stuff. Otherwise you can't do two logically separate things in a single GOPATH, and we're back to manual cache maintenance à la go install. That is, suppose I'm in the middle of testing a fake-published module 1 against another module 2, and I get interrupted and context-switch to a completely different module 3 that happens to also depend on module 1. I don't want to be left with no way to get back to the real world, where there isn't a fake module 1 floating around. That should be the default world I'm in. Otherwise the mental load of managing this automatically-used staging area is much like $GOPATH/pkg and go install.

I can't remember exactly what @bcmills and I discussed in late Nov 2018 but I think it was some other mechanism that wasn't "the module cache" for fake-publishing. You could imagine saying "fake publish to configuration foo" and then "build with configuration foo" and even "list configuration foo". Or maybe there's just one fake-published-world per $GOPATH.

nim-nim commented 5 years ago

@rsc It's not fake-publish, it's using your own code, only without forcing people to put GitHub or Artifactory in the middle. In the real world there are lots of situations where round-tripping through GitHub just to use your own code is not acceptable. So please make this use case work cleanly, without an artificial fake-publish degradation, or people will just reverse-engineer how go mod works and write their own tools you won't be happy with (this is already starting, because modules are being pushed before the tooling is finished and ready).

When you don't own your cloud like Google does, when you don't have fat network pipes, when you have restricted networks because of the $expensive and $dangerous factories plugged into them, you don't round-trip to the Internet all the time just because it's cool at home to watch YouTube videos.

As written in the module FAQ

Rather, the go tooling in 1.11 has added optional proxy support via GOPROXY to enable more enterprise use cases (such as greater control)

Greater control means greater control, and people doing the stuff they want with their code without opaque cloud intermediaries.

Besides, making access to some remote VCS mandatory just to make use of some code would make Go instantly incompatible with every single free-software license out there.

rsc commented 5 years ago

@nim-nim I don't understand your response. I completely sympathize with the use case here and I spelled out a path forward that avoids the network. My use of "fake-publish" was not derogatory. I am referring to the operation of making it look locally like the module has been published even though it has not, hence "fake publish".

akamensky commented 5 years ago

I am not sure about the other two commands in this proposal, but I think go mod pack is something that many developers are really going to need. I know these comments are frowned upon here, but in many long-established tools and ecosystems this functionality is deemed a must-have. The first that comes to mind is Maven, where you can publish an artifact to the local cache from local code.

Consider a project A that depends on library B. Developers often want to develop and publish v1.2 of both A and B at the same time. How can I import module B v1.2, which I am working on locally, into project A, which I am also working on locally? As of now (1.13beta1) there does not seem to be any mechanism to achieve this without manually hacking a replace directive into go.mod and subsequently removing it (again manually, I presume) before publishing both.
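The replace workaround in question looks like this (module paths and the relative directory are illustrative); the directive must be deleted again before tagging a release:

```
module example.com/projectA

go 1.13

require example.com/libB v1.2.0

// Points the build at the local working copy of B; must not ship in a release.
replace example.com/libB => ../libB
```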

perillo commented 4 years ago

The concept of pre-filling the module cache with a local (unpublished) module, or with a new revision not yet published, can be implemented with an external command.

Here is an implementation: https://github.com/perillo/gomod-pack. It calls go mod download -json with a custom environment, in which git is configured with URL rewriting and go is configured with direct access and a disabled checksum database.

gomod-pack can only be called inside a module, and the user can only specify the version to pack. It prints the versioned module path to stdout, which the user can then use in a go.mod require directive.

The only drawback is that it only works with git.
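The environment trick described above amounts to making go mod download resolve the module path to a local git repository instead of the network. A rough sketch, with the module path and local clone path invented for illustration:

```sh
# Rewrite fetches of the module path to the local clone (git-only, hence
# the drawback mentioned above).
git config --global url."file:///home/me/src/mymod".insteadOf "https://example.com/mymod"

# Bypass proxies and the checksum database, since this code was never published.
GOPROXY=direct GOSUMDB=off go mod download -json example.com/mymod@v1.2.3
```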

marystern commented 4 years ago

That was the big problem with $GOPATH/pkg and go install: go install was manual maintenance of $GOPATH/pkg. The new build cache has no maintenance, which simplifies everything and eliminates a lot of awful failure modes. We'd really like the same for the module cache.

Hi, I'm coming from issue #37554 and have just read this. I had no idea that "go install" was going to become deprecated! Maybe this needs clarification in the community?

In my issue, I suggested that "go install" do the same thing as "go mod pack" in this proposal (and I prefer that way of expressing the command, since it matches previous Go versions). I agree with @nim-nim: we both seem to want a fairly simple use case (local code using modules, without hitting the network), but the current implementation of modules makes this tricky, to say the least.

ronakg commented 4 years ago

I just finished reading this whole thread because I ran into this same issue while developing a new app for an enterprise product. I'm still very new to Go, but having no provision for importing a separate module that I'm developing in parallel seems like a huge oversight.

Let me try to summarize my use-case:

As of now, there's no way for myapp to import mylib without adding a replace directive to myapp's go.mod, which feels very hacky. I have to publish mylib separately from myapp and then update myapp's go.mod file to remove the replace directive.

Another use case: when I'm developing a library that's used by multiple modules, I need to run integration tests for the dependent modules to make sure I'm not introducing any regressions. So I need to edit every dependent module's go.mod file to add a replace directive pointing at the local module.

@marystern's idea of having go install put the unpublished module into the local cache sounds really good. That's how many build-management systems work as well: Maven lets you build and install a jar/war file into the local Maven repository for other Maven projects to import.

Helcaraxan commented 4 years ago

@ronakg, this is not really on topic for this exact issue, but the fact that you are using multiple modules in the same repository for your use case looks like an anti-pattern. In general, multi-module repositories are not a recommended workflow.

In your specific case (based on the information you have provided) there should not be any reason for having multiple modules. Simply put your library and your app in the same module which should be rooted at the root of your repo. And if a new app using your library needs to be created it can live in the same module & repo as well.

Modules are a dependency-management and versioning abstraction, not a feature-level abstraction. Hence, if everything (the library and the binaries) is part of the same product and will be shipped and versioned in a common fashion, it can all be part of the same module without any negative side effects. Using multiple modules would actually make achieving your goals much harder and your day-to-day development workflow much more complex.

rsc commented 4 years ago

Based on discussion with @bcmills, @jayconrod, @matloob, putting this on hold because we need to think about the higher-level issue of publishing modules at all first. This issue was primarily intended to address publishing a collection of modules that depend on each other, perhaps in a cycle or perhaps not. That's the problem to solve; reusing the module cache is probably not the right solution.

Placing on hold to come back with a different solution.

folays commented 1 year ago

May I add that #44989 and #32976 have been marked as duplicates of this one, but for cgo the impact of not yet having a command to "clean only one module" from the cache is heavy.

Indeed, when modifying a C/C++ source file used by cgo outside the compiled package, there isn't an easy way to force a rebuild of that package besides cleaning the ENTIRE module cache, which carries a heavy recompile-time cost, especially when it's not the only cgo module in the whole project...

If you want to argue that keeping those C/C++ source files outside the specific compiled package's directory is bad practice, please keep in mind that keeping those C source files in their own directory allows them to live in a dedicated git-subtree folder, which makes it easy to track C/C++ upstreams and emit diff-to-upstream patches.
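In the absence of a per-module clean command, one workaround is to evict a single module version from the cache by hand. This is only a sketch (the module path and version are illustrative): the cache layout is an internal detail that may change, and cache entries are written read-only, so write permission has to be restored before deleting.

```sh
# Evict one cached module version so the next build re-extracts it.
MOD="$(go env GOMODCACHE)/example.com/mymod@v1.2.3"
chmod -R u+w "$MOD" && rm -rf "$MOD"

# Also drop the corresponding download metadata so it is re-fetched.
rm -rf "$(go env GOMODCACHE)/cache/download/example.com/mymod/@v"
```

For the cgo recompilation problem specifically, note that compiled objects live in the build cache, so a full `go clean -cache` may also be needed, which is exactly the heavy cost described above.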