golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: cmd/go: module aliases #70443

Open pohly opened 2 days ago

pohly commented 2 days ago

Proposal Details

Importing a Go module is tied to a DNS domain. Should the developer of a module lose ownership of that domain, the module must be migrated to a different domain. All consumers must follow at the same time; otherwise a binary might end up with a mixture of old and new code under different names, which can cause failures (for example, the binary uses and configures new module A, while some dependency still uses old module B, which is never configured).

Kubernetes is currently facing that challenge because the .io domain might go away and all modules are called k8s.io/<something>. It is not certain that anything needs to be done, but if something needs to be done, then it might take years to be ready - so let's discuss now.

Can Go support such a transition more gracefully?

One possibility would be to let a module define one or more aliases in its go.mod:

module k8s.io/api
alias k8s.dev/api

If such a module gets imported via the alias, the compiler should not complain and treat it as if it had been imported under the official module name. Nothing would change in the vendor directory (in particular not some mass renaming of files). In https://pkg.go.dev, the alias could be a redirect to the original name. Deep links into the documentation of a package remain valid.

Obviously this only makes sense if the original domain is guaranteed to disappear or to no longer be used. If the module author transfers a domain, they have to migrate and cannot use the old module name anymore, because the new owner of the domain might decide to publish their own modules there. Whether Go should detect such a conflict is TBD.

ianlancetaylor commented 2 days ago

CC @matloob @samthanawalla

Another approach might be to extend the import path support we already have (see go help importpath). We could change the go command so that lookups in the top-level .io domain are automatically translated to, say, io.go.dev, which would respond with meta tags redirecting a known set of packages.

seankhliao commented 2 days ago

related: #26904 but with a replacement pushed up from the dependency.

Since the main module needs a new enough version of the old module anyway to be able to use this, I think the module should instead just commit to its new identity, and set up old modules to forward any declarations to the new module. Something like #32816 can later automate the cleanup.

pohly commented 2 days ago

Since the main module needs a new enough version of the old module anyway to be able to use this, I think the module should instead just commit to its new identity, and set up old modules to forward any declarations to the new module.

Are you suggesting that we should start publishing everything under new names and replace the content of the old modules with wrappers of the new modules, using type aliases to ensure compatibility? That would be a lot of work and fragile because functions and global variables cannot be aliased.

seankhliao commented 2 days ago

global vars that are modified would unfortunately be left out (all the more reason to avoid them...), but functions can be wrapped, and later inlined to the new functions they wrap; see https://github.com/golang/go/issues/32816#issuecomment-1854496428 and https://pkg.go.dev/golang.org/x/tools/internal/refactor/inline
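
To make the forwarding idea discussed above concrete, here is a minimal single-file Go sketch. It is only an illustration under stated assumptions: the names newPod, newDescribe, and newDefaultNamespace are hypothetical stand-ins for declarations in the new module, and the old* names stand in for the forwarding declarations that would remain in the old module. In a real migration the two groups would live in separate packages.

```go
package main

import "fmt"

// Hypothetical stand-ins for declarations in the new module.
type newPod struct{ Name string }

func newDescribe(p newPod) string { return "pod/" + p.Name }

var newDefaultNamespace = "default"

// Hypothetical stand-ins for what the old module would be reduced to.

// A type alias makes oldPod and newPod the identical type,
// so values flow between the two names without conversion.
type oldPod = newPod

// Functions cannot be aliased, so the old name needs a wrapper.
func oldDescribe(p oldPod) string { return newDescribe(p) }

// Variables cannot be aliased either: this is an independent copy.
var oldDefaultNamespace = newDefaultNamespace

func main() {
	var p oldPod = newPod{Name: "a"} // same type, no conversion needed
	fmt.Println(oldDescribe(p))      // prints "pod/a"

	// Writing through the old name does not reach the new variable.
	oldDefaultNamespace = "kube-system"
	fmt.Println(newDefaultNamespace) // prints "default"
}
```

The last two lines show why mutated global variables are the awkward case: the two names refer to separate storage, so a write through one is invisible through the other.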

pohly commented 2 days ago

global vars that are modified would be unfortunately left out (all the more reason to avoid them...)

Too late. We already have them. :cry:

but functions can be wrapped

This changes stack unwinding, which is relevant in a few cases.

But the main point is that this is highly impractical. We would have to support these wrappers across multiple releases until we can be relatively sure that everyone has migrated to the new names. Doing it once manually would already be too much work; doing it repeatedly is completely out of the question unless we can come up with a code generator which does all of this automatically and get it integrated into the release processes of several different repos (yes, plural).
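
The stack unwinding concern raised above can be illustrated with a small Go sketch. The function names are hypothetical: newLog stands in for a function in the new module that reports its caller (as a logger recording the call site might), and oldLog stands in for a forwarding wrapper left in the old module. Inlining is disabled here only to keep the frames explicit.

```go
package main

import (
	"fmt"
	"runtime"
)

// newLog reports the name of the function that called it.
//
//go:noinline
func newLog() string {
	pcs := make([]uintptr, 8)
	// skip=2: frame 0 is runtime.Callers, frame 1 is newLog,
	// frame 2 is newLog's caller.
	n := runtime.Callers(2, pcs)
	if n == 0 {
		return ""
	}
	frame, _ := runtime.CallersFrames(pcs[:n]).Next()
	return frame.Function
}

// oldLog is the forwarding wrapper kept under the old name.
//
//go:noinline
func oldLog() string { return newLog() }

// direct calls the new function, so newLog sees direct as its caller.
//
//go:noinline
func direct() string { return newLog() }

// viaWrapper calls the old name, so newLog sees the wrapper as its
// caller instead of viaWrapper.
//
//go:noinline
func viaWrapper() string { return oldLog() }

func main() {
	fmt.Println(direct())     // reports the direct function
	fmt.Println(viaWrapper()) // reports the oldLog wrapper, not viaWrapper
}
```

Code that attributes a call to "the caller one frame up" therefore sees the wrapper rather than the real call site when the old name is used.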

apparentlymart commented 2 days ago

I am sympathetic to this use-case, but I have a reservation: This seems to have a similar effect to a replace directive, but placed somewhere other than the main module. That seems potentially confusing, and contrary to the current idea that only the main module gets to perform replacements that affect the compiled program.

A variation I thought about is to treat this more like a deprecation, where no automatic aliasing occurs but the toolchain announces during installation that the module address has been deprecated, perhaps similar to the current message about a version having been retracted:

go: warning: k8s.io/api has moved to k8s.dev/api
go: consider temporarily adopting the new name as a replacement for the old name until all imports are updated:
        replace k8s.io/api v1.0.0 => k8s.dev/api v1.0.0

This gives a downstream maintainer a notification that this module is being renamed, but doesn't automatically change the meaning of any module addresses already in the program. Instead, it just guides the downstream maintainer toward a solution that they control in their own go.mod.

Adding a replace directive as suggested in the warning message would then get the intended effect of allowing both names to be used together in the same program until all transitive dependencies have been updated to use k8s.dev/api, at which point the main module would no longer depend on k8s.io/api at all and the replace directive can be removed along with its associated require directive.

This could perhaps be combined with a new rule that prohibits using both the old and the new name in the same go.mod unless there's a replace directive. Transitive dependencies that lead to both names would then force the maintainer of the main module to add the replace directive, and the maintainers of both k8s.io/api and k8s.dev/api could assume that no program could ever have packages from both module addresses compiled in together.

With that additional rule it would remain valid to depend only on k8s.io/api without needing a replace (generating a warning, as described above) to retain backward compatibility with already-published dependents, but including both would require the downstream maintainer to reconcile the conflict with a replace directive until all their upstreams have converged on using the new module path.

This does admittedly put some burden on dependents of the module that is being renamed, but that means that they will end up with a clear record in their go.mod that something unusual is going on. This burden would only be felt once they upgrade an upstream past the version where the alias directive was added to its go.mod, either directly or indirectly, and would be relieved once all of their upstreams have agreed to use the new module path.

(in the above I've used "upstream" to mean "a module mentioned in a require directive in your go.mod, either direct or indirect", and "downstream" to represent the opposite relationship.)

pohly commented 2 days ago

This gives a downstream maintainer a notification that this module is being renamed,

"consider temporarily adopting" might not be strong enough: if a program mixes old and new module, there are bound to be regressions. Definitely the program size increases, but the most common and problematic case will probably be including k8s.io/klog/v2 and k8s.dev/klog/v2. One or the other will not work properly, depending on what gets used by the binary.

This could perhaps be combined with a new rule that prohibits using both the old and the new name in the same go.mod unless there's a replace directive,

That would prevent this problem. If we go down this path, then I think this will be required.
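
The klog scenario mentioned above can be illustrated with a single-file Go sketch. This is not klog's API: logState, klogIO, klogDev, and dependencyLogs are hypothetical names that only model the situation where the same package is compiled in twice under two import paths, each copy with its own package-level state.

```go
package main

import "fmt"

// logState stands in for the package-level state of a logging package.
type logState struct{ verbosity int }

func (s *logState) enabled(level int) bool { return level <= s.verbosity }

// Two import paths mean two separate copies of the package and its state.
var (
	klogIO  = &logState{} // copy reached via the old import path
	klogDev = &logState{} // copy reached via the new import path
)

// dependencyLogs stands in for a dependency that still imports the old path.
// It reports whether the message was actually emitted.
func dependencyLogs(msg string) bool {
	if !klogIO.enabled(2) {
		return false
	}
	fmt.Println(msg)
	return true
}

func main() {
	// The binary imports the new path and configures only that copy.
	klogDev.verbosity = 4

	fmt.Println("new copy enabled at level 2:", klogDev.enabled(2))
	// The dependency consults the old copy, which was never configured.
	fmt.Println("dependency message emitted:", dependencyLogs("hello"))
}
```

The binary's configuration reaches only one of the two copies, so the dependency's messages are silently dropped even though the binary asked for verbose logging.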

This does admittedly put some burden on dependents of the module that is being renamed,

Indeed. To put this into perspective, k8s.io/klog/v2 is imported 16627 times. I believe that's packages, but the list of affected projects and modules is still very long. And that's just one of the affected modules! k8s.io/apimachinery is even worse.

Looking at individual projects, some would need 16 (https://github.com/helm/helm/blob/main/go.mod) or 20 (https://github.com/istio/istio/blob/master/go.mod) replace statements.

If this is how it needs to be done, it would probably cause quite a bit of work for the entire ecosystem.

would be relieved once all of their upstreams have agreed to use the new module path

Obsolete replace statements do not get removed by go mod tidy. Cleaning up will be more work, or (more likely) will be forgotten.

jaloren commented 1 day ago

@ianlancetaylor but that doesn’t work once the old domain goes away, does it? What if, for a graceful migration, it would be very helpful to make the code changes after the domain disappears?

seankhliao commented 1 day ago

I have more concerns about this:

We currently delegate module identity to DNS. By allowing an identity to be disconnected from that, we potentially introduce a land grab competition for pretty, short names:

module github.com/my/package
alias yaml // use as import "yaml"

This also allows for potentially hostile takeovers of module functionality via supply chain attacks. E.g.

In your code:

// you're sure this is code you've written
import "corp.example.com/authz"

in some deeply nested dependency that you didn't think to review:

module evil.example.com
alias corp.example.com/authz
// with code to bypass authz checks

If we were to allow module aliasing, I think it should only be allowed with a replace directive in the main module. (so #26904)

mateusz834 commented 1 day ago

Probably worth mentioning that in the case of an alias:

module k8s.io/api
alias k8s.dev/api

The behavior of a program might change slightly, because the import path can affect the order in which init functions run.

pohly commented 1 day ago

@seankhliao: ack, how to do this securely is a problem. The situation that we are facing right now is that we are still in control of the old domain and want to prepare for the transition while we have it. We would start publishing identical content under both domains (or rather, forward to the same github.com/kubernetes/<repo>) and tell everyone that they must migrate their import statements before we lose the domain.

Once it's lost, import statements with k8s.io are no longer valid and cannot be resolved.

Perhaps that can be used to make this secure? The alias could trigger only if something imports both k8s.io/api and k8s.dev/api and both are the same. That way a module can never shadow some other, unrelated module.

We may have to consider whether we want to use module k8s.io/api with k8s.dev/api as alias, or the other way around. Probably it should be module k8s.dev/api with k8s.io/api as alias, because we will start doing that once we know that we want to switch. At that point, k8s.dev is the official domain.

My original thinking was that we could continue using k8s.io as official name even when not owning that domain anymore. But that's probably not possible unless Go permanently treats k8s.dev as a secure replacement.

@mateusz834: right, import order could be relevant. But I'm less worried about that.

rsc commented 1 day ago

@pohly

Kubernetes is currently facing that challenge because the .io domain might go away and all modules are called k8s.io/<something>. It is not certain that anything needs to be done, but if something needs to be done, then it might take years to be ready - so let's discuss now.

According to https://www.icann.org/en/blogs/details/the-chagos-archipelago-and-the-io-domain-14-11-2024-en, if something needs to be done, we will still have at least 5 years advance notice from the point where we know something needs to be done. Specifically, that page concludes:

At this time, however, much of the discussion about .io is simply speculation. Should this change in the future, those changes will be well communicated. It is not a foregone conclusion that a change in sovereignty will result in a change to the .io domain, but if that result comes to pass, ICANN policy provides a great deal of time for the community to adapt to any changes.

We are already discussing how best to handle package migrations (which would suffice to handle module migrations too, since a module is just a set of packages). I don't believe we should design something separate for module migration, and we especially should not design something for a hypothetical that may not actually happen.

earthboundkid commented 1 day ago

Isn't this just #60696?

As someone who changed GitHub usernames, I would like it if #60696 were implemented. As it is, I'm just slowly rolling out v2 of all of my modules so that I can move things off of the old username to the new one.

pohly commented 1 day ago

If I understand it right, #60696 is about helping consumers of a package update their go.mod automatically during a dependency update with go get -u. It doesn't seem to support using both names at the same time in a binary, which is what would be needed here (a dependency imports k8s.io/klog/v2, binary imports k8s.dev/klog/v2 -> both must be the same package).