golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.38k stars 17.71k forks source link

proxy.golang.org: stop serving /@v/branchname.info when the branch is deleted #49146

Open mvdan opened 3 years ago

mvdan commented 3 years ago

I just had a very confusing experience using modules for a solid hour, which was all caused because one of the tools I use renamed their HEAD branch from master to main and I hadn't noticed.

I was happily running go install github.com/kyleconroy/sqlc/cmd/sqlc@master, and it was working. I ran into some weird behaviors which should have been fixed in the latest development version, which was odd. I later found out via go version -m $(which sqlc) that my version was half a year old, even though HEAD had a commit just days ago.

Ah! The HEAD branch is now main. But why is @master still happily giving me an old version, when upstream deleted their branch? Surely, upstream deleting their old HEAD branch is as much as they should do to make things right.

Turns out that the problem is that proxy.golang.org is remembering the last master version it saw, presumably forever:

$ go list -m github.com/kyleconroy/sqlc@master
github.com/kyleconroy/sqlc v1.8.1-0.20210503000056-ec35f225dfed
$ go list -m github.com/kyleconroy/sqlc@main
github.com/kyleconroy/sqlc v1.10.1-0.20211019050806-6ee39cb62526
$ GOPROXY=direct go list -m github.com/kyleconroy/sqlc@master
go: github.com/kyleconroy/sqlc@master: invalid version: unknown revision master
$ GOPROXY=direct go list -m github.com/kyleconroy/sqlc@main
github.com/kyleconroy/sqlc v1.10.1-0.20211019050806-6ee39cb62526

You can see it frozen in time at https://proxy.golang.org/github.com/kyleconroy/sqlc/@v/master.info.

I strongly believe the proxy server should not do this. I realise that tagged versions are effectively treated as read-only once pushed by the proxy and index servers, but I understand that branches aren't meant to work that way. If a branch vanishes from upstream, I think its .info file should reflect that, just like /@v/missing.info gives invalid version: unknown revision missing.

This issue is slightly similar to https://github.com/golang/go/issues/39007, but also different - in my case, the module is healthy, but just not at that branch anymore.

mvdan commented 3 years ago

cc @katiehockman @heschi @hyangah

aarzilli commented 3 years ago

Isn't the whole point of the proxy to keep serving things that have been deleted?

mvdan commented 3 years ago

For tags and semver versions, I definitely agree. For branch names, I'm not that convinced. A branch is a moving target by definition.

An alternative would be to build special knowledge that, if master has vanished but main still exists, then /@v/master.info should redirect to /@v/main.info. I believe sites like proxy.golang.org and pkg.go.dev already treat master/main branches in a special way, so perhaps this extra rule would be reasonable. Intuitively, I think I'd prefer a fix that doesn't depend on branch names.

hyangah commented 3 years ago

My personal preference (not go proxy or go command teams' opinion) is not to proxy or cache any version query string (e.g. tags, commit hash, branch name), and let proxy.golang.org just return 404/410 with a long TTL for such queries. I'd also like the go command to just automatically switch to direct mode (if GOPROXY includes it) for tags/branch names - my reasoning is that they are VCS-specific and it's better to let VCS handle. But tags such as `vX.Y.Z' which look like a version string that looks like go's recognized semver, but may be resolved to something else.

aarzilli commented 3 years ago

For tags and semver versions, I definitely agree. For branch names, I'm not that convinced. A branch is a moving target by definition.

This argument could be made if the repository is deleted as well. If the entire repository was deleted should sqlc@master return anything? I think @hyangah argument is more consistent.

bcmills commented 3 years ago

I'd also like the go command to just automatically switch to direct mode (if GOPROXY includes it) for tags/branch names

I don't think the go command should automatically switch to direct mode for branch names or non-semver tags — the default proxy behavior is designed so that the proxy can explicitly reject, say, module paths that are known not to comply with the proxy owner's third-party-dependency policies.

(But I do agree that it would be fine for proxy.golang.org in particular to serve 404s or 410s for branches and non-semver tags, since it has no such policy to enforce.)

bcmills commented 3 years ago

It may also be possible to change cmd/go to more clearly indicate the situation when a module's repository exists but the requested version (or nested module) is not present in that repository. The proxy could use that as an explicit signal to evict the cache entry.

mvdan commented 1 year ago

This argument could be made if the repository is deleted as well.

Coming to this over two years later, but: the proxy should continue serving semver versions practically forever, unless there's something serious like a takedown notice or licensing issue. Otherwise, one pillar of reproducible builds falls down. I still think that branch names do not fall under that category, because they're not used by MVS nor commands like go build on a tidied module, as those don't need to resolve branches. It's only direct calls by humans or scripts like go install pkg@master which are affected.

I personally don't mind whether this detection of "branch deleted" happens in the proxy or in cmd/go. Intuitively, I would assume that the proxy can do it more efficiently, as it already needs to keep track of new versions published by the module, as well as cache some branch names. On the other hand, an implementation in cmd/go means that this would be solved for any proxy and not just proxy.golang.org.