golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.97k stars 17.53k forks source link

proposal: cmd/go: cache link output binaries in the build cache #69290

Open matloob opened 2 weeks ago

matloob commented 2 weeks ago

Proposal Details

This proposal is for cmd/go to cache the binary outputs of link actions in builds to the build cache. Binary outputs would be trimmed from the cache earlier than package outputs.

cmd/go currently caches the outputs of compiling a package (build actions) in the build cache, but does not cache the outputs of linking a binary (link actions) in the build cache: https://github.com/golang/go/blob/2707d42966f8985a6663c93e943b9a44b9399fca/src/cmd/go/internal/work/buildid.go#L713

The primary reasons binaries are not cached are that built binaries are much larger than individual package object files and they are not reused as often. We would mitigate that by trimming binaries with a shorter trimLimit than we currently use for the package objects in the cache: we currently remove a package output if it hasn't been used for five days, but we would perhaps choose two days for binaries. To make it easy to identify binaries for trimming, we would store them in a different location than package objects: perhaps instead of $GOCACHE// they would be stored in $GOCACHE/exe//.

We would also need to figure out what to do about the potential for ETXTBSY issues trying to execute the built binaries: see #22220. If the go command tries to write to a binary and then execute it we can get errors executing the binary. We'll have to figure out what to do about this because we would need to write the build id into the binary and then execute it, if we're doing a go run.

cc @rsc @samthanawalla

ianlancetaylor commented 2 weeks ago

How often do we think the cache would be used for a binary? Pretty much any change to the code of any of the inputs will require a relink. When would this save time in practice?

ianthehat commented 2 weeks ago

I think the main use case is go run package@version which will always produce the same binary.

Could we consider a total space ejection policy rather than a time based one? (drop oldest from cache when we try to add something to cache that pushes it over the limit)

matloob commented 1 week ago

Yes, I think the main use would be for go run package@version. But I think it could also be useful for go run package. This can be useful for tools used by projects such as those in tools.go and it would also be useful for the new tool feature being implemented for go 1.24 (#48429).

ianlancetaylor commented 1 week ago

Thanks. That at least raises the possibility of using the build cache for go run but not for go build or go install.

ConradIrwin commented 1 week ago

One thing I'd like for this is if the package name showed up in the output of ps.

I think that means that we should store binaries like stringer on disk in a directory: $GOCACHE/exe/<ha>/<hash>/stringer, though it may also be OK to do something like $GOCACHE/exe/<hash>-stringer if there's a lot of overhead per directory; so that if a tool is misbehaving I can ps ax | grep stringer to find it.

gopherbot commented 6 days ago

Change https://go.dev/cl/613095 mentions this issue: cmd/go: prototype of binary caching

matloob commented 6 days ago

I've put together a basic prototype (caches all binaries, doesn't use package name or ExeName as file name) at golang.org/cl/613095

adonovan commented 14 hours ago

Does this need a proposal? It seems to be a mere implementation detail that shouldn't affect the observable behavior, except quantitatively.

(I suppose @ConradIrwin's point that "go run" processes might notice their argv[0] name has changed is a counterargument but @matloob has an implementation that avoids that by using a subdirectory <hash>/stringer.)

mvdan commented 14 hours ago

ETXTBSY might not be an issue on Linux for too long: https://github.com/golang/go/issues/22315#issuecomment-2351745852

ianlancetaylor commented 11 hours ago

There are already people who complain about the size of the build cache (e.g., #68872), so I do think this is more than an implementation detail.