Open matloob opened 2 weeks ago
How often do we think the cache would be used for a binary? Pretty much any change to the code of any of the inputs will require a relink. When would this save time in practice?
I think the main use case is go run package@version, which will always produce the same binary.
Could we consider a total-space eviction policy rather than a time-based one (drop the oldest entries from the cache when adding something would push it over the limit)?
Yes, I think the main use would be for go run package@version. But I think it could also be useful for go run package. This can be useful for tools used by projects, such as those in tools.go, and it would also be useful for the new tool feature being implemented for Go 1.24 (#48429).
Thanks. That at least raises the possibility of using the build cache for go run but not for go build or go install.
One thing I'd like for this is if the package name showed up in the output of ps.
I think that means we should store binaries like stringer on disk in a directory, $GOCACHE/exe/<ha>/<hash>/stringer, so that if a tool is misbehaving I can run ps ax | grep stringer to find it. It may also be OK to do something like $GOCACHE/exe/<hash>-stringer if there's a lot of overhead per directory.
Change https://go.dev/cl/613095 mentions this issue: cmd/go: prototype of binary caching
I've put together a basic prototype (caches all binaries, doesn't use package name or ExeName as file name) at golang.org/cl/613095
Does this need a proposal? It seems to be a mere implementation detail that shouldn't affect the observable behavior, except quantitatively.
(I suppose @ConradIrwin's point that "go run" processes might notice their argv[0] name has changed is a counterargument, but @matloob has an implementation that avoids that by using a subdirectory <hash>/stringer.)
ETXTBSY might not be an issue on Linux for too long: https://github.com/golang/go/issues/22315#issuecomment-2351745852
There are already people who complain about the size of the build cache (e.g., #68872), so I do think this is more than an implementation detail.
Proposal Details
This proposal is for cmd/go to cache the binary outputs of link actions in builds to the build cache. Binary outputs would be trimmed from the cache earlier than package outputs.
cmd/go currently caches the outputs of compiling a package (build actions) in the build cache, but does not cache the outputs of linking a binary (link actions) in the build cache: https://github.com/golang/go/blob/2707d42966f8985a6663c93e943b9a44b9399fca/src/cmd/go/internal/work/buildid.go#L713
The primary reasons binaries are not cached are that built binaries are much larger than individual package object files and they are not reused as often. We would mitigate that by trimming binaries with a shorter trimLimit than we currently use for the package objects in the cache: we currently remove a package output if it hasn't been used for five days, but we would perhaps choose two days for binaries. To make it easy to identify binaries for trimming, we would store them in a different location than package objects: perhaps instead of $GOCACHE/ they would be stored in $GOCACHE/exe//.
We would also need to figure out what to do about the potential for ETXTBSY issues when trying to execute the built binaries: see #22220. If the go command tries to write to a binary and then execute it, we can get errors executing the binary. We'll have to figure out what to do about this because we would need to write the build ID into the binary and then execute it if we're doing a go run.
cc @rsc @samthanawalla