golang / go

cmd/go: precompile dependencies for docker image cache #45474

Open hugbubby opened 3 years ago

hugbubby commented 3 years ago

This issue may reveal some fundamental misunderstanding I have about golang/compilers, but here goes:

Some of the dependencies for a monolithic golang program on our backend are quite large. Several of them use code generation as a generics workaround and are large enough to force us to use internal gitlab runners because of OOM errors. Currently we use go mod download as a build step in containerizing it, so that downloading those dependencies is cached. However, as far as I can tell, there's no way to run the equivalent of go get for just a go.mod and go.sum file; we can download, but not compile these dependencies that mostly stay the same from build to build.

This means that every time we build our backend golang program we have to recompile all of the libraries that that program uses from scratch, instead of caching that as a build step and only recompiling when we change our go.mod/go.sum files.

I'd like a go command (if it doesn't exist already, and doesn't admit some fundamental misunderstanding of how libraries get compiled) to get much of this work done before we COPY our source code into the docker image. Maybe this would just be a modification of go get that ignored the fact that there was no source code in the working directory when a go.mod file exists specifying all of the libraries used for a package. Such a command would shave a lot of build time in our CI.

ianlancetaylor commented 3 years ago

CC @bcmills @jayconrod

seankhliao commented 3 years ago

Without the source code to list your actual dependencies (as opposed to everything that's possibly reachable from the module graph, including test dependencies of your dependencies, etc.), you're going to massively overbuild. I think the closest you can do today might be the following, with the caveats that each module is built independently, so shared dependencies at different versions will be built multiple times, and that it also includes trimmed-out dependencies.

# Install each non-main module in the build list at its selected version, one module at a time.
for p in $(go list -m -f '{{ if not .Main }}{{ .Path }}/...@{{ .Version }}{{ end }}' all); do
    go install "$p"
done

You may have better luck using docker buildx to mount a cache in (caveat https://github.com/moby/buildkit/issues/1512)

#syntax=docker/dockerfile:1.2
FROM golang:alpine AS build
WORKDIR /workspace
COPY . .
RUN --mount=type=cache,id=gomod,target=/go/pkg/mod \
    --mount=type=cache,id=gobuild,target=/root/.cache/go-build \
    go build -o app .

jayconrod commented 3 years ago

I don't think any new feature in the go command is needed. The build cache can be populated with go build, go install, or maybe go list -export -test. You'll need a list of packages to cache or a package that transitively imports those packages.

I'm marking this as a documentation issue though since we could really use a guide on efficiently building container images.
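
A minimal sketch of the "list of packages to cache" idea, assuming the list is generated where the full sources are available and saved to a file (the hypothetical `deps.txt`) that is copied into the image together with go.mod and go.sum before the application sources. The function names are illustrative and not part of the go command. Whether `go build` accepts every listed package without the main module's sources present may depend on the Go version, so treat this as a starting point rather than a verified recipe.

```shell
# Print the import path of every package the main module depends on,
# skipping the standard library and the main module's own packages.
# Requires the full source tree, since imports are read from the sources.
list_external_deps() {
    go list -deps \
        -f '{{if not .Standard}}{{if .Module}}{{if not .Module.Main}}{{.ImportPath}}{{end}}{{end}}{{end}}' \
        ./...
}

# Compile the packages named in a list file (one import path per line)
# so that their build results are stored in the Go build cache.
warm_build_cache() {
    list_file=$1
    # Nothing to do if the list is missing or empty.
    [ -s "$list_file" ] || return 0
    # Word splitting is intentional: each line becomes one argument.
    # shellcheck disable=SC2046
    go build $(cat "$list_file")
}
```

One possible workflow: run `list_external_deps > deps.txt` wherever the sources are checked out and commit the result, then call `warm_build_cache deps.txt` in an image layer that only contains go.mod, go.sum, and deps.txt. That layer should then be rebuilt only when one of those three files changes.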

hugbubby commented 3 years ago

Had never heard of docker buildx. Unfortunately our CI uses kaniko and it doesn't support those extra flags ;-; https://github.com/GoogleContainerTools/kaniko/issues/969.

seankhliao commented 3 years ago

You could try a different approach with kaniko: specify a directory under /var/run as the build cache and mount it as a volume to be shared/reused between builds (could also do the same for module cache)
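
A rough sketch of that setup, assuming the CI system mounts a persistent volume at the path passed in. `GOCACHE` and `GOMODCACHE` are the go command's environment variables for the build cache and the module cache. The function name and the default path `/var/run/go-cache` are hypothetical, and whether kaniko preserves the directory between builds depends on how the runner is configured.

```shell
# Point the Go build cache and module cache at a directory that is
# expected to persist between CI builds (for example a mounted volume).
use_persistent_go_caches() {
    cache_root=${1:-/var/run/go-cache}  # hypothetical default location
    mkdir -p "$cache_root/build" "$cache_root/mod" || return 1
    export GOCACHE="$cache_root/build"     # compiled packages
    export GOMODCACHE="$cache_root/mod"    # downloaded module sources
}
```

Calling `use_persistent_go_caches /var/run/go-cache` before `go build` makes later builds reuse whatever the earlier ones left in that directory. In a Dockerfile the same two variables could be set with `ENV` instead.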

andig commented 3 years ago

@jayconrod I'd like the ability to create the list of modules on demand instead of a manual maintenance task. Seems there are at least some caveats with go:embed:

❯ go list -export -test
Alias tip: gol -export -test
# github.com/andig/evcc/internal/vehicle/cloud
internal/vehicle/cloud/cert.go:9:3: invalid go:embed: build system did not supply embed configuration
internal/vehicle/cloud/cert.go:12:3: invalid go:embed: build system did not supply embed configuration
internal/vehicle/cloud/cert.go:19:3: invalid go:embed: build system did not supply embed configuration
internal/vehicle/cloud/cert.go:12:12: pattern client-key.pem: no matching files found
github.com/andig/evcc

Would go list work when only the go.mod has been downloaded but not the full sources?

jayconrod commented 3 years ago

@andig There's no way for the go command to know which modules are needed to compile the packages in the main module without having the sources for those packages. go list -m all prints all the modules in the build list, but that's the same set downloaded by go mod download all, which is usually too much.

go list -export does compile packages into the cache, so it needs all the source files to be present. From the error message, though, it sounds like something else specific to go list might be going on.
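
To see how much larger the build list is than the set of modules actually needed, something like the following could be run in a checkout that has the sources. It is only a sketch: the function names are hypothetical, and `go list -deps ./...` as used here ignores test-only dependencies.

```shell
# Modules in the full build list (the set `go mod download all` fetches).
build_list_modules() {
    go list -m -f '{{if not .Main}}{{.Path}}{{end}}' all | LC_ALL=C sort -u
}

# Modules that provide at least one package imported, directly or
# transitively, by the main module's packages.
needed_modules() {
    go list -deps -f '{{with .Module}}{{if not .Main}}{{.Path}}{{end}}{{end}}' ./... \
        | LC_ALL=C sort -u
}

# Modules that are in the build list but not needed to compile ./...
unneeded_modules() {
    all_file=$(mktemp) || return 1
    needed_file=$(mktemp) || { rm -f "$all_file"; return 1; }
    build_list_modules > "$all_file"
    needed_modules > "$needed_file"
    # comm -23 keeps the lines that appear only in the first file.
    LC_ALL=C comm -23 "$all_file" "$needed_file"
    rm -f "$all_file" "$needed_file"
}
```

Running `unneeded_modules | wc -l` gives a rough count of modules that would be downloaded, or built by a "compile everything in go.mod" command, without contributing to the final binary.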

Seb-C commented 3 years ago

I believe that this is the same issue as #27719, which is still not solved. It's a major pain point that I run into on every project using golang and docker nowadays.

Seb-C commented 3 years ago

Without the source code to list your actual dependencies (as opposed to everything that's possibly reachable from the module graph, including test dependencies of your dependencies, etc.), you're going to massively overbuild.

A command to do that would be a perfectly fine compromise to me, way better than the current situation anyway.

The dependencies rarely change or need to be rebuilt (at most a few times per month), while the actual source code of the application needs to be rebuilt multiple times per minute when developing. Taking longer than necessary to rebuild the dependencies is pretty much a non-issue, while not being able to build the two independently is a major problem.