containerbuildsystem / cachi2

GNU General Public License v3.0
7 stars 25 forks source link

Go does not download toolchains when the cachi2 Go version is newer than the specified toolchain #501

Closed taylormadore closed 3 months ago

taylormadore commented 5 months ago

Cachi2 does not provide a toolchain in GOMODCACHE when the installed version of Go in the cachi2 image is the same or newer than specified by the toolchain directive in go.mod. This behavior of Go is noted in the documentation for Go toolchains [1]:

the go command uses its own bundled toolchain when that toolchain is at least as new as the go or toolchain lines in the main module or workspace.

This may cause an issue when performing a hermetic build from the cachi2-output where the GOTOOLCHAIN environment variable is set to auto. If the builder image does not have a version of Go matching or exceeding that specified by the toolchain directive, the build may fail when trying to fetch the toolchain.

References

[1] https://go.dev/doc/toolchain [2] https://github.com/containerbuildsystem/cachi2/discussions/419 (Go 1.21 design doc)

eskultety commented 5 months ago

Hmm, I wonder if we can ultimately provide a generic fix that would cover all user scenarios as consumers must not blindly assume we'll deal with everything, they need to cooperate by selecting an image that on its own can attain what they want, i.e. building the project. Consider the following scenarios and go.mod contents:

  1. 
    # go.mod
    go 1.21
    toolchain 1.21.5

project base image

$ go version go version go1.21.4 linux/amd64

cachi2

$ go version go version go1.21.6 linux/amd64

Because cachi2 has a newer toolchain, _1.21.5_ won't be downloaded, we set `GOTOOLCHAIN=auto` and the project build will fail because the builder image will try installing the newer toolchain which we didn't prefetch. In this case it may be reasonable for us to compare the versions and fallback to `GOTOOLCHAIN=local` and pass handling of the build problems, e.g. stemming from potential version mismatch in the builder image to the user. Since `toolchain` isn't enforced it seems like any _1.21_ toolchain would successfully build the project with `GOTOOLCHAIN=local` and hence this would be a reasonable tradeoff for us.

2. 

go.mod

go 1.21.5

project base image

$ go version go version go1.21.4 linux/amd64

cachi2

$ go version go version go1.21.6 linux/amd64


This is a user problem, because they strictly mandate go _1.21.5_ , yet the base image they build with doesn't even comply with cachi2 out of the picture (note that cachi2 in this case has a good argument of using whatever the latest Go version is). This would fail either way (unless we employ one of the suggestions below), so I wonder whether it's okay for us to do nothing and let it fail with a more confusing error about not being able to download the toolchain because of `GOTOOLCHAIN=auto` or again set it to `GOTOOLCHAIN=local` and make it a user error, but likely with a more descriptive error.

Despite the above, there are options for us to go more than one step further and basically walk the consumers by the hand through the whole process taking care of their own problems:

### change our base image and be in full control of the Go version in it
This is already being discussed [here](https://github.com/containerbuildsystem/cachi2/discussions/401), so a TL;DR is that we'd install the first minor release of all supported Go SDKs (e.g. _go1.21.0_) and always set `GOTOOLCHAIN=auto` for container-based environments since any _1.21_ toolchain would meet the criteria and hence we'd prefetch anything newer. The downside to this (aside from maintenance and Containerfile changes) is that we'd potentially be using vulnerable code for the prefetch as I really don't know what toolchain is used for `go mod download` with `GOTOOLCHAIN=auto` as this is quite opaque. If **all** operations are performed by the new toolchain, then the security aspect essentially becomes a non-issue (or at least not that pressing), but if it's only used for actually building the project, but the old toolchain is used for module fetching, then it may become an aspect to consider.
That all said, keep in mind that ^this would only handle container-based environments and for local host deployments we'd still need do it differently, e.g. that variable handling mentioned earlier.

### download any toolchain explicitly
We do have a code foundation to download toolchains already, but we decided to only ever fall back to using that code path with local environments rather than container based environments. The arguments against using this solution in container-based envs are:
- downloading whole Go SDKs poses a non-negligible performance penalty
- we're not in charge of all environment variables exposed in the container so one of the default locations either `go download` or `go install` uses might not be actually writable to the default user
- SDKs downloaded via `go download` are different from the behaviour of `GOTOOLCHAIN=auto` in that the SDKs are not stored in `GOMODCACHE` as the latter would do so we'd still need to figure out how to get it there manually
**While definitely an option I'd personally really like to avoid this one if possible.**
eskultety commented 5 months ago
  1. ... In this case it may be reasonable for us to compare the versions and fallback to GOTOOLCHAIN=local and pass handling of the build problems, e.g. stemming from potential version mismatch in the builder image to the user. Since toolchain isn't enforced it seems like any 1.21 toolchain would successfully build the project with GOTOOLCHAIN=local and hence this would be a reasonable tradeoff for us.

In retrospect, ^this won't work with submodules or workspaces if each submodule requires a different version of Go by the means of the go line, because we'd only ever set GOTOOLCHAIN once (if at all).

So I think as discussed via private channels the only reasonable fix for this is to always use GOTOOLCHAIN=auto for fetching only (using 1.2X.0) as the base toolchain for the job to make sure we always prefetch whatever toolchain a project/module requires.

At the same time, we'd stop forcing any GOTOOLCHAIN setting onto the users' container build process by the means of env variables generation and let them either define it explicitly or let them used whatever the default is in the respective base image (now that we'd use GOTOOLCHAIN=auto on our own for the prefetch) they wish to use (will update the design with this remark).

@taylormadore any objections to the paragraph above^?

taylormadore commented 5 months ago

any objections to the paragraph above^?

Nope, looks good to me :+1:. Thank you for all of your efforts on this feature :heart:

eskultety commented 3 months ago

Resolved with https://github.com/containerbuildsystem/cachi2/commit/e0732c78f7df24de2b2a6af0750e40fa2b6396a1