bazelbuild / rules_go

Go rules for Bazel
Apache License 2.0
1.35k stars 635 forks source link

go_sdk.host() leads to unstable unreproducible builds #3967

Closed vorburger closed 1 week ago

vorburger commented 1 week ago

What version of rules_go are you using?

I was using 0.46.0 when this problem occurred. I have meanwhile switched to 0.48.1, but I have no reason to believe anything specifically related to this changed in the meantime.

What version of gazelle are you using?

I was using 0.36.0 when this problem occurred. I have meanwhile switched to 0.37.0, but I have no reason to believe anything specifically related to this changed in the meantime.

What version of Bazel are you using?

7.1.1

Does this issue reproduce with the latest releases of all the above?

I suspect so, but I'm not 100% sure, because I have no reason to believe anything specifically related to this problem changed in the meantime, but I gave up and have worked around this problem with another approach before trying to reliably reproduce it with the very very latest versions of rules_go & gazelle & Bazel itself. Based on my understanding of the problem, I doubt it just got fixed.

What operating system and processor architecture are you using?

Debian Linux (derivation) with apt package manager on amd64 = x86_64.

Any other potentially useful information about your toolchain?

Not relevant.

What did you do?

I have used go_sdk.host(), as documented on https://github.com/bazelbuild/rules_go/blob/master/docs/go/core/bzlmod.md, and my build would regularly break, see https://github.com/enola-dev/enola/issues/626, both on GitHub Action CI and locally on Debian & Fedora Linux distros, due to:

ERROR: /home/runner/work/enola/enola/core/lib/BUILD:131:21: While resolving toolchains for target //core/lib:core_proto_doc (c4aba0f): invalid registered toolchain '@go_toolchains//:all': while parsing '@go_toolchains//:all': no such package '@@rules_go~~go_sdk~main___host_0//': Inconsistent filesystem operations. /opt/hostedtoolcache/go/1.21.8/x64/pkg/tool is no longer an existing directory. Did you delete it during the build? The results of the build are not guaranteed to be correct. You should probably run 'bazel clean' and investigate the filesystem inconsistency (likely due to filesystem updates concurrent with the build)

The "during the build" & "likely due to filesystem updates concurrent with the build" parts of the error message are somewhat misleading; the root cause of this problem is simply that go_sdk.host() records /opt/hostedtoolcache/go/1.21.8/x64/pkg/tool somewhere, but that is not "stable", because an OS package manager (à la apt or dnf) regularly updates that "host" go version - and the paths change... e.g. from /opt/hostedtoolcache/go/1.21.8/x64/pkg/tool or /usr/lib/go-1.21/pkg/tool or whatever to a newer release.

What did you expect to see?

{ Fast, Correct } — Choose two 😈 LOL

What did you see instead?

My builds regularly break "just by themselves" when I'm using go_sdk.host().

vorburger commented 1 week ago

I'm not sure how "you" (the bazelbuild/rules_go) could even fix this - but it still seemed worth recording an issue about.

If you think that Bazel (core) could provide better error handling with clearer messages for this case, an upstream issue?

Perhaps ideally go Debian & Fedora packages should use update-alternatives and provide you (this rule) a more "fixed" go installation directory?

Or maybe using go_sdk.host() simply really isn't such a great idea... I've just raised #3968 to propose discouraging adopters from using it... if you think that's all that you can and want to do about this problem (and presumably don't want to entirely remove go_sdk.host()) then just close this issue as resolved by (that docs only) PR.

@teivah FYI

fmeum commented 1 week ago

With an SDK as standardized as Go, I don't see a reason to ever use go_sdk.host() outside of super simple toy projects. I like the paragraph you added to the docs, let's go with that.

vorburger commented 1 week ago

@fmeum actually, I just noticed that, at least on Debian, and probably on at least some other Linux distros, /usr/lib/go is a symlink to (e.g.) /usr/lib/go-1.21/pkg/tool ... perhaps rules_go (or Bazel itself? I'm not sure who does what) could be "smarter" when it "finds" the "Go home" for go_sdk.host(), and use the path without the version number?

Just a thought; if you just want to resolve this after the merge of #3968, then no objections, of course.

fmeum commented 1 week ago

That's an interesting find. I'm open to improving that situation, but changing the way symlinks are handled sounds tricky. Happy to review PRs, but overall just using go_sdk.download() is probably better.

sluongng commented 1 week ago

In the past, we have seen advanced orgs with custom patches on top of Go SDK leveraging the host SDK to use their custom fork.

Since then, Go SDK is no longer shipped with precompiled artifacts and rules_go has started to provide mechanisms to (a) compile the SDK from source and (b) enable patching the downloaded SDK. Because of this, folks with host SDK usage should strongly consider using a downloaded sdk instead.

You could either host your entire patched sdk in a custom remote URL and tell rules_go to fetch it from there, or use the upstream SDK with custom patches that you managed in the source tree applied on top, or both.

So I would even go as far as adding some warning messages to go_sdk.host() to let folks know about alternative routes.