bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build

Protobuf repo recompilation sensitivity #7095

Open pcj opened 5 years ago

pcj commented 5 years ago

As discussed on the mailing list https://groups.google.com/forum/#!topic/bazel-discuss/UKLK7Y-jsW4, the protobuf repo is notoriously sensitive to recompilation. Presumably something from the environment that feeds into the action key for protobuf's C++ compilation leads to cache invalidation and hence to recompiling protobuf from source.

Given that the protobuf libraries are base dependencies for many other targets, this leads to a degraded user experience.

To reproduce: take any workspace that has a proto dependency, source ~/.bashrc, and watch protobuf recompile.

Observed up to and including bazel 0.20.0.

/assign @hlopko

hlopko commented 5 years ago

Can you double check that:

git clone https://github.com/bazelbuild/rules_cc
cd rules_cc
bazel test //tools/migration/...
source ~/.bashrc
bazel test //tools/migration/...

results in recompilation? What is the minimal content of the ~/.bashrc file that triggers the issue? What platform are you on?

pcj commented 5 years ago

Looks like it's the PATH. Any changes to PATH will trigger recompilation.

Perhaps this is working as intended, or perhaps we could split PATH, dedup its entries with a set, and normalize it to the minimal necessary set.
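
As a rough user-side approximation of that idea (just a sketch, not something Bazel does for you: it normalizes the PATH your shell hands to bazel rather than changing anything inside Bazel):

# Hypothetical workaround: dedupe PATH entries (preserving order) before
# invoking bazel, so duplicate entries alone don't change the value bazel sees.
export PATH="$(echo "$PATH" | tr ':' '\n' | awk '!seen[$0]++' | tr '\n' ':' | sed 's/:$//')"
bazel build //your:target   # placeholder target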

Is it possible to exclude PATH here and rely solely on the toolchain?

pcj commented 5 years ago

$ uname -a
Linux bina 4.13.0-46-generic #51-Ubuntu SMP Tue Jun 12 12:36:29 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

hlopko commented 5 years ago

Hmmm then https://github.com/bazelbuild/bazel/issues/6648 should actually help you (afaik). Can you try with bazel 0.21?

hlopko commented 5 years ago

or with 0.20 and with --incompatible_strict_action_env
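
For later readers, a minimal sketch of enabling it (the target label is just a placeholder), either per invocation or persisted in the workspace's .bazelrc so builds always see a fixed, well-known PATH instead of inheriting the client environment:

bazel build //your/proto:target --incompatible_strict_action_env

# or, in .bazelrc:
build --incompatible_strict_action_env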

pcj commented 5 years ago

Yes! Tested on bazel 0.21.0, and it indeed no longer recompiles when PATH changes. I guess I will go ahead and close this. The migration path from 0.19.2 to 0.20.0 and 0.21.0 is a little challenging (for this same reason), but clearly beneficial. Thanks for your help @hlopko.

hlopko commented 5 years ago

Awesome, thanks Paul. Can you pls share this solution on the mailing list?

hlopko commented 5 years ago

@pcj What toolchain do you use? Was it Bazel's autoconfigured C++ toolchain?

hlopko commented 5 years ago

Also, on what platforms do you observe this?

pcj commented 5 years ago

@hlopko I had observed it on linux & darwin before bazel 0.21.0. Now that I'm on 0.21 I'm really not seeing this anymore.

For most projects I'm not using a special toolchain, so yes, the autoconfigured one.

buchgr commented 5 years ago

I believe that I found the root cause. It looks like this issue is not specific to C++: almost all actions (spawn actions, C++ compile, C++ link, Java compile) in Bazel declare a dependency on PATH, independent of whether they actually use it. It's probably just most obvious with C++ actions because they typically take the longest.

keith commented 5 years ago

That would be consistent with our solution to this, which was to launch bazel with env -i and pass only a very limited set of variables (and a static PATH).
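
Roughly something like this (the variable list and the PATH value are only an illustration of what we keep; adjust for your machine):

# env -i clears the inherited environment, so bazel always sees the same
# static PATH regardless of what the calling shell or editor set.
env -i \
  HOME="$HOME" \
  USER="$USER" \
  PATH=/usr/local/bin:/usr/bin:/bin \
  bazel build //your:target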

meisterT commented 5 years ago

@buchgr friendly ping

buchgr commented 5 years ago

@meisterT what's up? :)

meisterT commented 5 years ago

I have heard from Bazel users that this is still a pain point, so I wanted to check whether you are working on a fix and when we can expect it to land.

buchgr commented 5 years ago

Yes I am working on a proposal to fix it. I don't want to make promises on when it will land. 1-2 quarters.

sudoforge commented 4 years ago

I currently suffer from the protobuf targets rebuilding (they are a dependency of bazelbuild/buildtools, specifically buildifier), but I cannot reproduce the issue by re-sourcing my shell's rc file. To be honest, I'm really not sure how to debug this further other than "it happens every 5-10 minutes of inactivity".

Anduril-Xu commented 4 years ago

It's been annoying for a while.

In my setup, I have several languages in the same workspace: Python, Java, and JavaScript/TypeScript. I use IntelliJ for most of the development, but every time I build a target in one of the languages, the Bazel cache seems to be invalidated for the other languages, which triggers a rebuild of some packages, most notably protobuf. It's only 10-20 seconds each time, but it is very annoying when doing cross-language development.

Is this expected? Is there any way to avoid rebuilding protobuf?

EDIT: my Bazel version was 0.29.1; the latest, 3.2.0, does not solve the problem either.

EDIT again: it seems like using prebuilt binaries works, e.g. https://github.com/google/startup-os/blob/master/WORKSPACE#L142-L154

franciscocpg commented 3 years ago


Hi @Anduril-Xu, were you able to configure your workspace to use prebuilt binaries? I have a Go monorepo and haven't had success yet :disappointed:

sudoforge commented 3 years ago

@franciscocpg Are you still experiencing this? Have you tried using --incompatible_strict_action_env?

Failing that, or if you simply want to use prebuilt binaries directly without further debugging, you should be able to load the binaries via http_file. Have you tried following the example that was linked in google/startup-os?
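
For reference, a rough sketch of what that can look like in a WORKSPACE file; the repository name, version, and sha256 below are placeholders, and the startup-os WORKSPACE linked above has a real, complete example. I've used http_archive here because the upstream protoc releases are zip archives; http_file works similarly for a standalone binary.

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Placeholder values: pin the protoc release matching your protobuf version
# and fill in the real sha256 of that archive.
http_archive(
    name = "prebuilt_protoc_linux",
    urls = ["https://github.com/protocolbuffers/protobuf/releases/download/vX.Y.Z/protoc-X.Y.Z-linux-x86_64.zip"],
    sha256 = "<sha256 of the release zip>",
    build_file_content = 'exports_files(["bin/protoc"])',
)

Depending on your rules, you can then point the proto compiler at @prebuilt_protoc_linux//:bin/protoc (for example via Bazel's --proto_compiler flag) instead of building protoc from source.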

franciscocpg commented 3 years ago

Hi @sudoforge

I want to use prebuilt binaries and have asked about it here: https://github.com/bazelbuild/rules_go/issues/2684.

But I haven't tried http_file yet. Will give it a try.

c-mita commented 3 years ago

@Anduril-Xu - I tried replicating recompilation of protobuf when changing language targets. The only issue I saw seemed to have been caused by the relatively recent change (#12430) that was made after your comment. (And my merged PR on the protobuf repo addresses it (https://github.com/protocolbuffers/protobuf/pull/8626)).

Is this still an issue?

stevekuznetsov commented 2 years ago

This is definitely still an issue. To be honest, in projects that use Bazel I've never actually seen it "fixed"; I just got used to the fact that protobuf recompiles all the time. I hit this just today in cockroachdb/cockroach: with slight changes to unrelated code, protobuf has recompiled half a dozen times already.

$ bazel version
Bazelisk version: v1.11.0
Build label: 5.1.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Mar 24 14:02:30 2022 (1648130550)
Build timestamp: 1648130550
Build timestamp as int: 1648130550

stevekuznetsov commented 2 years ago

It seems like running gazelle can invalidate the cache and cause a recompile of this package?

meisterT commented 2 years ago

@stevekuznetsov Can you write up repro instructions for cockroachdb with gazelle, please?

stevekuznetsov commented 2 years ago

@meisterT pinning down a direct reproducer is tricky. Cross-linked to an issue on the Cockroach project where devs talk about other targets getting recompiled, with some theories about why.

aaliddell commented 2 years ago

I've tried to trace this cache fragility before and thought it might be https://github.com/protocolbuffers/protobuf/issues/6886; however, IIRC that didn't solve it. When I've triggered this cache issue before with execution logging turned on, the action hashes between builds were identical but it was still rebuilding, which suggests it is not exclusively an env-related issue.

One consideration may be that protobuf is treated somewhat specially by Bazel, in that there are native flags and rules that reference it. That may be why protobuf behaves unlike any other external repo with regard to caching. Since the execution logs suggest there should be no rebuild, is there a way to trace deeper within Bazel to see why it decided to rebuild?
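
In case it helps with that tracing, two built-in ways to ask Bazel why it re-ran something (a sketch; the file paths and target label are arbitrary):

# Record, for every re-executed action, why Bazel decided to run it again:
bazel build //your:target --explain=/tmp/explain.log --verbose_explanations
grep -i protoc /tmp/explain.log

# Capture execution logs for two consecutive builds and diff them:
bazel build //your:target --execution_log_json_file=/tmp/exec1.json
bazel build //your:target --execution_log_json_file=/tmp/exec2.json
diff /tmp/exec1.json /tmp/exec2.json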

pcj commented 2 years ago

Presumably this would all go away if the proto rules were Starlarkified, as has been discussed for several years. Any update from the Bazel team on this?

meisterT commented 2 years ago

cc @comius

arran-fathom commented 2 years ago

Any chance this can be bumped to a P1 please? It affects multiple people and has a direct impact on workflow because all builds and tests for those people are significantly slowed.

In my case I think it may also be the root cause of Go IntelliSense bindings slowing down because they rely on @io_bazel_rules_go//go/tools/gopackagesdriver to source proto-generated files. This grinds my workflow to a halt.

dt commented 1 year ago

I spent a little time this morning looking into why I was seeing very frequent protoc recompilations in cockroachdb/cockroach development, and I believe that in my case it might also come down to PATH changes as described above. What was surprising to me is that this is a result of using the rules_go recommended editor configuration for my editor (VS Code).

My editor is configured using the recommended configuration, which includes both a) pointing gopls at bazel and gopackagesdriver and b) setting goroot to ${workspaceFolder}/bazel-${workspaceFolderBasename}/external/go_sdk.

Configuring the goroot seems more correct than letting the VS Code Go extension invoke the system go, so that when it runs go mod tidy or whatever, it is using the exact same version and installation of the SDK as Bazel does when building. It appears to do so by prepending the configured path to PATH when invoking all tools. In isolation, this seems good/correct.

However, this means that when gopls invokes gopackagesdriver, and thus bazel, it does so with this modified PATH variable. As a result, the bazel invocations in my workspace (both my manual, direct runs and the background gopls executions) are constantly switching back and forth between my usual PATH and the modified PATH that VS Code Go uses to invoke tools with the overridden Go SDK.

Simulating this by hand, by running PATH=_bazel/path/to/sdk:$PATH bazel build some/tiny/pkg, then a plain bazel build some/tiny/pkg, and then switching back again, causes the same observed effect of invalidating protoc.

I guess there's not really new information here, insofar as protoc's sensitivity to PATH changes is already covered above, but what wasn't obvious to me when I started poking at this was that my PATH was changing so often, unbeknownst to me, simply as a result of using the recommended editor configuration.

sudoforge commented 1 year ago

it wasn't obvious to me going in here that my PATH was changing so often as a result of using the recommended configuration.

Yeah, this is a bit of hidden knowledge that you learn through trial, as you did, but given the sensitivity to PATH changes here, it's best to always execute bazel in the same environment. I've come to the conclusion that this means either

A) only executing bazel in the same environment locally, e.g. always via VS Code or always via your shell; or
B) setting up RBE and always relying on that.

dt commented 1 year ago

only executing bazel in the same environment locally, e.g. always via VS Code, or always via your shell

For what it's worth, the executions "via VS Code" here aren't voluntary, i.e. me electing to trigger a build in my editor rather than from my terminal; they are the result of using the recommended editor configuration, which makes code navigation/inspection (via gopls) be backed by bazel and gopackagesdriver (which seems good!). These invocations happen "out of sight" and in the background, so it wasn't obvious to me at first that they were what was thrashing PATH.

pdeva commented 1 year ago

I don't think the VS Code config is the root issue. We use VS Code too, but point it to a manually installed Go, so there are no changes to PATH or anything. Yet merely running gazelle to update a minor, unrelated dependency rebuilds all of the protobuf stuff, and thus the entire rest of the repo, for some reason.

fmeum commented 1 year ago

@pdeva If you can minimize your repo to a reproducer you can share, I can investigate the rebuilds.

dt commented 1 year ago

Just one last update from my digging, just in case it can save anyone else some time if they're in a similar situation:

I dug a bit more and realized that in our repo (cockroachdb/cockroach), we had developed a particular sensitivity to PATH changes earlier this year when a developer added action_env=PATH to our dev config. It looks like this was motivated by new arm-based Macs having a different default location for homebrew-installed tools, including those needed by foreign_cc: they now live under /opt instead of /usr/local, and only the latter is in the hardcoded default used by foreign_cc. Using action_env=PATH to pick up the user's existing configuration of platform-specific paths probably seemed correct/preferred at the time and in isolation, given we didn't realize tools would be invoking bazel with customized PATHs.

I have since changed our common dev configuration back to using a hardcoded PATH, based on the default from foreign_cc, just with the addition of the new /opt homebrew location, and have not since heard additional reports of "spurious" protoc rebuilds 🤞.
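
Concretely, the change was along these lines in the .bazelrc (the paths are illustrative, not our exact config):

# Before: forwards whatever PATH the invoking shell or tool happens to have,
# so every PATH change invalidates actions that depend on it.
# build --action_env=PATH

# After: a fixed PATH that also covers the arm64 homebrew prefix under /opt.
build --action_env=PATH=/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin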

Again, this one is probably pretty obvious in hindsight -- having a tool config that constantly changes PATH and a build configuration that makes compilation depend on PATH implies constant recompilation -- but in isolation each of the steps involved to arrive at that situation seemed innocuous enough at the time. I just wanted to spell it out on the off chance it saves anyone else a little debugging time.

nioan-aux commented 1 day ago

I'm wondering if there are any updates on this. It's affecting our build as well.

Would it be possible to implement a workaround that exposes a flag allowing a prebuilt version of the gRPC compiler to be used instead of rebuilding it? Pantsbuild uses this approach.

This has been open for 5+ years, and it seems quite tricky to reproduce and find the root cause for. A workaround might be a good interim solution.