bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
23.01k stars 4.03k forks source link

aquery and --build_tests_only do not work together in Bazel 5.0 #14505

Closed philsc closed 2 years ago

philsc commented 2 years ago

Description of the problem / feature request:

I'm debugging some strange dynamic linker errors in our CI tests related to gtest and absl when I switch from 4.2.2 to 5.0.0rc3. This bug report is based on some of the investigation I've done. I suspect that this issue is related, but I'm not sure.

With this setup:

# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
  name = "com_google_googletest",
  urls = ["https://github.com/google/googletest/archive/609281088cfefc76f9d0ce82e1ff6c30cc3591e5.zip"],
  strip_prefix = "googletest-609281088cfefc76f9d0ce82e1ff6c30cc3591e5",
)

and an empty top-level BUILD file, I get the following with 4.2.2:

$ bazel --nohome_rc aquery --test_summary=terse --test_output=errors --build_tests_only --test_timeout=30,120,300,900 -- @com_google_googletest//:all | grep 'Compiling.*gtest-death-test'
WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
/home/pschrader/.bazelrc
Starting local Bazel server and connecting to it...
Loading: 
Loading: 0 packages loaded
INFO: SHA256 (https://github.com/google/googletest/archive/609281088cfefc76f9d0ce82e1ff6c30cc3591e5.zip) = 5cf189eb6847b4f8fc603a3ffff3b0771c08eec7dd4bd961bfd45477dd13eb73
DEBUG: Rule 'com_google_googletest' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "5cf189eb6847b4f8fc603a3ffff3b0771c08eec7dd4bd961bfd45477dd13eb73"
DEBUG: Repository com_google_googletest instantiated at:
  /home/pschrader/work/test_repo/WORKSPACE:3:13: in <toplevel>
Repository rule http_archive defined at:
  /home/pschrader/.cache/bazel/_bazel_pschrader/a2b0977d7e80e55a29a4e671bd75a258/external/bazel_tools/tools/build_defs/repo/http.bzl:336:31: in <toplevel>
Analyzing: 3 targets (1 packages loaded)
Analyzing: 3 targets (1 packages loaded, 0 targets configured)
Analyzing: 3 targets (23 packages loaded, 222 targets configured)
Analyzing: 3 targets (24 packages loaded, 249 targets configured)
INFO: Analyzed 3 targets (25 packages loaded, 389 targets configured).
INFO: Found 3 targets...
action 'Compiling googletest/src/gtest-death-test.cc'
INFO: Elapsed time: 6.658s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
INFO: Build completed successfully, 0 total actions

but with 5.0.0rc3 I get this:

$ bazel --nohome_rc aquery --test_summary=terse --test_output=errors --build_tests_only --test_timeout=30,120,300,900 -- @com_google_googletest//:all | grep 'Compiling.*gtest-death-test'
WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
/home/pschrader/.bazelrc
Starting local Bazel server and connecting to it...
Loading: 
Loading: 0 packages loaded
DEBUG: Rule 'com_google_googletest' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "5cf189eb6847b4f8fc603a3ffff3b0771c08eec7dd4bd961bfd45477dd13eb73"
DEBUG: Repository com_google_googletest instantiated at:
  /home/pschrader/work/test_repo/WORKSPACE:3:13: in <toplevel>
Repository rule http_archive defined at:
  /home/pschrader/.cache/bazel/_bazel_pschrader/a2b0977d7e80e55a29a4e671bd75a258/external/bazel_tools/tools/build_defs/repo/http.bzl:364:31: in <toplevel>
Loading: 1 packages loaded
Analyzing: 3 targets (1 packages loaded, 0 targets configured)
Analyzing: 3 targets (38 packages loaded, 423 targets configured)
Analyzing: 3 targets (46 packages loaded, 505 targets configured)
INFO: Analyzed 3 targets (47 packages loaded, 641 targets configured).
INFO: Found 3 targets...

INFO: Elapsed time: 6.962s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
INFO: Build completed successfully, 0 total actions

In other words, bazel reports no actions for this file.

What operating system are you running Bazel on?

Ubuntu 18.04.

What's the output of bazel info release?

$ bazel --nohome_rc info release
release 5.0.0rc3

Have you found anything relevant by searching the web?

I couldn't find anything on the Github issues list for this.

philsc commented 2 years ago

/cc @Wyverald

philsc commented 2 years ago

Perhaps more succinctly, why is this happening in 5.0.0rc3?

$ bazel --nohome_rc aquery --build_tests_only -- @com_google_googletest//:all | grep -c gtest-death-test
...
0

even though I get this in 4.2.2?

$ bazel --nohome_rc aquery --build_tests_only -- @com_google_googletest//:all | grep -c gtest-death-test
...
12
Wyverald commented 2 years ago

For posterity, the BUILD file in question: https://github.com/google/googletest/blob/609281088cfefc76f9d0ce82e1ff6c30cc3591e5/BUILD.bazel

From running the command in 4.2.2 vs 5.0.0rc3 myself, it looks like 5.0.0rc3 just does not contain aquery stuff for the :gtest target. I know next to nothing about how aquery works, unfortunately, so I'll ask for more help.

In the meantime, how did you come upon this, @philsc? Did you notice it from aquery logs, or was some other build action failing?

Wyverald commented 2 years ago

After removing the --build_tests_only flag, the output from both versions looks about the same. So it looks like this issue is only about some change in the interaction between aquery and --build_test_only.

@haxorz pointed out that it may be a bad idea to use those together anyhow: (the quote is about cquery but the same may apply to aquery)

--build_tests_only is a bad plan for a cquery invocation with a query expression that specifies target patterns that resolve to non-test targets, since they will get filtered out before the cquery business logic even runs.

Now we're still not quite sure what exactly caused this change, but given the circumstances I don't think this will count as a release blocker.

fmeum commented 2 years ago

I think that this is also a result of https://github.com/bazelbuild/bazel/commit/ebac27ec5a6063556482841da98d63d1abcf1e44:

$ USE_BAZEL_VERSION=5.0.0rc3 bazel --nohome_rc aquery --build_tests_only -- @com_google_googletest//:all | grep -c gtest-death-test
...
0
$ USE_BAZEL_VERSION=5.0.0rc3 bazel --nohome_rc aquery --notrim_test_configuration --build_tests_only -- @com_google_googletest//:all | grep -c gtest-death-test
...
12
gregestren commented 2 years ago

Thanks for isolating, @fmeum . I'll ensure @sdtwigg reviews.

sdtwigg commented 2 years ago

What is --build_tests_only supposed to mean when attached to a [c|a]query command? There isn't really a target pattern involved; just the query expression and (in some cases) the --universe_scope

haxorz commented 2 years ago

There isn't really a target pattern involved; just the query expression and (in some cases) the --universe_scope

@Wyverald 's quote of me had the context of a cquery invocation with a query expression that had a single target pattern. So the meaning of my statement was unambiguous in that context.

philsc commented 2 years ago

Okay, this feels like intended behaviour to me. I don't have a use case that requires interaction between aquery and --build_tests_only. This was just a red herring.

In the meantime, how did you come upon this, @philsc? Did you notice it from aquery logs, or was some other build action failing?

@Wyverald , It came to my attention because 5.0.0rc3 causes ~5000 of our CI tests to fail with this message:

/var/buildfarm/worker/shard/operations/41743718-2251-46f3-aa20-fe8a91a8ce70/bazel-out/k8-fastbuild/bin/common/testing/gtest_status_test.runfiles/__main__/common/testing/gtest_status_test: symbol lookup error: /var/buildfarm/worker/cache/b52f7f6f1e48daaae67fce096f8c68b8c4ed5736_dir/common/testing/gtest_status_test.runfiles/__main__/common/testing/../../_solib_k8/libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so: undefined symbol: _ZN4absl14lts_2020_09_2313GetStackTraceEPPvii

I was using aquery in my debugging. I suspect the new --build_tests_only behaviour in 5.0 has nothing to do with the crashes I'm observing. So I will close this issue and continue to investigate what's going on.

Thanks for the insight! I really appreciate it :+1:

sdtwigg commented 2 years ago

Are we concerned about potential interplay with --trim_test_configuration as seemingly detected by @fmeum in https://github.com/bazelbuild/bazel/issues/14505#issuecomment-1010217652 ?

I am OK with holding until a more compelling issue arises (partly because I don't have much knowledge of aquery and what the expected behaviors really are here excluding --trim_test_configuration). If I had to guess from my cursory examination earlier, there is something in the top-level of aquery that is doing the filtering (incorporating --build_tests_only somehow) and this is susceptible to the configuration. (Although, notably, the top-level configuration for tests should be untrimmed anyways.)

sdtwigg commented 2 years ago

Actually, let me summarize my understanding:

The behavior of how the aquery command and --build_tests_only flag shifted when --trim_test_configuration is on (presumably and somewhat mysteriously). However, since there is no documented or clear behaviors on how aquery and --build_tests_only interact (such that it is tempting to remove the latter from the former entirely) AND we have no evidence of users relying on even undocumented or unclear behaviors in this space, this is not actually a concern warranting continued investigation.