bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Bazel build missing dep declaration with transitive dependency #15359

Open yeounoh opened 2 years ago

yeounoh commented 2 years ago

Description of the bug:

I am building the TensorFlow project (commit: 2c6d3ed00f16838831aa460c5668a8466b9f3649) and running into errors about missing dependency declarations.

For instance, here is one of the errors:

ERROR: /usr/local/google/home/yeounoh/.cache/bazel/_bazel_yeounoh/dc4cfe365eb5fc5c1bdcf9c9346373b9/external/llvm-project/mlir/BUILD.bazel:3073:11: Compiling mlir/lib/Support/IndentedOstream.cpp failed: undeclared inclusion(s) in rule '@llvm-project//mlir:Support':
this rule is missing dependency declarations for the following files included by 'mlir/lib/Support/IndentedOstream.cpp':
  'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm-project/llvm/config.cppmap'
  'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm-project/llvm/Demangle.cppmap'
  'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm_terminfo/terminfo.cppmap'
  'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm_zlib/zlib.cppmap'

And here is the corresponding rule definition (from the Bazel cache):

# /usr/local/google/home/yeounoh/.cache/bazel/_bazel_yeounoh/dc4cfe365eb5fc5c1bdcf9c9346373b9/external/llvm-project/mlir/BUILD.bazel:3073:11
cc_library(
  name = "Support",
  deps = ["@llvm-project//llvm:Support"],
  includes = ["include"],
  srcs = ["@llvm-project//mlir:lib/Support/DebugCounter.cpp", "@llvm-project//mlir:lib/Support/FileUtilities.cpp", "@llvm-project//mlir:lib/Support/IndentedOstream.cpp", "@llvm-project//mlir:lib/Support/InterfaceSupport.cpp", "@llvm-project//mlir:lib/Support/StorageUniquer.cpp", "@llvm-project//mlir:lib/Support/Timing.cpp", "@llvm-project//mlir:lib/Support/ToolUtilities.cpp", "@llvm-project//mlir:lib/Support/TypeID.cpp"],
  hdrs = ["@llvm-project//mlir:include/mlir/Support/DebugAction.h", "@llvm-project//mlir:include/mlir/Support/DebugCounter.h", "@llvm-project//mlir:include/mlir/Support/DebugStringHelper.h", "@llvm-project//mlir:include/mlir/Support/FileUtilities.h", "@llvm-project//mlir:include/mlir/Support/IndentedOstream.h", "@llvm-project//mlir:include/mlir/Support/InterfaceSupport.h", "@llvm-project//mlir:include/mlir/Support/LLVM.h", "@llvm-project//mlir:include/mlir/Support/LogicalResult.h", "@llvm-project//mlir:include/mlir/Support/MathExtras.h", "@llvm-project//mlir:include/mlir/Support/StorageUniquer.h", "@llvm-project//mlir:include/mlir/Support/ThreadLocalCache.h", "@llvm-project//mlir:include/mlir/Support/Timing.h", "@llvm-project//mlir:include/mlir/Support/ToolUtilities.h", "@llvm-project//mlir:include/mlir/Support/TypeID.h"],
)
# Rule Support instantiated at (most recent call last):
#   /usr/local/google/home/yeounoh/.cache/bazel/_bazel_yeounoh/dc4cfe365eb5fc5c1bdcf9c9346373b9/external/llvm-project/mlir/BUILD.bazel:3073:11 in <toplevel>

It doesn't list the missing dependencies, only @llvm-project//llvm:Support; however, the rule for @llvm-project//llvm:Support does declare the missing dependencies (and it built successfully, too):

# /usr/local/google/home/yeounoh/.cache/bazel/_bazel_yeounoh/dc4cfe365eb5fc5c1bdcf9c9346373b9/external/llvm-project/llvm/BUILD.bazel:181:11
cc_library(
  name = "Support",
  deps = ["@llvm-project//llvm:config", "@llvm-project//llvm:Demangle", "@llvm_terminfo//:terminfo", "@llvm_zlib//:zlib"],
  ...
  ... (there is a long list of other build attributes)

If I manually add the missing deps directly to the @llvm-project//mlir:Support rule, that error goes away, but the build then fails with similar errors on other targets, and so on. I think something in my setup is preventing the transitive dependencies from being picked up.
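For the manual workaround, it helps to notice that each .cppmap path in the error above is named after a target that @llvm-project//llvm:Support lists in its deps (config, Demangle, terminfo, zlib). Assuming that naming holds in general (a module map called <target>.cppmap under bazel-out/<config>/bin/external/<repo>/<package>/), the hypothetical helper below turns the error text into candidate deps entries. It is only a sketch for reading the error faster; it does not address why the transitive dependencies are not picked up.

```python
import re

# Matches quoted module-map paths such as
# 'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm_zlib/zlib.cppmap'.
# Assumption: module maps live at bin/external/<repo>/<package...>/<target>.cppmap,
# so only targets in external repositories are recognized.
CPPMAP_RE = re.compile(
    r"'bazel-out/[^/]+/bin/external/(?P<repo>[^/]+)/(?:(?P<pkg>[^']*)/)?(?P<name>[^/']+)\.cppmap'"
)


def cppmap_labels(error_text: str) -> list[str]:
    """Map every .cppmap path in an 'undeclared inclusion(s)' error to a candidate label."""
    labels = []
    for m in CPPMAP_RE.finditer(error_text):
        pkg = m.group("pkg") or ""  # empty package when the file sits at the repo root
        label = f"@{m.group('repo')}//{pkg}:{m.group('name')}"
        if label not in labels:  # keep first-seen order, drop duplicates
            labels.append(label)
    return labels


if __name__ == "__main__":
    error = """
      'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm-project/llvm/config.cppmap'
      'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm-project/llvm/Demangle.cppmap'
      'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm_terminfo/terminfo.cppmap'
      'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm_zlib/zlib.cppmap'
    """
    for label in cppmap_labels(error):
        print(f'"{label}",')  # ready to paste into a deps = [...] list
```

On the four paths from the error, this prints "@llvm-project//llvm:config", "@llvm-project//llvm:Demangle", "@llvm_terminfo//:terminfo" and "@llvm_zlib//:zlib", the same labels declared in the deps of @llvm-project//llvm:Support.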

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Check out https://github.com/tensorflow/tensorflow.git at commit 2c6d3ed00f16838831aa460c5668a8466b9f3649.

Then run: bazel build //tensorflow/tools/pip_package:build_pip_package

I asked my colleagues to try it; some of them hit the issue and some don't.

Which operating system are you running Bazel on?

Debian GNU/Linux rodete, Linux 5.15.15-1rodete2-amd64, x86-64

What is the output of bazel info release?

INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=90
INFO: Reading rc options for 'info' from /usr/local/google/home/yeounoh/git/pytorch/xla/third_party/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'info' from /usr/local/google/home/yeounoh/git/pytorch/xla/third_party/tensorflow/.bazelrc:
  Inherited 'build' options: --define framework_shared_object=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library
INFO: Reading rc options for 'info' from /usr/local/google/home/yeounoh/git/pytorch/xla/third_party/tensorflow/.tf_configure.bazelrc:
  Inherited 'build' options: --action_env PYTHON_BIN_PATH=/usr/local/google/home/yeounoh/anaconda3/envs/torch-xla-1.11/bin/python3 --action_env PYTHON_LIB_PATH=/usr/local/google/home/yeounoh/anaconda3/envs/torch-xla-1.11/lib/python3.8/site-packages --python_path=/usr/local/google/home/yeounoh/anaconda3/envs/torch-xla-1.11/bin/python3
INFO: Reading rc options for 'info' from /usr/local/google/home/yeounoh/git/pytorch/xla/third_party/tensorflow/.bazelrc:
  Inherited 'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /usr/local/google/home/yeounoh/git/pytorch/xla/third_party/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /usr/local/google/home/yeounoh/git/pytorch/xla/third_party/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:linux in file /usr/local/google/home/yeounoh/git/pytorch/xla/third_party/tensorflow/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /usr/local/google/home/yeounoh/git/pytorch/xla/third_party/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
release 5.1.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

https://github.com/tensorflow/tensorflow.git
cb8193c5d2dd82b0a1ecaf78d37392cae8e05582
2c6d3ed00f16838831aa460c5668a8466b9f3649

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

meteorcloudy commented 2 years ago

@yeounoh Sorry, we don't have the capacity to help you debug this, since we cannot reproduce it (bazel build //tensorflow/tools/pip_package:build_pip_package works for me locally). Does bazel clean --expunge help?

yeounoh commented 2 years ago

Hi @meteorcloudy, thanks. I've tried bazel clean --expunge, but the result is the same. One thing I noticed: if I change spawn_strategy to sandboxed, the build works on my local machine (but it still fails in our CI/VM build). Could this be a useful hint?

meteorcloudy commented 2 years ago

You should probably check where files like 'bazel-out/k8-opt-exec-50AE0418/bin/external/llvm-project/llvm/config.cppmap' come from, and use bazel cquery to check why they are not in the dependencies.
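As a sketch of that suggestion, the snippet below only prints the query commands; it does not run Bazel. deps(expr, depth) and somepath(from, to) are standard Bazel query functions that cquery also accepts; the labels are copied from the error and BUILD files above, and the script itself is a hypothetical convenience, not part of Bazel.

```python
import shlex

# Labels copied from the error message and BUILD files in this issue.
FAILING_TARGET = "@llvm-project//mlir:Support"
MISSING = [
    "@llvm-project//llvm:config",
    "@llvm-project//llvm:Demangle",
    "@llvm_terminfo//:terminfo",
    "@llvm_zlib//:zlib",
]


def cquery_commands(target: str, missing: list[str]) -> list[list[str]]:
    """Build argv lists for `bazel cquery` showing how `target` reaches each label."""
    # Depth 1: only what the failing rule itself declares in deps.
    cmds = [["bazel", "cquery", f"deps({target}, 1)"]]
    # One dependency path from the failing rule to each missing label, if one exists.
    cmds += [["bazel", "cquery", f"somepath({target}, {label})"] for label in missing]
    return cmds


if __name__ == "__main__":
    for argv in cquery_commands(FAILING_TARGET, MISSING):
        # shlex.join quotes the query expression so it can be pasted into a shell.
        print(shlex.join(argv))
```

The first printed line is bazel cquery 'deps(@llvm-project//mlir:Support, 1)'. Because cquery results depend on the configuration, the printed commands should be run with the same flags (for example the same --config values) as the failing build.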

bjacob commented 2 years ago

I'm running into this too.

I have the following environment variables set: CC=/usr/bin/clang, CXX=/usr/bin/clang++.

Unsetting these two environment variables removed the issue.

meteorcloudy commented 2 years ago

FYI @oquenchil: it seems Bazel has some issue with C++ modules when building with clang.

yeounoh commented 2 years ago

Using the nightly (pre-release) Bazel build resolved the issue for me.

yeounoh commented 2 years ago

I am seeing the failure again; I will reopen the issue.

@bjacob unsetting CC and CXX worked for me as well. Not sure why setting them would cause the issue, though 🤷

jpienaar commented 2 years ago

At one point (https://github.com/bazelbuild/bazel/issues/13135), --spawn_strategy=sandboxed was required because of zombie state hanging around (to paraphrase from there); possibly, changing the environment variables just happened to avoid some of that.

adam-azarchs commented 2 years ago

The issue seems to be that although the .cppmap files are actually included in the inputs of the compile action (as seen via aquery), something else in the action still rejects them. I think the strict check is not recognizing those files as headers.
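To check that observation locally, one could dump the compile actions with something like bazel aquery 'mnemonic("CppCompile", @llvm-project//mlir:Support)' --output=jsonproto > actions.json and list the .cppmap inputs of each action. The sketch below assumes the ActionGraphContainer JSON layout (pathFragments, artifacts, depSetOfFiles, and actions with inputDepSetIds and primaryOutputId) and 1-based ids; field names may differ between Bazel releases, so treat it as a starting point rather than a supported tool.

```python
import json
import sys


def _path(frag_id: int, frags: dict) -> str:
    """Rebuild a path by walking pathFragments from leaf to root via parentId."""
    parts = []
    while frag_id:  # assumption: ids start at 1 and root fragments omit parentId
        frag = frags[frag_id]
        parts.append(frag["label"])
        frag_id = frag.get("parentId", 0)
    return "/".join(reversed(parts))


def cppmap_inputs(container: dict, mnemonic: str = "CppCompile") -> dict[str, list[str]]:
    """Return {primary output path: sorted .cppmap inputs} for actions with `mnemonic`."""
    frags = {f["id"]: f for f in container.get("pathFragments", [])}
    artifacts = {a["id"]: _path(a["pathFragmentId"], frags)
                 for a in container.get("artifacts", [])}
    depsets = {d["id"]: d for d in container.get("depSetOfFiles", [])}

    result = {}
    for action in container.get("actions", []):
        if action.get("mnemonic") != mnemonic:
            continue
        # Flatten the nested input depsets iteratively, visiting each depset once.
        stack, seen, inputs = list(action.get("inputDepSetIds", [])), set(), set()
        while stack:
            ds_id = stack.pop()
            if ds_id in seen:
                continue
            seen.add(ds_id)
            ds = depsets[ds_id]
            inputs.update(artifacts[a] for a in ds.get("directArtifactIds", []))
            stack.extend(ds.get("transitiveDepSetIds", []))
        key = artifacts.get(action.get("primaryOutputId"), action.get("actionKey", "?"))
        result[key] = sorted(p for p in inputs if p.endswith(".cppmap"))
    return result


if __name__ == "__main__":
    # Usage: python cppmap_inputs.py actions.json
    with open(sys.argv[1]) as f:
        for output, maps in cppmap_inputs(json.load(f)).items():
            print(output)
            for m in maps:
                print("   ", m)
```

If the .cppmap files named in the error show up here, that supports the reading that they are declared inputs which the inclusion check still refuses.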

samuela commented 2 years ago

This bug is also breaking the bazel build of jaxlib on x86_64-darwin with clang (https://github.com/NixOS/nixpkgs/pull/183051#issuecomment-1226635146).

uri-canva commented 2 years ago

Does anyone have a small repro I can run through the debugger? TensorFlow is a bit big and I'm not very familiar with C++, so I'm not sure I would be able to extract a small repro from it.

lockmatrix commented 2 years ago

Adding --spawn_strategy=sandboxed solved this problem for me.

yyyokata commented 1 year ago

In my experience, clang versions <= 12 avoid this issue, but clang versions >= 15 reproduce it. Maybe this gives some hints.

Update: removing the Bazel feature layering_check also works.

hypdeb commented 1 year ago

Same issue when trying to depend on Boost using https://github.com/nelhage/rules_boost; --spawn_strategy=sandboxed does not help.

oquenchil commented 1 year ago

These look like undeclared inclusions thrown intentionally by the layering check. If you are affected, you should either fix those errors by adding the required dependencies to the cc_library target or disable the layering check by passing --features=-layering_check on the command line. This is not a Bazel bug as far as I can tell.

Please feel free to reopen providing the exact compilation error, the Bazel build target as listed on the BUILD file and the contents of the *.cc file whose compilation is throwing the error. I'd expect that there is an #include header in the source file for which there isn't a direct dependency in the build target providing that header.
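To make the rule described above concrete: with the layering check, a source file may only #include headers exported by its own target or by its direct deps, even when the header is reachable transitively. The toy model below is a simplified illustration, not Bazel's implementation (with clang, the real check is enforced through the generated module maps); the dependency edges mirror the BUILD files in this issue, but the header names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    hdrs: set[str] = field(default_factory=set)
    deps: list[str] = field(default_factory=list)


def layering_violations(targets: dict[str, Target], name: str, includes: list[str]) -> list[str]:
    """Headers included by `name`'s sources that neither it nor a *direct* dep exports."""
    target = targets[name]
    allowed = set(target.hdrs)
    for dep in target.deps:  # direct deps only; transitive deps are deliberately ignored
        allowed |= targets[dep].hdrs
    return sorted(h for h in includes if h not in allowed)


if __name__ == "__main__":
    # Hypothetical header names; the dependency edges mirror the BUILD files above.
    targets = {
        "@llvm-project//llvm:config": Target(hdrs={"llvm/Config/config.h"}),
        "@llvm-project//llvm:Support": Target(
            hdrs={"llvm/Support/raw_ostream.h"},
            deps=["@llvm-project//llvm:config"],
        ),
        "@llvm-project//mlir:Support": Target(
            hdrs={"mlir/Support/IndentedOstream.h"},
            deps=["@llvm-project//llvm:Support"],
        ),
    }
    includes = ["mlir/Support/IndentedOstream.h", "llvm/Support/raw_ostream.h", "llvm/Config/config.h"]
    print(layering_violations(targets, "@llvm-project//mlir:Support", includes))
```

Here only llvm/Config/config.h is reported, because it comes from a transitive dep. Note that the errors quoted in this thread list .cppmap files rather than headers, so this model only illustrates the intended rule, not the reported failure.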

hypdeb commented 1 year ago

I just faced the same issue with gtest and wrote a minimal repro: https://github.com/hypdeb/missing-deps. Looking at the BUILD file in gtest we can see that the headers are in fact included: https://github.com/google/googletest/blob/455fcb7773dedc70ab489109fb12d8abc7fd59b6/BUILD.bazel#L86 and exist: https://github.com/google/googletest/tree/main/googletest/include/gtest/internal

@oquenchil Removing layering check does not solve the issue.

hypdeb commented 1 year ago

I ran a further experiment and building gtest itself with my toolchain fails. This means the issue I'm facing is a different one as it's not related to transitive dependencies. Please disregard my comments above.

If anyone ends up here with my issue anyway, it was solved by adding the following linker flags:

"-no-canonical-prefixes",
"-L/usr/local/llvm/lib",

to my toolchain.

keith commented 3 months ago

The original issue here is likely fixed by https://github.com/bazelbuild/bazel/pull/21832; please verify with 7.3.0rc1.