bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Downloading external dependencies times out after 10 minutes #22999

Closed: DrCHall closed this issue 1 month ago

DrCHall commented 1 month ago

Description of the bug:

After upgrading from Bazel 6.5.0 to 7.2.0, running bazel build in my project after cleaning the Bazel cache always fails with the following error (some content redacted for privacy):

$ bazel build //...

Starting local Bazel server and connecting to it...
WARNING: Stripping enabled, but '--copt=-g' (or --per_file_copt=...@-g) specified. Debug information will be generated and then stripped away. This is probably not what you want! Use '-c dbg' for debug mode, or use '--strip=never' to disable stripping
INFO: Repository rules_python~~pip~pip_deps_311_tensorflow instantiated at:
  <builtin>: in <toplevel>
Repository rule whl_library defined at:
  [redacted]/external/rules_python~/python/pip_install/pip_repository.bzl:1032:30: in <toplevel>
ERROR: An error occurred during the fetch of repository 'rules_python~~pip~pip_deps_311_tensorflow':
   Traceback (most recent call last):
    File "[redacted]/external/rules_python~/python/pip_install/pip_repository.bzl", line 839, column 35, in _whl_library_impl
        repo_utils.execute_checked(
    File "[redacted]/external/rules_python~/python/private/repo_utils.bzl", line 182, column 29, in _execute_checked
        return _execute_internal(fail_on_error = True, *args, **kwargs)
    File "[redacted]/external/rules_python~/python/private/repo_utils.bzl", line 123, column 13, in _execute_internal
        fail((
Error in fail: repo.execute: whl_library.ResolveRequirement(rules_python~~pip~pip_deps_311_tensorflow, tensorflow==2.15.1): end: failure:
  command: [redacted]
  return code: 256
  working dir: <default: [redacted]/external/rules_python~~pip~pip_deps_311_tensorflow>
  timeout: 600
  environment:
PYTHONPATH=[redacted]
CPPFLAGS="-isystem [redacted]/external/rules_python~~python2~python_3_11_host/include/python3.11"
<stdout empty>
===== stderr start =====
Timed out
===== stderr end =====
ERROR: no such package '@@rules_python~~pip~pip_deps_311_tensorflow//': repo.execute: whl_library.ResolveRequirement(rules_python~~pip~pip_deps_311_tensorflow, tensorflow==2.15.1): end: failure:
  command: [redacted]/external/rules_python~~python2~python_3_11_host/python -m python.pip_install.tools.wheel_installer.wheel_installer --requirement tensorflow==2.15.1 --isolated --extra_pip_args [redacted]
  return code: 256
  working dir: <default: [redacted]/external/rules_python~~pip~pip_deps_311_tensorflow>
  timeout: 600
  environment:
PYTHONPATH=[redacted]
CPPFLAGS="-isystem [redacted]/external/rules_python~~python2~python_3_11_host/include/python3.11"
<stdout empty>
===== stderr start =====
Timed out
===== stderr end =====
ERROR: [redacted]/external/rules_python~~pip~pip_deps/tensorflow/BUILD.bazel:10:6: @@rules_python~~pip~pip_deps//tensorflow:pkg depends on @@rules_python~~pip~pip_deps_311_tensorflow//:pkg in repository @@rules_python~~pip~pip_deps_311_tensorflow which failed to fetch. no such package '@@rules_python~~pip~pip_deps_311_tensorflow//': repo.execute: whl_library.ResolveRequirement(rules_python~~pip~pip_deps_311_tensorflow, tensorflow==2.15.1): end: failure:
  command: [redacted]/external/rules_python~~python2~python_3_11_host/python -m python.pip_install.tools.wheel_installer.wheel_installer --requirement tensorflow==2.15.1 --isolated --extra_pip_args [redacted]
  return code: 256
  working dir: <default: [redacted]/external/rules_python~~pip~pip_deps_311_tensorflow>
  timeout: 600
  environment:
PYTHONPATH=[redacted]
<stdout empty>
===== stderr start =====
Timed out
===== stderr end =====
ERROR: Analysis of target '[redacted]' failed; build aborted: Analysis failed
INFO: Elapsed time: 605.958s, Critical Path: 0.23s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
FAILED: 
    Fetching repository @@boost; starting 600s
    Fetching repository @@toolchains_llvm~~llvm~llvm_toolchain_llvm; starting 600s
    Fetching repository @@rules_python~~pip~pip_deps_311_nvidia_cudnn_cu12; starting 599s
    Fetching https://boostorg.jfrog.io/artifactory/main/release/1.78.0/source/boost_1_78_0.tar.gz; 73.0 MiB (58.3%) 596s
    Fetching https://github.com/.../llvm-18.1.8/clang%2Bllvm-18.1.8-x86_64-linux-gnu-ubuntu-22.04.tar.zst; 414.8 MiB (39.8%) 596s

Which category does this issue belong to?

CLI, External Dependency, Python Rules

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Create a build target with a large external dependency that takes longer than 10 minutes to download, then run:

bazel clean && rm -rf ~/.cache/bazel/* && pip cache purge
bazel build //...

Which operating system are you running Bazel on?

Ubuntu 22.04

What is the output of bazel info release?

release 7.2.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

Error encountered in a proprietary private Git repo

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

Bazel 6.5.0 did not have this problem: dependencies would finish downloading after 15-20 minutes without a "timed out" error.

Have you found anything relevant by searching the web?

No useful information was found online. Running

$ bazel help build | egrep '10m|600'

revealed that the only option with a 10-minute default was --bes_oom_finish_upload_timeout, but it did not fix my problem.
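For anyone triaging a similar failure, the log itself already reports the limit that was hit: each failed repo.execute block prints a "timeout: 600" field, and the build's elapsed time is just over 600 seconds. The following is a minimal sketch, not part of Bazel or rules_python, that pulls those two numbers out of captured build output so they can be compared. The helper name summarize_timeouts and the sample string are hypothetical and for illustration only; the sample lines are copied from the log above.

```python
import re

# Hypothetical helper (not a Bazel or rules_python API): scan captured
# `bazel build` output for the timeout values printed by failed
# repo.execute calls and for the overall elapsed time.
TIMEOUT_RE = re.compile(r"^\s*timeout:\s*(\d+)\s*$")
ELAPSED_RE = re.compile(r"Elapsed time:\s*([\d.]+)s")


def summarize_timeouts(log_text):
    """Return (sorted unique timeout values in seconds, elapsed seconds or None)."""
    timeouts = set()
    elapsed = None
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line)
        if match:
            timeouts.add(int(match.group(1)))
            continue
        match = ELAPSED_RE.search(line)
        if match:
            elapsed = float(match.group(1))
    return sorted(timeouts), elapsed


# A few lines copied from the log in this report.
sample = """\
  return code: 256
  timeout: 600
===== stderr start =====
Timed out
===== stderr end =====
INFO: Elapsed time: 605.958s, Critical Path: 0.23s
"""

timeouts, elapsed = summarize_timeouts(sample)
print(timeouts, elapsed)
```

If the elapsed time sits just above a reported timeout value, as it does here, that suggests the fetch was cut off by that limit rather than failing for some other reason.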

Any other information, logs, or outputs that you want to share?

No response

meteorcloudy commented 1 month ago

@DrCHall Can you please try Bazel 7.2.1? It has a number of fixes in this area: https://github.com/bazelbuild/bazel/compare/7.2.0...7.2.1

DrCHall commented 1 month ago

@meteorcloudy Yes, updating to version 7.2.1 fixed it. Thank you!