bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

add readtimeout for remote cache #5440

Closed hawkingrei closed 5 years ago

hawkingrei commented 6 years ago

We have a large repo, but sometimes the build waits a very long time, or fails to download from the remote cache, when it reaches the final stage. The log looks like this:

[10,135 / 10,136] GoLink business/service/main/dapper/client/collect/darwin_amd64_stripped/go_default_test; 3495s remote-cache

However, we have configured the timeout in the .bazelrc:

startup --expand_configs_in_place
startup --max_idle_secs=10800 --connect_timeout_secs=10
startup --host_jvm_args=-Dbazel.DigestFunction=sha256
# Show us information about failures.
# build --spawn_strategy=remote --genrule_strategy=remote
# build --strategy=Javac=remote --strategy=Closure=remote
build --announce_rc
build --remote_http_cache=http://bazel-cache.xxx.xxx/xxxxx --experimental_remote_spawn_cache --remote_local_fallback
build --verbose_failures
build:unit --features=race

test --test_output=errors
test:unit --features=race
# Include git version info
build --workspace_status_command build/print-workspace-status.sh

# Make /tmp hermetic
build --jobs 32
build --sandbox_tmpfs_path=/tmp --experimental_multi_threaded_digest
build --disk_cache=~/xxx/bazel_dish_cache
build --experimental_remote_spawn_cache --remote_timeout=10
# This flag requires Bazel 0.5.0+
build --sandbox_fake_username

# Enable go race detection.
test:unit --features=race

So I checked the code that handles remote_timeout. I only found ChannelOption.CONNECT_TIMEOUT_MILLIS, but no read timeout. I think this could be improved.

bazel version: 0.14.1
platform: linux/mac
java-runtime: Java(TM) SE Runtime Environment (build 1.8.0_161-b12) by Oracle Corporation
java-vm: Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode) by Oracle Corporation

hawkingrei commented 6 years ago

I use greenhouse as the remote cache server.

BenTheElder commented 6 years ago

Heh, it's possible there's an issue in greenhouse, I was just about to open a bug for this exact problem. My instinct is also that we need a read timeout in bazel though, even if there does turn out to be a bug in greenhouse.

/cc @buchgr

BenTheElder commented 6 years ago

Also /cc @ixdy

I can provide more details if needed but basically we have very similar configs and we see that sometimes the build will time out after a long time while waiting on the remote cache. We are on bazel 0.14.0.

hawkingrei commented 6 years ago

I found that this code includes the ReadTimeoutHandler, but why doesn't it work?

brown commented 6 years ago

I am seeing this problem too. If a remote http cache accepts a connection, but fails to supply data, then bazel hangs. The problem can be reproduced by pointing bazel at the Go bazel-remote cache and setting a breakpoint in that code's cache.go Get method.

buchgr commented 6 years ago

@brown thanks, I'll take a look. Can you try setting --remote_timeout=5 or so, to lower the timeout to 5 seconds? The default timeout is 60 seconds, which is relatively high.

brown commented 6 years ago

When I did the experiment I had remote_timeout set to 1.

mafanasyev-tri commented 6 years ago

We have the same problem with a different server. I found that if the server accepts the connection but never sends a response, the build hangs.

Here is our reproduction recipe:

(1) Set up a malfunctioning server that does not send any data:

socat -dDdd TCP-LISTEN:7111,reuseaddr,fork 'SYSTEM:echo connect >&2; sleep 30000'

(2) Run bazel with the remote cache:

bazel build --experimental_remote_spawn_cache --remote_rest_cache=http://localhost:7111/path --remote_upload_local_results=true --remote_timeout=5 //...

(3) Observe bazel never exiting (the line below shows over 14 minutes, far longer than the 5-second remote timeout):

00:16:55.820 [worker-1] [1,750 / 13,923] SkylarkAction [REDACTED]; 881s remote-cache ... (32 actions running)

This was on bazel 0.16.0, but I believe 0.18.0 has the same problem.

hawkingrei commented 5 years ago

With bazel 0.20.0, I still hit the same problem.

Starting local Bazel server and connecting to it...
INFO: Invocation ID: fb46f839-4bb0-49bf-b461-b011a24f3bcc
INFO: Reading 'startup' options from /Volumes/HDD/slave/workspace/warn-cache/bilirules.bazelrc: --host_jvm_args=-XX:+UseParallelGC
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'clean' from /Volumes/HDD/slave/workspace/warn-cache/bilirules.bazelrc:
  Inherited 'build' options: --incompatible_remove_native_http_archive=false --experimental_guard_against_concurrent_changes --experimental_strict_action_env --experimental_multi_threaded_digest --linkopt=-Wl,-rename_section,__TEXT,__const,__RODATA,__const --linkopt=-Wl,-rename_section,__TEXT,__cstring,__RODATA,__cstring --linkopt=-Wl,-rename_section,__TEXT,__gcc_except_tab,__RODATA,__gcc_except_tab --linkopt=-Wl,-rename_section,__TEXT,__objc_classname,__RODATA,__objc_classname --linkopt=-Wl,-rename_section,__TEXT,__objc_methname,__RODATA,__objc_methname --linkopt=-Wl,-rename_section,__TEXT,__objc_methtype,__RODATA,__objc_methtype --copt=-Werror=arc-performSelector-leaks --copt=-Werror=arc-retain-cycles --copt=-Werror=block-capture-autoreleasing --copt=-Werror=enum-conversion --copt=-Werror=incompatible-property-type --copt=-Werror=incomplete-implementation --copt=-Werror=int-conversion --copt=-Werror=literal-conversion --copt=-Werror=macro-redefined --copt=-Werror=mismatched-parameter-types --copt=-Werror=non-literal-null-conversion --copt=-Werror=nonportable-include-path --copt=-Werror=objc-literal-conversion --copt=-Werror=objc-missing-super-calls --copt=-Werror=objc-property-synthesis --copt=-Werror=objc-protocol-property-synthesis --copt=-Werror=parentheses --copt=-Werror=property-attribute-mismatch --copt=-Werror=shadow-ivar --copt=-Werror=sometimes-uninitialized --copt=-Werror=switch --copt=-Werror=tautological-constant-out-of-range-compare --copt=-Werror=unicode-whitespace --copt=-Werror=unknown-pragmas --copt=-Werror=unreachable-code --copt=-Werror=unsupported-availability-guard --copt=-Werror=unused-function --copt=-Werror=unguarded-availability-new --copt=-Wno-incomplete-umbrella --copt=-Wno-missing-braces --copt=-Wno-nullability-completeness --copt=-Wno-missing-noescape --copt=-Wno-#warnings --copt=-Wno-deprecated-declarations
INFO: Reading rc options for 'clean' from /Volumes/HDD/slave/workspace/warn-cache/release.bazelrc:
  Inherited 'build' options: --verbose_failures --announce_rc --apple_platform_type=ios --compilation_mode=opt --strip=always --apple_generate_dsym --remote_http_cache=http://bazel-ios-cache.bilibili.co/ios --remote_local_fallback --show_progress_rate_limit=5 --ios_multi_cpus=arm64,armv7 --spawn_strategy=standalone --genrule_strategy=standalone --output_filter=^$ --ios_minimum_os=8.0 --remote_timeout=5 --remote_max_connections=0
INFO: Starting clean.
+ make build
./bazel-wrapper build //bili-universal:bili-universal
Starting local Bazel server and connecting to it...
INFO: Writing profile data to '/Volumes/HDD/slave/workspace/warn-cache/bazel-profile'
INFO: Invocation ID: cef358ba-3593-4e30-b426-9ce60e21a5d3
INFO: Reading 'startup' options from /Volumes/HDD/slave/workspace/warn-cache/bilirules.bazelrc: --host_jvm_args=-XX:+UseParallelGC
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /Volumes/HDD/slave/workspace/warn-cache/bilirules.bazelrc:
  'build' options: --incompatible_remove_native_http_archive=false --experimental_guard_against_concurrent_changes --experimental_strict_action_env --experimental_multi_threaded_digest --linkopt=-Wl,-rename_section,__TEXT,__const,__RODATA,__const --linkopt=-Wl,-rename_section,__TEXT,__cstring,__RODATA,__cstring --linkopt=-Wl,-rename_section,__TEXT,__gcc_except_tab,__RODATA,__gcc_except_tab --linkopt=-Wl,-rename_section,__TEXT,__objc_classname,__RODATA,__objc_classname --linkopt=-Wl,-rename_section,__TEXT,__objc_methname,__RODATA,__objc_methname --linkopt=-Wl,-rename_section,__TEXT,__objc_methtype,__RODATA,__objc_methtype --copt=-Werror=arc-performSelector-leaks --copt=-Werror=arc-retain-cycles --copt=-Werror=block-capture-autoreleasing --copt=-Werror=enum-conversion --copt=-Werror=incompatible-property-type --copt=-Werror=incomplete-implementation --copt=-Werror=int-conversion --copt=-Werror=literal-conversion --copt=-Werror=macro-redefined --copt=-Werror=mismatched-parameter-types --copt=-Werror=non-literal-null-conversion --copt=-Werror=nonportable-include-path --copt=-Werror=objc-literal-conversion --copt=-Werror=objc-missing-super-calls --copt=-Werror=objc-property-synthesis --copt=-Werror=objc-protocol-property-synthesis --copt=-Werror=parentheses --copt=-Werror=property-attribute-mismatch --copt=-Werror=shadow-ivar --copt=-Werror=sometimes-uninitialized --copt=-Werror=switch --copt=-Werror=tautological-constant-out-of-range-compare --copt=-Werror=unicode-whitespace --copt=-Werror=unknown-pragmas --copt=-Werror=unreachable-code --copt=-Werror=unsupported-availability-guard --copt=-Werror=unused-function --copt=-Werror=unguarded-availability-new --copt=-Wno-incomplete-umbrella --copt=-Wno-missing-braces --copt=-Wno-nullability-completeness --copt=-Wno-missing-noescape --copt=-Wno-#warnings --copt=-Wno-deprecated-declarations
INFO: Reading rc options for 'build' from /Volumes/HDD/slave/workspace/warn-cache/.bazelrc:
  'build' options: --announce_rc --remote_http_cache=http://bazel-ios-cache.bilibili.co/ios --remote_local_fallback --show_progress_rate_limit=5 --verbose_failures --spawn_strategy=standalone --genrule_strategy=standalone --output_filter=^$ --ios_minimum_os=8.0 --remote_timeout=5 --remote_max_connections=0 --profile=bazel-profile --watchfs
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
INFO: Repository rule 'org_pubref_rules_protobuf' returned: {"remote": "https://github.com/pubref/rules_protobuf", "commit": "5f6195e83e06db2fd110626b0f2dc64e345e6618", "shallow_since": "2018-04-10", "init_submodules": False, "verbose": False, "strip_prefix": "", "patches": [], "patch_tool": "patch", "patch_args": ["-p0"], "patch_cmds": [], "name": "org_pubref_rules_protobuf"}
Loading: 0 packages loaded
INFO: Repository rule 'build_bazel_rules_apple' returned: {"remote": "https://github.com/Bilibili/rules_apple.git", "commit": "a18df0914f999b0156ee3ab71398981bf979a856", "shallow_since": "2018-12-10", "init_submodules": False, "verbose": False, "strip_prefix": "", "patches": [], "patch_tool": "patch", "patch_args": ["-p0"], "patch_cmds": [], "name": "build_bazel_rules_apple"}
Loading: 0 packages loaded
Loading: 0 packages loaded
INFO: Repository rule 'build_bazel_rules_swift' returned: {"remote": "https://github.com/bazelbuild/rules_swift.git", "commit": "1ef49f772449764ab2ccf2cf15db06d6b80ba05d", "shallow_since": "2018-07-31", "init_submodules": False, "verbose": False, "strip_prefix": "", "patches": [], "patch_tool": "patch", "patch_args": ["-p0"], "patch_cmds": [], "name": "build_bazel_rules_swift"}
Loading: 0 packages loaded
    currently loading: bili-universal
INFO: Repository rule 'bazel_skylib' returned: {"remote": "https://github.com/bazelbuild/bazel-skylib.git", "commit": "3fea8cb680f4a53a129f7ebace1a5a4d1e035914", "shallow_since": "2018-06-13", "init_submodules": False, "verbose": False, "strip_prefix": "", "patches": [], "patch_tool": "patch", "patch_args": ["-p0"], "patch_cmds": [], "name": "bazel_skylib"}
Analyzing: target //bili-universal:bili-universal (3 packages loaded, 0 targets configured)
INFO: Repository rule 'ijkplayer' returned: {"remote": "git@git.bilibili.co:app/ijkplayer.git", "commit": "d20c9b5f65b5d73ddf9ac2c80c4631142afb1824", "shallow_since": "2018-12-13", "init_submodules": False, "verbose": False, "strip_prefix": "", "patches": [], "patch_tool": "patch", "patch_args": ["-p0"], "patch_cmds": [], "build_file_content": "\npackage(default_visibility = [\"//visibility:public\"])\nobjc_library(\n    name = \"ijkplayerwithssl_implement\",\n    srcs = glob([\"ijkmedia/ijkplayer/**/*.c\"], exclude = [\"ijkmedia/ijkplayer/android/**\", \"ijkmedia/ijkplayer/ijkavformat/ijkioandroidio.c\"]) + \n           glob([\"ijkmedia/ijkplayer/**/*.h\"], exclude = [\"ijkmedia/ijkplayer/android/**\"]) +\n           glob([\"ijkmedia/ijksdl/**/*.c\"], exclude = [\"ijkmedia/ijksdl/android/**\", \"ijkmedia/ijksdl/ijksdl_extra_log.c\"]) + \n           glob([\"ijkmedia/ijksdl/**/*.h\"], exclude = [\"ijkmedia/ijksdl/android/**\", \"ijkmedia/ijksdl/ijksdl_extra_log.h\"]) +\n           glob([\"ios/IJKMediaPlayer/IJKMediaPlayer/**/*.m\"], exclude = [\"ios/IJKMediaPlayer/IJKMediaPlayer/ijkmedia/ijkplayer/ios/ijkplayer_ios.m\",\n           \"ios/IJKMediaPlayer/IJKMediaPlayer/ijkmedia/ijksdl/ios/ijksdl_aout_ios_audiounit.m\",\n           \"ios/IJKMediaPlayer/IJKMediaPlayer/ijkmedia/ijksdl/ios/ijksdl_vout_ios_gles2.m\",]) +\n           glob([\"ios/IJKMediaPlayer/IJKMediaPlayer/**/*.c\"]) +\n           glob([\"ios/IJKMediaPlayer/IJKMediaPlayer/**/*.h\"]) +\n           [\"ijkmedia/ijkplayer/ijkavutil/ijkstl.cpp\", \"ijkmedia/ijksdl/gles2/renderer_yuv420sp_vtb.m\"], \n    non_arc_srcs = [\"ios/IJKMediaPlayer/IJKMediaPlayer/ijkmedia/ijkplayer/ios/ijkplayer_ios.m\",\n                    \"ios/IJKMediaPlayer/IJKMediaPlayer/ijkmedia/ijksdl/ios/ijksdl_aout_ios_audiounit.m\",\n                    \"ios/IJKMediaPlayer/IJKMediaPlayer/ijkmedia/ijksdl/ios/ijksdl_vout_ios_gles2.m\"],\n    includes = [\"ijkmedia\", \"ijkmedia/ijkplayer\", \"ijkmedia/ijkplayer/ijkavformat\", \n               
 \"ijkmedia/ijksdl\", \"ijkmedia/ijksdl\", \"ijkmedia/ijksdl/ffmpeg\", \"ijkmedia/ijkplayer/pipeline\",\n                \"ios/IJKMediaPlayer/IJKMediaPlayer/ijkmedia\", \n                \"ios/IJKMediaPlayer/IJKMediaPlayer/ijkmedia/ijksdl/ios\",\n                \"ios/IJKMediaPlayer/IJKMediaPlayer\",],\n    copts= [\"-Wno-unused-label\", \"-Wno-unused-function\"],\n    hdrs = glob([\"ios/IJKMediaPlayer/include/IJKMediaFrameworkWithSSL/*.h\"]),\n    pch = \"ios/IJKMediaPlayer/IJKMediaPlayer/IJKMediaPlayer-Prefix.pch\",\n    deps = [\"@loktar//:ijkffmpeg_import\"],\n    sdk_dylibs = [\"z\", \"xml2\"],\n    sdk_frameworks = [\"OpenGLES\", \"CoreMedia\", \"AVFoundation\", \n                      \"AudioToolbox\", \"VideoToolbox\", \"MediaPlayer\", \"CoreVideo\", \n                      \"CoreGraphics\", \"QuartzCore\", \"CoreTelePhony\"],\n)\nobjc_library(\n    name = \"ijkplayerwithssl_header\",\n    hdrs = glob([\"ios/IJKMediaPlayer/include/IJKMediaFrameworkWithSSL/*.h\"]),\n    includes = [\"ios/IJKMediaPlayer/include\"],\n)", "workspace_file_content": "", "name": "ijkplayer"}
INFO: Repository rule 'playerframework' returned: {"remote": "git@git.bilibili.co:app-player/player-framework.git", "commit": "a8e22c38843d33b5aefb42d2cbc29261e580d00c", "shallow_since": "2018-10-10", "init_submodules": False, "verbose": False, "strip_prefix": "", "patches": [], "patch_tool": "patch", "patch_args": ["-p0"], "patch_cmds": [], "name": "playerframework"}
INFO: Repository rule 'bgrenderer' returned: {"remote": "git@git.bilibili.co:app-player/ios-android-gles-renderer.git", "commit": "0c9f474bd9ec0819cda89cd7db67b3fbf18b68e3", "shallow_since": "2018-11-03", "init_submodules": False, "verbose": False, "strip_prefix": "", "patches": [], "patch_tool": "patch", "patch_args": ["-p0"], "patch_cmds": [], "name": "bgrenderer"}
Analyzing: target //bili-universal:bili-universal (178 packages loaded, 30419 targets configured)
Analyzing: target //bili-universal:bili-universal (181 packages loaded, 30934 targets configured)
INFO: Analysed target //bili-universal:bili-universal (181 packages loaded, 30935 targets configured).
INFO: Found 1 target...
[0 / 22] [-----] Creating source manifest for @build_bazel_rules_apple//tools/plisttool:plisttool [for host] ... (12 actions, 0 running)
[63 / 66] no action
[63 / 72] [-----] Symlinking @build_bazel_rules_apple//tools/environment_plist:environment_plist [for host]
[522 / 1,869] [-----] Writing file srcs/app/BBLiveBase/DataCenter/datacenter_library-archive.objlist
[979 / 5,404] [-----] Writing file srcs/app/BBColumn/bbcolumn_library-archive.objlist
[4,125 / 8,794] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 3s remote-cache ... (24 actions, 22 running)
[8,403 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 40s remote-cache ... (22 actions, 21 running)
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 92s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 143s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 217s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 328s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 459s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 579s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 729s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 880s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 1061s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 1303s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 1544s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 1846s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 2198s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 2570s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 2993s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 3536s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 4139s remote-cache
[8,818 / 8,824] Compiling srcs/app/BBLive/BBLive/LiveDetail/Export/BBLiveRoomExport.m; 4803s remote-cache
Target //bili-universal:bili-universal failed to build
Internal error thrown during build. Printing stack trace: java.lang.RuntimeException: Unrecoverable error while evaluating node 'ActionLookupData{actionLookupKey=//srcs/app/BBLive:bblive_library BuildConfigurationValue.Key[27f00631b210551a2853cb52811bc81d] false, actionIndex=150}' (requested by nodes 'File:[[<execution_root>]bazel-out/ios-x86_64-min8.0-applebin_ios-ios_x86_64-fastbuild/bin]srcs/app/BBLive/_objs/bblive_library/arc/BBLiveRoomExport.o')
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:499)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.handler.timeout.ReadTimeoutException

INFO: Elapsed time: 5245.242s, Critical Path: 5000.03s
INFO: 8604 processes: 8604 remote cache hit.
FAILED: Build did NOT complete successfully
Internal error thrown during build. Printing stack trace: java.lang.RuntimeException: Unrecoverable error while evaluating node 'ActionLookupData{actionLookupKey=//srcs/app/BBLive:bblive_library BuildConfigurationValue.Key[27f00631b210551a2853cb52811bc81d] false, actionIndex=150}' (requested by nodes 'File:[[<execution_root>]bazel-out/ios-x86_64-min8.0-applebin_ios-ios_x86_64-fastbuild/bin]srcs/app/BBLive/_objs/bblive_library/arc/BBLiveRoomExport.o')
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:499)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.handler.timeout.ReadTimeoutException
java.lang.RuntimeException: Unrecoverable error while evaluating node 'ActionLookupData{actionLookupKey=//srcs/app/BBLive:bblive_library BuildConfigurationValue.Key[27f00631b210551a2853cb52811bc81d] false, actionIndex=150}' (requested by nodes 'File:[[<execution_root>]bazel-out/ios-x86_64-min8.0-applebin_ios-ios_x86_64-fastbuild/bin]srcs/app/BBLive/_objs/bblive_library/arc/BBLiveRoomExport.o')
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:499)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.handler.timeout.ReadTimeoutException
java.lang.RuntimeException: Unrecoverable error while evaluating node 'ActionLookupData{actionLookupKey=//srcs/app/BBLive:bblive_library BuildConfigurationValue.Key[27f00631b210551a2853cb52811bc81d] false, actionIndex=150}' (requested by nodes 'File:[[<execution_root>]bazel-out/ios-x86_64-min8.0-applebin_ios-ios_x86_64-fastbuild/bin]srcs/app/BBLive/_objs/bblive_library/arc/BBLiveRoomExport.o')
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:499)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.netty.handler.timeout.ReadTimeoutException
FAILED: Build did NOT complete successfully

nicolov commented 5 years ago

This is a problem for us too; it doesn't seem like --remote_timeout is respected. Some of our builds time out, I guess because enough zombie requests pile up to tie up all the download concurrency.

It's easy to hack bazel-remote to reproduce the problem, for example by delaying the response body:

w.Header().Set("Content-Type", "application/octet-stream")
w.Header().Set("Content-Length", strconv.FormatInt(sizeBytes, 10))

// HACK
time.Sleep(5 * time.Second)

io.Copy(w, data)

nicolov commented 5 years ago

Found and fixed the bug in https://github.com/bazelbuild/bazel/pull/7040

buchgr commented 5 years ago

@nicolov I think we should at least print a warning that something timed out. It's important not to hide such information from users, as this might mask problems. I have sent out https://github.com/bazelbuild/bazel/pull/7209 on top of your change.