bazelbuild / rules_go

Go rules for Bazel
Apache License 2.0

Compiling for goos, arch and pure stuck on Google Cloud Build #1819

Open jacobfederer opened 5 years ago

jacobfederer commented 5 years ago

When building a go_binary target with dedicated goos, goarch and pure settings on Google Cloud Build, the build is stuck forever. When running the exact same code on my local Darwin machine, the build runs smoothly. The remote_http_cache might be one reason (?). Here is my Cloud Build configuration:

timeout: "1200s"
options:
  machineType: N1_HIGHCPU_8
steps:
  - name: gcr.io/cloud-builders/bazel
    args: ['test', '--define', 'project=$PROJECT_ID','--define', 'sha=$COMMIT_SHA', '--remote_http_cache=https://storage.googleapis.com/buildstore','--google_default_credentials', '//backend','//backend/...']
    id: 'go-build'
  - name: gcr.io/cloud-builders/bazel
    args: ['run','--direct_run','--define',  'project=$PROJECT_ID','--define', 'sha=$COMMIT_SHA', '--remote_http_cache=https://storage.googleapis.com/buildstore', '--google_default_credentials', '@nodejs//:yarn']
    waitFor: ['go-build']
    id: 'angular-install'

...

Here are my logs:

Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 412s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 353s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 298s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 223s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 178s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 137s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 114s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 78s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 51s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 30s linux-sandbox
Step #0 - "go-build": [322 / 367] 4 / 4 tests; GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 20s linux-sandbox
Step #0 - "go-build": [308 / 363] GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 11s linux-sandbox ... (6 actions, 5 running)
Step #0 - "go-build": [290 / 363] GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 4s linux-sandbox ... (5 actions, 4 running)
Step #0 - "go-build": [206 / 284] GoCompile external/com_github_google_go_cmp/cmp/internal/value/linux_amd64_stripped/go_default_library%/github.com/google/go-cmp/cmp/internal/value.a; 0s remote-cache ... (8 actions, 7 running)
Step #0 - "go-build": [151 / 198] Compiling external/com_google_protobuf/src/google/protobuf/compiler/java/java_map_field.cc [for host]; 0s remote-cache ... (8 actions, 7 running)
Step #0 - "go-build": [93 / 198] Compiling external/com_google_protobuf/src/google/protobuf/compiler/java/java_enum_lite.cc [for host]; 0s remote-cache ... (8 actions, 7 running)
Step #0 - "go-build": [41 / 198] Compiling external/com_google_protobuf/src/google/protobuf/generated_message_table_driven.cc [for host]; 0s remote-cache ... (8 actions, 7 running)
Step #0 - "go-build": [0 / 14] [-----] Creating source manifest for //backend:backend
Step #0 - "go-build": INFO: Found 15 targets and 4 test targets...
Step #0 - "go-build": INFO: Analysed 19 targets (81 packages loaded, 7139 targets configured).
Step #0 - "go-build": Analyzing: 19 targets (78 packages loaded, 7102 targets configured)
Step #0 - "go-build": Analyzing: 19 targets (60 packages loaded, 6962 targets configured)
Step #0 - "go-build": INFO: SHA256 (https://codeload.github.com/google/protobuf/zip/48cb18e5c419ddd23d9badcfe4e9df7bde1979b2) = b6b42f90c60b54732f764ae875623a9b05e6eede064173c36c6fea12dd376cdd
Step #0 - "go-build": INFO: Repository rule 'com_github_golang_protobuf' returned: {"remote": "https://github.com/golang/protobuf", "commit": "aa810b61a9c79d51363740d207bb46cf8e620ed5", "shallow_since": "2018-08-14", "init_submodules": False, "verbose": False, "strip_prefix": "", "patches": [Label("@io_bazel_rules_go//third_party:com_github_golang_protobuf-gazelle.patch"), Label("@io_bazel_rules_go//third_party:com_github_golang_protobuf-extras.patch")], "patch_tool": "patch", "patch_args": ["-p1"], "patch_cmds": [], "name": "com_github_golang_protobuf"}
Step #0 - "go-build": Analyzing: 19 targets (46 packages loaded, 6330 targets configured)
Step #0 - "go-build": INFO: SHA256 (https://codeload.github.com/golang/tools/zip/7b71b077e1f4a3d5f15ca417a16c3b4dbb629b8b) = fe9489eabcb598e13137d0641525ff3813d8af151e1418e6940e611850d90136
Step #0 - "go-build": Analyzing: 19 targets (20 packages loaded, 54 targets configured)
Step #0 - "go-build": Analyzing: 19 targets (20 packages loaded, 54 targets configured)
Step #0 - "go-build": Analyzing: 19 targets (16 packages loaded, 49 targets configured)
Step #0 - "go-build": Analyzing: 19 targets (9 packages loaded, 0 targets configured)
Step #0 - "go-build": Analyzing: 19 targets (8 packages loaded)
Step #0 - "go-build": Analyzing: 19 targets (7 packages loaded)
Step #0 - "go-build": Loading: 0 packages loaded
Step #0 - "go-build": INFO: SHA256 (https://github.com/bazelbuild/rules_nodejs/archive/0.15.3.tar.gz) = 1778c9ef54091907bb5ac91f9d91ff44a44c4452f79205e25a7c67b56428df73
Step #0 - "go-build": INFO: SHA256 (https://github.com/bazelbuild/rules_typescript/archive/0.20.3.zip) = 2a03b23c30c5109ab0863cfa60acce73ceb56337d41efc2dd67f8455a1c1d5f3
Step #0 - "go-build": Loading: 0 packages loaded
Step #0 - "go-build": Loading: 0 packages loaded
Step #0 - "go-build": Loading: 0 packages loaded
Step #0 - "go-build": Loading: 0 packages loaded
Step #0 - "go-build": Loading:
Step #0 - "go-build": Starting local Bazel server and connecting to it...
Step #0 - "go-build": Extracting Bazel installation...
Step #0 - "go-build": Already have image (with digest): gcr.io/cloud-builders/bazel
Starting Step #0 - "go-build"

jacobfederer commented 5 years ago

The pure setting seems to be the real deal breaker here.

jayconrod commented 5 years ago

I haven't actually tried out GCB yet. As I understand, Bazel is running on a VM somewhere, the actions are being executed on the same VM, and artifacts are being cached on GCS via HTTP.

The action that's stalling out is GoStdlib. In the default configuration, GoStdlib just provides the pre-compiled standard library in the Go SDK. In any non-default configuration (e.g., cross compilation, `pure = "on"`, etc.), it will recompile the standard library. When this happens, we treat all standard library source files as individual inputs and all compiled archives as individual outputs (expressed as a tree artifact). So Bazel will end up doing separate GET / PUT requests in parallel for each of those files. I suspect Bazel is getting swamped with all those connections and is going off into the weeds. We've had a few similar bug reports, but I haven't been able to get a minimal, reproducible test case I could provide to the Bazel team.
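
To make the per-file cost concrete, here is a minimal Python sketch, not Bazel's actual code, that walks an output tree and lists one content-addressed cache path per file. It assumes the HTTP cache stores each file as a separate blob under `/cas/<sha256>`; the function name `cas_requests` and the sample package names are hypothetical.

```python
import hashlib
import os
import tempfile


def cas_requests(tree_root):
    """Return one (relative_path, '/cas/<sha256>') pair per file under tree_root.

    Each pair stands for a separate GET or PUT against the HTTP cache.
    """
    requests = []
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            requests.append((os.path.relpath(path, tree_root), "/cas/" + digest))
    return sorted(requests)


if __name__ == "__main__":
    # Hypothetical stand-in for the stdlib output tree; the real tree
    # contains far more archives than these three.
    with tempfile.TemporaryDirectory() as root:
        for pkg in ("fmt", "net/http", "os"):
            target = os.path.join(root, pkg + ".a")
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as f:
                f.write(pkg.encode())
        for rel, url_path in cas_requests(root):
            print(rel, "->", url_path)
```

The number of requests grows linearly with the number of files in the tree, which is why a recompiled standard library is so much more expensive to cache than a single output file.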

As a workaround, you might try tweaking --remote_max_connections and related options. Let's see if that helps.

As a workaround in rules_go, we should probably pack up the source files and the compiled .a files in .zip files. It would be easier for Bazel to ship those around in remote configurations. I'm worried there would be some overhead though, since we'd need to extract standard library archives inside each compile and link operation.
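
The packing idea could look roughly like the following Python sketch. It is only an illustration of the trade-off, not how rules_go is implemented; `pack_tree` and `unpack_tree` are hypothetical names, and the choice of uncompressed zip storage is an assumption.

```python
import os
import tempfile
import zipfile


def pack_tree(tree_root, archive_path):
    """Pack every file under tree_root into one zip; return the file count.

    archive_path must live outside tree_root.
    """
    count = 0
    # ZIP_STORED skips compression, so packing is mostly plain file I/O.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
        for dirpath, _dirnames, filenames in os.walk(tree_root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                zf.write(path, os.path.relpath(path, tree_root))
                count += 1
    return count


def unpack_tree(archive_path, dest_root):
    """Extract the archive again; this is the per-action overhead."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        tree = os.path.join(work, "pkg")
        os.makedirs(os.path.join(tree, "net"))
        for rel in ("fmt.a", os.path.join("net", "http.a")):
            with open(os.path.join(tree, rel), "wb") as f:
                f.write(rel.encode())
        archive = os.path.join(work, "stdlib.zip")
        print("packed files:", pack_tree(tree, archive))
        unpack_tree(archive, os.path.join(work, "unpacked"))
```

With this layout the cache would transfer one blob instead of one per file, at the cost of an extraction step wherever the archives are consumed. A real implementation would also need to normalize file timestamps inside the zip so that the archive's digest stays stable across rebuilds.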

prestonvanloon commented 5 years ago

Related: #1531

jacobfederer commented 5 years ago

@jayconrod Thanks for the detailed answer.

I tried to use --remote_max_connections, with no success.

With the upgrade to 16.3, my configuration gets stuck even in normal mode. The massive amount of compilation needed for protobuf seems to be the reason. For now, the only working solution for me is to turn off remote caching.

prestonvanloon commented 5 years ago

This is still a problem, unfortunately. We ran with 500 max connections and stdlib didn’t build for over 5 hours! The job was eventually canceled.

mariusgrigoriu commented 5 years ago

Not using Google Cloud Build, but we are remote caching to Google Cloud Storage buckets. stdlib took about 1200s to compile from our build machine. We get a similar experience when running a build from our laptops.

rohansingh commented 5 years ago

@mariusgrigoriu Thanks for the data point. I'm also trying to set up a cache with GCS, and was giving up after a few minutes. Good to know that it's successful eventually.

prestonvanloon commented 5 years ago

@rohansingh I think @mariusgrigoriu is saying that it takes 1200s to compile from the build machine when using GCS / remote caching. This wouldn't be a very successful data point.

rohansingh commented 5 years ago

@prestonvanloon Haha, definitely agree that I wouldn't count it as "success". This is still a huge blocker to GCB, and to GCS as well.

But at least now I know that a local build with GCS will finish eventually.

cceckman commented 5 years ago

Circling back with a ping, as I'm also encountering this - using the remote cache is strictly worse (when building a pure binary) than just building locally. With remote cache, on Google Cloud Build, with --remote_max_connections 1000:

Step #1: [10 / 19] GoStdlib external/io_bazel_rules_go/linux_amd64_pure_stripped/stdlib%/pkg; 531s linux-sandbox
Finished Step #1
TIMEOUT

vs without caching on GCB:

Step #1: INFO: Elapsed time: 90.453s, Critical Path: 40.28s

v3n commented 3 years ago

While we're not on GCB, I noticed a nice speedup for our remote cache by disabling GoStdLib caching by adding the following in my .bazelrc:

build --modify_execution_info='GoStdlib.*=+no-remote-cache'