bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0
22.97k stars 4.03k forks source link

deferred download tests hang when the repo is defined in MODULE.bazel #23234

Closed meteorcloudy closed 15 hours ago

meteorcloudy commented 1 month ago

Description of the bug:

Discovered while migrating tests from WORKSPACE to Bzlmod: https://github.com/bazelbuild/bazel/pull/23087

After changing

  cat > WORKSPACE <<'EOF'
load("defer.bzl", "defer")
defer(name="defer")
EOF

to

  cat > MODULE>bazel <<'EOF'
defer = use_repo_rule("defer.bzl", "defer")
defer(name="defer")
EOF

All test cases using deferred.wait() seems to hang indefinitely.

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Apply the above change in test_deferred_download_smoke then

bazel test //src/test/shell/bazel:external_integration_test --test_filter test_deferred_download_smoke

Which operating system are you running Bazel on?

Linux, macOS

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

Related commit is https://github.com/bazelbuild/bazel/commit/73c1a1e382cdd99766e4e11bb917e9672c11b5f3

Any other information, logs, or outputs that you want to share?

No response

meteorcloudy commented 1 month ago

/cc @Wyverald @lberki

Wyverald commented 2 days ago

This one is almost funny... This line is invalid MODULE.bazel: defer = use_repo_rule("defer.bzl", "defer") (the label needs to be "//:defer.bzl"). Bazel immediately crashes (which is I think the same as https://github.com/bazelbuild/bazel/issues/23138), but the test setup is such that it waits for an "ack" from Bazel before continuing, and obviously that "ack" never comes. So the test hangs forever.

meteorcloudy commented 1 day ago

Thanks for figuring it out!!

meteorcloudy commented 1 day ago

It is possible to somehow fix the test setup to let it fail with a meaningful error?

Wyverald commented 1 day ago

there's apparently a "timeout" command on GNU/Linux (https://stackoverflow.com/questions/7270622/reading-with-cat-stop-when-not-receiving-data) but it's not available on macOS.

other solutions seem rather messy (one requires a & sleep 5 ; kill $! tagged on to every cat or echo to a pipe). not sure what the best way here is, maybe @lberki has an idea.

lberki commented 1 day ago

By "test setup" do you mean the cat "${server_dir}/gate_socket" statements that wait until Bazel does something by synchronizing on a named pipe?

If so, I don't have any wisdom to offer other than that it's very reasonable to encapsulate the "sleep then kill" functionality in a function, which would result in less pollution. Something like this (wrote in half a minute, don't expect this to be perfect):

function timeout() {
  local T=$1;
  shift;
  (bash -c "$@") &
  WORK=$!; 
  sleep $T; kill $WORK) &
  WATCHDOG=$!;
  wait $WORK;
  kill -9 $WATCHDOG
}

timeout 5 "cat $FIFO"