bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

[8.0.0rc1] Builds hang on Windows with --experimental_collect_worker_data_in_profiler. #23952

Open criemen opened 1 week ago

criemen commented 1 week ago

Description of the bug:

After upgrading to Bazel 8 (from a pre-release of Bazel 7.4.0), we're observing Bazel hangs when building our codebase on Windows. The hangs happen both on CI and locally, but they don't seem to be reliably reproducible.

I've attached a Bazel profile, a compact execution log, and jstack traces of the two Java processes that I believe are relevant to the build. Let me know if I can help with more debug information.

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I've not been able to reproduce this on our public codebase, and will investigate further reductions only if the current debug information isn't sufficient.

Which operating system are you running Bazel on?

Windows 11

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

hang-debugging.zip

fmeum commented 1 week ago

To me, the stack traces and the profile look indistinguishable from a build that is simply waiting for long-running actions to finish (though of course those actions are taking very long).

If possible, could you try to bisect this down to a particular rolling release or even commit? Bazelisk accepts individual Bazel commits.

fmeum commented 1 week ago

@bazel-io flag

criemen commented 1 week ago

Okay, we're getting somewhere: disabling --experimental_collect_worker_data_in_profiler stops the hangs from occurring (so this might not be a release blocker after all). We also had this flag enabled on 7.3/7.4; could it be that the option is silently ignored on those branches?

I got hangs as far back as (at least) 8.0.0-pre20240516.1; during my manual bisecting, I then switched to an older version that didn't have the flag.

Enabling that flag by default was reverted due to flakiness in the multiplex_worker tests, in https://github.com/bazelbuild/bazel/commit/a9525c701125664bb9daf5637084e85dff186d31

Unfortunately, there are no PRs or external history associated with this flag.

fmeum commented 1 week ago

It didn't do anything on Windows before the revert: https://github.com/bazelbuild/bazel/commit/a9525c701125664bb9daf5637084e85dff186d31#diff-b572d41bff84fa61b397e97467a898b32baf118421a0b06859e3fa04c556a7ebL219

I don't know how it works, but maybe this if statement should be brought back?

iancha1992 commented 1 week ago

@bazel-io fork 8.0.0