Open werkt opened 2 months ago
@werkt This one looks similar to https://github.com/bazelbuild/bazel/issues/21712?
cc: @Wyverald @fmeum
Oh wow, that is a ridiculous stack trace. Attempting a refetch might not be the best idea here.
@iancha1992 it looks similar to #21712 but is slightly different, so let's keep this one open.
@werkt There have been a number of fixes in this area in 7.2.0rc2. Could you please try using that and see if this still pops up?
Checking in again as 7.2.0 is now out. Other related issues have been fixed, and this one might have been too. I'll lower the priority for now, and we can close this if no more reports show up for a while.
Description of the bug:
A hang occurs when successive CancellationExceptions are observed in StarlarkRepositoryFunction, seemingly resulting in a wait for a signal that will never come.
The jstack frames for an executor are featured below, after bazel presented the following output:
I believe the workerFuture is cancelled at this point, and that the lack of elements inserted into the signalQueue is the only thing preventing infinite recursion of the fetch call with unchanged state.
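To illustrate the suspected failure mode, here is a minimal standalone sketch. It is not Bazel's code: only the names workerFuture and signalQueue come from the report above, while the class, the method, and the "DONE" signal are hypothetical. It assumes a worker that posts to a queue only on normal completion, so a cancelled worker leaves the queue empty. A timed poll stands in for the untimed wait so that the example terminates.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SignalQueueHangSketch {
  /**
   * Cancels a worker and then waits for its signal.
   * Returns the signal, or null if none arrived within the timeout.
   */
  static String awaitSignalAfterCancel(long timeoutMillis) throws InterruptedException {
    BlockingQueue<String> signalQueue = new LinkedBlockingQueue<>();
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<?> workerFuture = executor.submit(() -> {
        try {
          Thread.sleep(60_000); // stand-in for a long repository fetch
          signalQueue.put("DONE"); // hypothetical signal, posted only on normal completion
        } catch (InterruptedException e) {
          // Cancelled: the worker exits without posting any signal.
          Thread.currentThread().interrupt();
        }
      });

      // Cancel the worker, whether or not it has started running.
      workerFuture.cancel(true);

      // An untimed signalQueue.take() here would block forever,
      // because nothing will ever be inserted into the queue.
      return signalQueue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    String signal = awaitSignalAfterCancel(200);
    System.out.println(signal == null ? "no signal: take() would hang" : "got " + signal);
  }
}
```

If the real code path behaves like this, any waiter that relies on the queue alone has no way to learn that the worker was cancelled.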
Which category does this issue belong to?
Core, External Dependency
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I can only guess: running with --experimental_worker_for_repo_fetching set to "auto" or "virtual" in a memory-constrained environment with repository fetches to perform.
Which operating system are you running Bazel on?
linux
What is the output of bazel info release?
7.1.1
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD?
No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
1590dbc4be2ce262ee9348e12cdb44c3b6ee0544
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?