Open JohnnyMorganz opened 1 week ago
@joeleba
Should be fixed with https://github.com/bazelbuild/bazel/commit/12aa54e527eafb8db74fdf9a5eae69db97b21fa5
@bazel-io flag
@bazel-io fork 7.3.0
Can confirm the issue is fixed with that commit, thank you!
Sorry to hijack an existing issue, but maybe you folks know what's up here. We've been getting a flaky Bazel internal crash after upgrading from 6.4 to 7.2 that seems to be related to Skymeld and the same TreeArtifact-based cc library as in OP (with a very slightly different setup, see below). The crash is unrelated to --notrack_incremental_state and conflict checking, but we haven't been able to get a consistent minimal repro, so I haven't opened a new issue about it yet. Let me know if I should.
We see the following crash:
[22,990 / 25,056] checking cached actions
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'TargetCompletionKey{topLevelArtifactContext=com.google.devtools.build.lib.analysis.TopLevelArtifactContext@90904c3b, actionLookupKey=ConfiguredTargetKey{label=<top level general cc library target, not from generator>, config=BuildConfigurationKey[6de9c493725e885249a68bcd3cab225a7c98a12a462c2ead63bd885b18e247ba]}, willTest=false}' (requested by nodes 'BuildDriverKey of ActionLookupKey: ConfiguredTargetKey{label=<top level cc library target, not from generator>, config=BuildConfigurationKey[6de9c493725e885249a68bcd3cab225a7c98a12a462c2ead63bd885b18e247ba]}')
at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:550)
at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:414)
at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.lang.IllegalStateException: Not action: CppCompileActionTemplate compiling <bazel-out path of .cc from cc_library of generator> 0 RuleConfiguredTargetValue{actions=[CppCompileActionTemplate compiling <bazel-out path of .cc from cc_library of generator>, action '<path of .a from cc_library of generator>' (CppArchive[[File:[[<execution_root>]bazel-out/k8-dbg--cd/bin]<redacted>/_objs/redacted-cc-lib/redacted] -> [File:[[<execution_root>]bazel-out/k8-dbg--cd/bin]<redacted>/libredacted-cc-lib.a]])], configuredTarget=ConfiguredTarget(<cc library target from generator>, b75007340468b702430064e766d5f8f577cdff419d7ca8b572b796f7e9104d61)}
at com.google.devtools.build.lib.actions.ActionLookupValue.getAction(ActionLookupValue.java:34)
at com.google.devtools.build.lib.skyframe.ActionUtils.getActionForLookupData(ActionUtils.java:31)
at com.google.devtools.build.lib.skyframe.CompletionFunction.ensureToplevelArtifacts(CompletionFunction.java:393)
at com.google.devtools.build.lib.skyframe.CompletionFunction.compute(CompletionFunction.java:329)
at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:461)
... 7 more
The crash is inconsistent. If we repeat the exact same build straight afterwards, it doesn't occur again (some sort of inconsistent state / race?). The CppCompileActionTemplate action that it is complaining about is always one of the cc_library targets created using the TreeArtifact-based generator, never any other target. The top-level target is unrelated and can change; it is just a target with a (transitive) dependency on the generated cc_library. We have since disabled Skymeld because of the issue in OP, and this crash no longer seems to occur in our logs (we are going to give it more time to confirm).
Do you have any tips to help aid in debugging or getting more information about this?
Full generator setup:
def _generate_api_files_impl(ctx):
    # We need to put the C++ files in folders named like C++ files to trick Bazel into accepting these folders as
    # sources and headers when creating a C++ library.
    srcs_tree = ctx.actions.declare_directory(ctx.attr.name + ".cc")
    hdrs_tree = ctx.actions.declare_directory(ctx.attr.name + ".hh")
    java_tree = ctx.actions.declare_directory(ctx.attr.name + "-java-srcs")
    ctx.actions.run(
        executable = ctx.executable.generator,
        outputs = [srcs_tree, hdrs_tree, java_tree],
        arguments = [srcs_tree.path, hdrs_tree.path, java_tree.path],
    )
    srcjar = ctx.actions.declare_file(ctx.attr.name + ".srcjar")
    create_srcjar_rule(ctx, java_tree, srcjar, ctx.executable._build_zip)
    return [DefaultInfo(files = depset([srcs_tree, hdrs_tree, srcjar]))]

generate_api_files = rule(
    implementation = _generate_api_files_impl,
    attrs = {
        "generator": attr.label(executable = True, cfg = "exec"),
        "_build_zip": attr.label(default = Label(BUILD_ZIP_TOOL), cfg = "exec", executable = True),
    },
)

def generate_api(name, generator):
    generate_api_files(name = name, generator = generator)
    cc_library(
        name = name + "-cc-lib",
        srcs = [name],
        hdrs = [name],
    )
    java_library(
        name = name + "-java-lib",
        srcs = [
            ":" + name,
        ],
    )
Could you please open a separate issue for that? Thanks!
Opened #22945. Sorry we couldn't be more helpful with a repro; there is no consistent reproduction yet. We did disable Skymeld and have seen zero instances of the crash in our logs over the past week.
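For anyone else trying to check whether this flaky crash shows up in their own CI logs, here is a minimal sketch of a log scanner. It only matches the two signature strings from the stack trace quoted above. The function names are made up for illustration, and it assumes the artifact path in a real (unredacted) log contains no spaces.

```python
import re
from collections import Counter

# Signature strings copied from the stack trace quoted in this thread.
FATAL_MARKER = "FATAL: bazel crashed due to an internal error"
# Assumption: the path after "compiling" contains no whitespace in real logs.
NOT_ACTION_RE = re.compile(
    r"IllegalStateException: Not action: CppCompileActionTemplate compiling (\S+)"
)


def find_template_crashes(log_text):
    """Return the artifact paths named in 'Not action' crashes within one log."""
    if FATAL_MARKER not in log_text:
        return []
    return NOT_ACTION_RE.findall(log_text)


def summarize(logs):
    """Count crashes per artifact path across an iterable of log strings."""
    counts = Counter()
    for text in logs:
        counts.update(find_template_crashes(text))
    return counts
```

Feeding each job's captured Bazel output to summarize gives a per-artifact crash count, which may help confirm whether the crash really is limited to the generated cc_library targets.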
Description of the bug:
We have a source code generator that generates cpp & h files into a TreeArtifact. The tree artifact is then passed into a cc_library target.
We recently upgraded from Bazel 6.4.0 to Bazel 7.2 and experience the following errors when --notrack_incremental_state is enabled (which we set on CI since our bazel servers are not kept across jobs)
Note that this only happens when building both the cc_library target and an "independent" java_library target at the same time, defined in the same BUILD.bazel file. When building the cc_library target by itself, it does not error. When disabling skymeld with --noexperimental_merged_skyframe_analysis_execution it also does not error. When replacing the java_library with another random target (sh_library / py_library) it does not error.
The java_library does not reference the cc_target at all, e.g.:
Which category does this issue belong to?
C++ Rules
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Set up a sample Bazel workspace with the following, to mimic a code generator creating .cc and .h files in tree artifacts
Try the following commands:
Note in particular that you must build the java_library target as well as the cc_library target at the same time. If you comment out the java_library target, or run the below command, it passes
Which operating system are you running Bazel on?
MacOS
What is the output of bazel info release?
release 7.2.0
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD?
No response
If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response