bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Flaky bazel internal crash `IllegalStateException: Not action: CppCompileActionTemplate` when using Skymeld #22945

Open JohnnyMorganz opened 1 month ago

JohnnyMorganz commented 1 month ago

Description of the bug:

We've been getting a flaky bazel internal crash after upgrading to 7.2 from 6.4 that seems to be related to Skymeld and a TreeArtifact-based cc library (similar setup to #22886, but see below).

We see the following crash:

[22,990 / 25,056] checking cached actions
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'TargetCompletionKey{topLevelArtifactContext=com.google.devtools.build.lib.analysis.TopLevelArtifactContext@90904c3b, actionLookupKey=ConfiguredTargetKey{label=<top level general cc library target, not from generator>, config=BuildConfigurationKey[6de9c493725e885249a68bcd3cab225a7c98a12a462c2ead63bd885b18e247ba]}, willTest=false}' (requested by nodes 'BuildDriverKey of ActionLookupKey: ConfiguredTargetKey{label=<top level cc library target, not from generator>, config=BuildConfigurationKey[6de9c493725e885249a68bcd3cab225a7c98a12a462c2ead63bd885b18e247ba]}')
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:550)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:414)
    at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.lang.IllegalStateException: Not action: CppCompileActionTemplate compiling <bazel-out path of .cc from cc_library of generator>  0 RuleConfiguredTargetValue{actions=[CppCompileActionTemplate compiling <bazel-out path of .cc from cc_library of generator>, action '<path of .a from cc_library of generator>' (CppArchive[[File:[[<execution_root>]bazel-out/k8-dbg--cd/bin]<redacted>/_objs/redacted-cc-lib/redacted] -> [File:[[<execution_root>]bazel-out/k8-dbg--cd/bin]<redacted>/libredacted-cc-lib.a]])], configuredTarget=ConfiguredTarget(<cc library target from generator>, b75007340468b702430064e766d5f8f577cdff419d7ca8b572b796f7e9104d61)}
    at com.google.devtools.build.lib.actions.ActionLookupValue.getAction(ActionLookupValue.java:34)
    at com.google.devtools.build.lib.skyframe.ActionUtils.getActionForLookupData(ActionUtils.java:31)
    at com.google.devtools.build.lib.skyframe.CompletionFunction.ensureToplevelArtifacts(CompletionFunction.java:393)
    at com.google.devtools.build.lib.skyframe.CompletionFunction.compute(CompletionFunction.java:329)
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:461)
    ... 7 more

The crash is intermittent: if we repeat the exact same build immediately afterwards, it does not recur (possibly some inconsistent state or a race?). The CppCompileActionTemplate action that it complains about always belongs to one of the cc_library targets created using the TreeArtifact-based generator, never any other target. The top-level target is unrelated and varies; it is just a target with a (transitive) dependency on the generated cc_library.


Full generator setup:

def _generate_api_files_impl(ctx):
    # We need to put the C++ files in folders named like C++ files to trick Bazel into accepting these folders as
    # sources and headers when creating a C++ library.
    srcs_tree = ctx.actions.declare_directory(ctx.attr.name + ".cc")
    hdrs_tree = ctx.actions.declare_directory(ctx.attr.name + ".hh")

    java_tree = ctx.actions.declare_directory(ctx.attr.name + "-java-srcs")

    ctx.actions.run(
        executable = ctx.executable.generator,
        outputs = [srcs_tree, hdrs_tree, java_tree],
        arguments = [srcs_tree.path, hdrs_tree.path, java_tree.path],
    )

    srcjar = ctx.actions.declare_file(ctx.attr.name + ".srcjar")

    create_srcjar_rule(ctx, java_tree, srcjar, ctx.executable._build_zip)

    return [DefaultInfo(files = depset([srcs_tree, hdrs_tree, srcjar]))]

generate_api_files = rule(
    implementation = _generate_api_files_impl,
    attrs = {
        "generator": attr.label(executable = True, cfg = "exec"),
        "_build_zip": attr.label(default = Label(BUILD_ZIP_TOOL), cfg = "exec", executable = True),
    },
)

def generate_api(name, generator):
    generate_api_files(name = name, generator = generator)

    cc_library(
        name = name + "-cc-lib",
        srcs = [name],
        hdrs = [name],
    )

    java_library(
        name = name + "-java-lib",
        srcs = [
            ":" + name,
        ],
    )

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Unfortunately, we have not yet been able to reproduce this consistently. Since setting --noexperimental_merged_skyframe_analysis_execution, we have not seen the crash for a week. We are open to suggestions on how to debug this.
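
For anyone hitting the same crash, the workaround above could be pinned in the workspace `.bazelrc` so that every invocation picks it up. This is only a sketch: the flag is the one quoted above, while placing it on the `build` line of a workspace-root `.bazelrc` is an assumption about how the project is configured.

```
# .bazelrc (workspace root; location is an assumption)
# Disable Skymeld (merged analysis and execution) as a workaround for the crash.
build --noexperimental_merged_skyframe_analysis_execution
```

Commands that inherit options from `build` (such as `test`) would pick up the same setting.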

Which operating system are you running Bazel on?

Rocky Linux 9.3

What is the output of bazel info release?

release 7.2.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

comius commented 1 month ago

cc @joeleba