bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Skymeld with `--notrack_incremental_state` throws `ArtifactPrefixConflictException` for TreeArtifact based cc_library #22886

Open JohnnyMorganz opened 1 week ago

JohnnyMorganz commented 1 week ago

Description of the bug:

We have a source code generator that generates cpp & h files into a TreeArtifact. The tree artifact is then passed into a cc_library target.

We recently upgraded from Bazel 6.4.0 to Bazel 7.2 and now get the following errors when --notrack_incremental_state is enabled (which we set on CI, since our Bazel servers are not kept across jobs):

ERROR: One of the output paths 'bazel-out/darwin_arm64-fastbuild/bin/module/_dotd/sample-library/generated-cc-files' (belonging to //module:sample-library) and 'bazel-out/darwin_arm64-fastbuild/bin/module/_dotd/sample-library/generated-cc-files/Foo.o.d' (belonging to //module:sample-library) is a prefix of the other. These actions cannot be simultaneously present; please rename one of the output files or build just one of them
ERROR: One of the output paths 'bazel-out/darwin_arm64-fastbuild/bin/module/_dotd/sample-library/generated-cc-files' (belonging to //module:sample-library) and 'bazel-out/darwin_arm64-fastbuild/bin/module/_dotd/sample-library/generated-cc-files/Bar.o.d' (belonging to //module:sample-library) is a prefix of the other. These actions cannot be simultaneously present; please rename one of the output files or build just one of them
Use --verbose_failures to see the command lines of failed build steps.
ERROR: com.google.devtools.build.lib.actions.ArtifactPrefixConflictException: One of the output paths 'bazel-out/darwin_arm64-fastbuild/bin/module/_dotd/sample-library/generated-cc-files' (belonging to //module:sample-library) and 'bazel-out/darwin_arm64-fastbuild/bin/module/_dotd/sample-library/generated-cc-files/Foo.o.d' (belonging to //module:sample-library) is a prefix of the other. These actions cannot be simultaneously present; please rename one of the output files or build just one of them

Note that this only happens when building both the cc_library target and an "independent" java_library target at the same time, defined in the same BUILD.bazel file. When building the cc_library target by itself, it does not error. When disabling skymeld with --noexperimental_merged_skyframe_analysis_execution it also does not error. When replacing the java_library with another random target (sh_library / py_library) it does not error.

The java_library does not reference the cc_library target at all, e.g.:

java_library(
    name = "java-library",
    srcs = [],
)
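
For readers unfamiliar with the error above, the check it describes can be pictured as follows. This is a minimal Python sketch of a path-prefix conflict check, not Bazel's actual implementation; the function name `find_prefix_conflicts` is made up for illustration. It compares paths component by component, so `a/b` conflicts with `a/b/c` but not with `a/bc`.

```python
from pathlib import PurePosixPath


def find_prefix_conflicts(output_paths):
    """Return (prefix, longer) pairs where one declared output path is a
    strict path-component prefix of another declared output path.

    Illustrative only: hypothetical helper, not part of Bazel.
    """
    paths = [PurePosixPath(p) for p in sorted(set(output_paths))]
    declared = set(paths)
    conflicts = []
    for path in paths:
        # path.parents yields every ancestor directory of `path`.
        for parent in path.parents:
            if parent in declared:
                conflicts.append((str(parent), str(path)))
    return conflicts


# Output paths copied from the error messages in this report.
tree = "bazel-out/darwin_arm64-fastbuild/bin/module/_dotd/sample-library/generated-cc-files"
outputs = [tree, tree + "/Foo.o.d", tree + "/Bar.o.d"]
conflicts = find_prefix_conflicts(outputs)
```

With the three paths from the log, `conflicts` holds two pairs, one for `Bar.o.d` and one for `Foo.o.d`, which matches the two distinct conflicts reported above.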

Which category does this issue belong to?

C++ Rules

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Set up a sample Bazel workspace with the following files, which mimic a code generator creating .cc and .h files in tree artifacts:

# module/defs.bzl
def _generate_cc_files_impl(ctx):
    srcs_tree = ctx.actions.declare_directory(ctx.attr.name + ".cc")
    hdrs_tree = ctx.actions.declare_directory(ctx.attr.name + ".hh")

    ctx.actions.run_shell(
        command="""
        echo "struct Foo;" > {SRCS_TREE}/Foo.cc
        echo "struct Bar;" > {SRCS_TREE}/Bar.cc
        echo "struct Foo;" > {HDRS_TREE}/Foo.h
        echo "struct Bar;" > {HDRS_TREE}/Bar.h
        """.format(
            SRCS_TREE=srcs_tree.path, HDRS_TREE=hdrs_tree.path
        ),
        outputs=[srcs_tree, hdrs_tree],
        arguments=[srcs_tree.path, hdrs_tree.path],
    )

    return [DefaultInfo(files=depset([srcs_tree, hdrs_tree]))]

generate_cc_files = rule(
    implementation=_generate_cc_files_impl,
    attrs={
        "srcs": attr.label_list(allow_files=True),
        "hdrs": attr.label_list(allow_files=True),
    },
)

# module/BUILD.bazel
load(":defs.bzl", "generate_cc_files")

generate_cc_files(
    name = "generated-cc-files",
)

cc_library(
    name = "sample-library",
    srcs = [":generated-cc-files"],
    hdrs = [":generated-cc-files"],
)

java_library(
    name = "java-library",
    srcs = [],
)

Try the following commands:

$ bazel clean --expunge && bazel build //module/... # default settings, works
$ bazel clean --expunge && bazel build //module/... --notrack_incremental_state # fails with ArtifactPrefixConflictException
$ bazel clean --expunge && bazel build //module/... --notrack_incremental_state --noexperimental_merged_skyframe_analysis_execution # works

Note in particular that you must build the java_library target and the cc_library target at the same time. If you comment out the java_library target, or run the command below, the build passes:

$ bazel clean --expunge && bazel build //module:sample-library --notrack_incremental_state # works
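
To run the three `//module/...` variants in one go, something like the following Python driver could be used. It is a sketch under assumptions: `bazel` is on `PATH`, the script is run from the workspace root, and the names `VARIANTS`, `build_command` and `run_matrix` are made up for illustration. The flags are exactly the ones from the commands above.

```python
import subprocess

TARGETS = "//module/..."

# Flag combinations taken from the commands above; the keys are display labels only.
VARIANTS = {
    "default": [],
    "notrack": ["--notrack_incremental_state"],
    "notrack, skymeld off": [
        "--notrack_incremental_state",
        "--noexperimental_merged_skyframe_analysis_execution",
    ],
}


def build_command(flags, targets=TARGETS):
    """Return the argv list for one `bazel build` invocation."""
    return ["bazel", "build", targets, *flags]


def run_matrix(run=subprocess.run):
    """Clean, then build each variant; map each label to True if the build passed.

    `run` is injectable so the loop can be exercised without Bazel installed.
    """
    results = {}
    for label, flags in VARIANTS.items():
        run(["bazel", "clean", "--expunge"], check=True)
        proc = run(build_command(flags), check=False)
        results[label] = proc.returncode == 0
    return results
```

Calling `run_matrix()` on an affected Bazel version should, going by the results reported above, show only the "notrack" variant failing.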

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

release 7.2.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

fmeum commented 1 week ago

@joeleba

joeleba commented 1 week ago

Should be fixed with https://github.com/bazelbuild/bazel/commit/12aa54e527eafb8db74fdf9a5eae69db97b21fa5

fmeum commented 1 week ago

@bazel-io flag

iancha1992 commented 1 week ago

@bazel-io fork 7.3.0

JohnnyMorganz commented 1 week ago

Can confirm the issue is fixed with that commit, thank you!

Sorry to hijack an existing issue, but maybe you folks know what's up here. Since upgrading from 6.4 to 7.2 we've been getting a flaky Bazel internal crash that seems to be related to Skymeld and the same TreeArtifact-based cc_library as in the OP (with a very slightly different setup, see below). The crash is unrelated to --notrack_incremental_state and conflict checking, but we haven't been able to get a consistent minimal repro, so I haven't opened a new issue about it yet. Let me know if I should.

We see the following crash:

[22,990 / 25,056] checking cached actions
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'TargetCompletionKey{topLevelArtifactContext=com.google.devtools.build.lib.analysis.TopLevelArtifactContext@90904c3b, actionLookupKey=ConfiguredTargetKey{label=<top level general cc library target, not from generator>, config=BuildConfigurationKey[6de9c493725e885249a68bcd3cab225a7c98a12a462c2ead63bd885b18e247ba]}, willTest=false}' (requested by nodes 'BuildDriverKey of ActionLookupKey: ConfiguredTargetKey{label=<top level cc library target, not from generator>, config=BuildConfigurationKey[6de9c493725e885249a68bcd3cab225a7c98a12a462c2ead63bd885b18e247ba]}')
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:550)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:414)
    at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: java.lang.IllegalStateException: Not action: CppCompileActionTemplate compiling <bazel-out path of .cc from cc_library of generator>  0 RuleConfiguredTargetValue{actions=[CppCompileActionTemplate compiling <bazel-out path of .cc from cc_library of generator>, action '<path of .a from cc_library of generator>' (CppArchive[[File:[[<execution_root>]bazel-out/k8-dbg--cd/bin]<redacted>/_objs/redacted-cc-lib/redacted] -> [File:[[<execution_root>]bazel-out/k8-dbg--cd/bin]<redacted>/libredacted-cc-lib.a]])], configuredTarget=ConfiguredTarget(<cc library target from generator>, b75007340468b702430064e766d5f8f577cdff419d7ca8b572b796f7e9104d61)}
    at com.google.devtools.build.lib.actions.ActionLookupValue.getAction(ActionLookupValue.java:34)
    at com.google.devtools.build.lib.skyframe.ActionUtils.getActionForLookupData(ActionUtils.java:31)
    at com.google.devtools.build.lib.skyframe.CompletionFunction.ensureToplevelArtifacts(CompletionFunction.java:393)
    at com.google.devtools.build.lib.skyframe.CompletionFunction.compute(CompletionFunction.java:329)
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:461)
    ... 7 more

The crash is inconsistent. If we repeat the exact same build straight afterwards, it doesn't occur again (some sort of inconsistent state or race?). The CppCompileActionTemplate action that it complains about is always one of the cc_library targets created using the TreeArtifact-based generator, never any other target. The top-level target is unrelated and can change; it is just a target with a (transitive) dependency on the generated cc_library. We had disabled skymeld because of the issue in the OP, and since then this crash no longer seems to occur in our logs (we're going to give it more time to confirm).

Do you have any tips for debugging this or getting more information about it?


Full generator setup:

def _generate_api_files_impl(ctx):
    # We need to put the C++ files in a folder named like a C++ file to trick Bazel into accepting these folders as
    # sources and headers when creating a C++ library.
    srcs_tree = ctx.actions.declare_directory(ctx.attr.name + ".cc")
    hdrs_tree = ctx.actions.declare_directory(ctx.attr.name + ".hh")

    java_tree = ctx.actions.declare_directory(ctx.attr.name + "-java-srcs")

    ctx.actions.run(
        executable = ctx.executable.generator,
        outputs = [srcs_tree, hdrs_tree, java_tree],
        arguments = [srcs_tree.path, hdrs_tree.path, java_tree.path],
    )

    srcjar = ctx.actions.declare_file(ctx.attr.name + ".srcjar")

    create_srcjar_rule(ctx, java_tree, srcjar, ctx.executable._build_zip)

    return [DefaultInfo(files = depset([srcs_tree, hdrs_tree, srcjar]))]

generate_api_files = rule(
    implementation = _generate_api_files_impl,
    attrs = {
        "generator": attr.label(executable = True, cfg = "exec"),
        "_build_zip": attr.label(default = Label(BUILD_ZIP_TOOL), cfg = "exec", executable = True),
    },
)

def generate_api(name, generator):
    generate_api_files(name = name, generator = generator)

    cc_library(
        name = name + "-cc-lib",
        srcs = [name],
        hdrs = [name],
    )

    java_library(
        name = name + "-java-lib",
        srcs = [
            ":" + name,
        ],
    )

joeleba commented 3 days ago

Could you please open a separate issue for that? Thanks!

JohnnyMorganz commented 3 days ago

Opened #22945. Sorry we couldn't be more helpful with a repro; we don't have a consistent reproduction yet. We did disable skymeld and have seen zero instances of the crash in our logs over the past week.