bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Coverage commands in RBE crash Bazel #20578

Open UebelAndre opened 11 months ago

UebelAndre commented 11 months ago

Description of the bug:

I'm trying to add regression testing for generating coverage reports for Rust in RBE environments in https://github.com/bazelbuild/rules_rust/pull/2005, and I'm running into the following crash:

https://buildkite.com/bazel/rules-rust-rustlang/builds/10127#018c793a-1630-4729-a194-396fb371e6bf

(04:45:40) ERROR: <builtin>: Coverage report generation failed: (Exit 34): INVALID_ARGUMENT: Invalid arguments:
  "command.ValidateSpec": Invalid spec - docker container must be specified
java.io.IOException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Invalid arguments:
  "command.ValidateSpec": Invalid spec - docker container must be specified
    at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:241)
    at com.google.devtools.build.lib.remote.RemoteExecutionService.executeRemotely(RemoteExecutionService.java:1493)
    at com.google.devtools.build.lib.remote.RemoteSpawnRunner.lambda$exec$2(RemoteSpawnRunner.java:292)
    at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:245)
    at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:127)
    at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:116)
    at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:265)
    at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:159)
    at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:119)
    at com.google.devtools.build.lib.exec.SpawnStrategyResolver.exec(SpawnStrategyResolver.java:45)
    at com.google.devtools.build.lib.bazel.coverage.CoverageReportActionBuilder$CoverageReportAction.execute(CoverageReportActionBuilder.java:140)
    at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.executeAction(SkyframeActionExecutor.java:1148)
    at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1065)
    at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:165)
    at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:94)
    at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:562)
    at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:859)
    at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:333)
    at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:171)
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:461)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:414)
    at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Invalid arguments:
  "command.ValidateSpec": Invalid spec - docker container must be specified
    at io.grpc.Status.asRuntimeException(Status.java:535)
    at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
    at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$2(GrpcRemoteExecutor.java:175)
    at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:245)
    at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:127)
    at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:116)
    at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$3(GrpcRemoteExecutor.java:146)
    at com.google.devtools.build.lib.remote.util.Utils.refreshIfUnauthenticated(Utils.java:525)
    at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:144)
    ... 26 more

Which category does this issue belong to?

Remote Execution

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Use the changes at https://github.com/bazelbuild/rules_rust/pull/2005 to build with RBE as described by presubmit.yaml:

rbe_ubuntu2004:
    shell_commands:
      - sed -i 's/^# load("@bazelci_rules/load("@bazelci_rules/' WORKSPACE.bazel
      - sed -i 's/^# rbe_preconfig/rbe_preconfig/' WORKSPACE.bazel
    coverage_targets:
      - "--"
      - "//..."

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

7.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

meteorcloudy commented 11 months ago

/cc @c-mita @coeuvre

tjgq commented 11 months ago

It appears that we’re not propagating the platform properties to the coverage spawn. I’ll take a look this week; it’s probably a simple fix.


fmeum commented 11 months ago

@tjgq Just in case it's helpful, I started working on something similar at some point in the past but only covered the Skymeld case: https://github.com/bazelbuild/bazel/pull/19784.

https://github.com/bazelbuild/bazel/issues/19781 is related.

UebelAndre commented 11 months ago

It would be great to have this fixed to be able to have RBE regression testing for coverage reports.

tjgq commented 10 months ago

@fmeum Thanks, I am convinced that we do indeed need something similar to https://github.com/bazelbuild/bazel/pull/19784. However, I'd rather make remote execution work than pin it to the host platform; from my spelunking, it's apparent that we've always intended CoverageAction to be remotable, but we didn't wire up the execution properties correctly. It doesn't help that CoverageAction exists "outside of the system" (it has no owning target), so the wiring is extra annoying...

@UebelAndre Until then, I believe this can be worked around in one of two ways (please let me know if neither one works, as that means I have the wrong repro):

  1. Set the container-image exec property (and any other required properties) via --remote_default_exec_properties instead of a platform rule (i.e., --remote_default_exec_properties=container-image=docker://...)
  2. Force the coverage processing to run locally with --strategy=CoverageReport=local
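
For reference, the two workarounds above could be written as a .bazelrc fragment along these lines. This is only a sketch: the config names `rbe_props` and `local_report` are made up for illustration, and `<IMAGE>` is a placeholder for whatever container image your RBE setup requires (it is not a value taken from this issue). Only the two flags themselves come from the comment above.

```
# Workaround 1: pass the exec properties as remote defaults rather than via a
# platform rule, so they also reach the coverage report spawn.
# <IMAGE> is a placeholder; add further key=value properties if your RBE backend needs them.
coverage:rbe_props --remote_default_exec_properties=container-image=docker://<IMAGE>

# Workaround 2: keep the rest of the build remote, but run the coverage report
# generation step locally.
coverage:local_report --strategy=CoverageReport=local
```

With that in place, something like `bazel coverage --config=local_report //...` would select the second workaround; either one alone should be enough if the diagnosis above is right.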