Open stevebarrau opened 11 months ago
@coeuvre and @fmeum - I think you've been working at the intersection of coverage + remote execution (e.g. https://github.com/bazelbuild/bazel/pull/16556) - any thoughts on how the merger action should be getting a platform?
Could you check whether https://github.com/bazelbuild/bazel/pull/19784 fixes this? It's pretty hacky though and untested (Bazel CI passes with it). It also requires Skymeld.
I checked on 7b28c63f02287a32ab05ac21d9e0cc9be53f80f7.
The `Configuration` is still "system", but now our RBE platform is getting picked up in the `Execution platform`. We looked into how execProperties are set based on the platform and found a code path where only remoteExecutionProperties are used, and not execProperties. I filed a separate PR with the fix here: https://github.com/bazelbuild/bazel/pull/19792
This separate PR + the changes in this PR make coverage pass in our use case.
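For anyone following along, here is a minimal Python sketch of the lookup order described above. It is an illustration only, not Bazel's actual code: the function names, the `default_properties` argument, and the `container-image` key and value are all hypothetical. It shows how a code path that consults only the legacy `remote_execution_properties` can end up with an empty platform (as in `platform {}`) when the platform rule sets only `exec_properties`.

```python
from typing import Dict, Mapping


def resolve_platform_properties(
    exec_properties: Mapping[str, str],
    remote_execution_properties: Mapping[str, str],
    default_properties: Mapping[str, str],
) -> Dict[str, str]:
    """Hypothetical helper: prefer exec_properties, then the legacy
    remote_execution_properties, then any configured defaults."""
    if exec_properties:
        return dict(exec_properties)
    if remote_execution_properties:
        return dict(remote_execution_properties)
    return dict(default_properties)


def resolve_legacy_only(
    remote_execution_properties: Mapping[str, str],
    default_properties: Mapping[str, str],
) -> Dict[str, str]:
    """Hypothetical model of the suspected bug: exec_properties is never
    consulted, so a platform that only sets exec_properties looks empty."""
    if remote_execution_properties:
        return dict(remote_execution_properties)
    return dict(default_properties)


# Assumed example: a platform that sets only exec_properties.
platform_exec_properties = {"container-image": "docker://example/ubi9"}

fixed = resolve_platform_properties(platform_exec_properties, {}, {})
buggy = resolve_legacy_only({}, {})
```

With these assumed inputs, `fixed` carries the platform's properties while `buggy` is empty, which matches the shape of the `platform {}` error in the log below.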
$ bazel-with-our-rbe coverage -s --combined_report=lcov //python:trivial_test
...
SUBCOMMAND: # (unknown) [action 'Coverage report generation', configuration: system, execution platform: @our_rbe_repo//common/platforms:ubi9-x86_64, mnemonic: CoverageReport]
(cd XXX/execroot/federation_example && \
exec env - \
JAVA_RUNFILES=bazel-out/k8-opt-exec-ST-e4625b82c993/bin/external/remote_coverage_tools/Main.runfiles \
PYTHON_RUNFILES=bazel-out/k8-opt-exec-ST-e4625b82c993/bin/external/remote_coverage_tools/Main.runfiles \
bazel-out/k8-opt-exec-ST-e4625b82c993/bin/external/remote_coverage_tools/Main '--reports_file=bazel-out/_coverage/lcov_files.tmp' '--output_file=bazel-out/_coverage/_coverage_report.dat')
# Configuration: system
# Execution platform: @our_rbe_repo//common/platforms:ubi9-x86_64
ERROR: <builtin>: Coverage report generation failed: (Exit 34): FAILED_PRECONDITION: No workers exist for instance name prefix "" platform {}
java.io.IOException: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: No workers exist for instance name prefix "" platform {}
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:241)
at com.google.devtools.build.lib.remote.RemoteExecutionService.executeRemotely(RemoteExecutionService.java:1490)
at com.google.devtools.build.lib.remote.RemoteSpawnRunner.lambda$exec$2(RemoteSpawnRunner.java:286)
at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:245)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:127)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:116)
at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:259)
at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:156)
at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:116)
at com.google.devtools.build.lib.exec.SpawnStrategyResolver.exec(SpawnStrategyResolver.java:45)
at com.google.devtools.build.lib.bazel.coverage.CoverageReportActionBuilder$CoverageReportAction.execute(CoverageReportActionBuilder.java:140)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.executeAction(SkyframeActionExecutor.java:1135)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1052)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:165)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:94)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:553)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:852)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:331)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:461)
at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:414)
at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1407)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: No workers exist for instance name prefix "" platform {}
at io.grpc.Status.asRuntimeException(Status.java:535)
at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$2(GrpcRemoteExecutor.java:175)
at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:245)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:127)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:116)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$3(GrpcRemoteExecutor.java:146)
at com.google.devtools.build.lib.remote.util.Utils.refreshIfUnauthenticated(Utils.java:526)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:144)
... 26 more
Target //python:trivial_test up-to-date:
bazel-bin/python/trivial_test
INFO: Elapsed time: 248.538s, Critical Path: 119.85s
INFO: 15 processes: 3 remote cache hit, 11 internal, 1 remote.
ERROR: Build did NOT complete successfully
//python:trivial_test PASSED in 1.4s
XXX/execroot/federation_example/bazel-out/k8-fastbuild/testlogs/python/trivial_test/coverage.dat
Executed 1 out of 1 test: 1 test passes.
All tests passed but there were other errors during the build.
Note: with this PR, setting `remote_execution_properties` causes the following NPE:
$ bazel-with-our-rbe -s --combined_report=lcov //python:trivial_test
Running host JVM under debugger (listening on TCP port 5005).
Starting local Bazel server and connecting to it...
... still trying to connect to local Bazel server (72354) after 10 seconds ...
INFO: Invocation ID: ee7ac809-1ca1-4904-82d4-e767829cb641
INFO: Using default value for --instrumentation_filter: "^//python[/:]".
INFO: Override the above default with --instrumentation_filter
ERROR: XXX/external/our_rbe/common/platforms/BUILD.bazel:40:9: in exec_properties attribute of platform rule @our_rbe//common/platforms:ubi9-x86_64: Platform contains both remote_execution_properties and exec_properties. Prefer exec_properties over the deprecated remote_execution_properties.
ERROR: XXX/external/our_rbe/common/platforms/BUILD.bazel:40:9: Analysis of target '@our_rbe//common/platforms:ubi9-x86_64' failed
Analyzing: target //python: trivial_test (7 packages loaded, 12 targets configured)
[1 / 1] checking cached actions
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:903)
at com.google.devtools.build.lib.skyframe.BuildResultListener.getHostPlatformInfo(BuildResultListener.java:122)
at com.google.devtools.build.lib.skyframe.SkyframeBuildView.analyzeAndExecuteTargets(SkyframeBuildView.java:732)
at com.google.devtools.build.lib.analysis.BuildView.update(BuildView.java:293)
at com.google.devtools.build.lib.buildtool.AnalysisAndExecutionPhaseRunner.runAnalysisAndExecutionPhase(AnalysisAndExecutionPhaseRunner.java:241)
at com.google.devtools.build.lib.buildtool.AnalysisAndExecutionPhaseRunner.execute(AnalysisAndExecutionPhaseRunner.java:139)
at com.google.devtools.build.lib.buildtool.BuildTool.buildTargetsWithMergedAnalysisExecution(BuildTool.java:305)
at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:173)
at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:510)
at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:478)
at com.google.devtools.build.lib.runtime.commands.TestCommand.doTest(TestCommand.java:163)
at com.google.devtools.build.lib.runtime.commands.TestCommand.exec(TestCommand.java:116)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:664)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:244)
at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:621)
at io.grpc.Context$1.run(Context.java:566)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
I can look into cleaning up the PR after yours has been merged, but I may not get to it this month.
This is not configurability.
Description of the bug:
I am trying to get coverage to work in our RBE cluster.
Collecting the coverage works locally with:
Running the same with RBE:
I am puzzled by the `Configuration: system` bit. IIUC the CoverageReportAction should spawn a BasicSpawn with a configuration. In our RBE implementation this means we need to manually specify the exec properties using `--remote_default_platform_properties` to get this to work remotely. We would like to avoid using `remote_default_platform_properties` as a workaround, given that it probably creates situations where a spawn that does not have a platform set will run incorrectly.
Is this the expected behavior, or is there an oversight in setting the configuration for the `CoverageReportAction`? It feels like either the BasicSpawn needs to inherit a platform (e.g. from the host platform or the default exec platform), or there needs to be a way to specify the coverage action's platform (e.g. a `--coverage_platform` flag or similar)?
Which category does this issue belong to?
No response
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Use rules_python and a wrapper for unittest to get XML outputs.
Which operating system are you running Bazel on?
macOS 12.6.3
What is the output of `bazel info release`?
release 6.3.2
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
No response
What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD`?
No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
No response
Have you found anything relevant by searching the web?
I looked for `coverage configuration` to no avail.
Any other information, logs, or outputs that you want to share?
No response