Open keith opened 1 year ago
@coeuvre @tjgq
Probably the test action hit the cache so some files are not downloaded. Might related https://github.com/bazelbuild/bazel/pull/16556.
Can you please provide an repro? Not exactly sure how to trigger this error.
I am also getting this warning on Bazel 6.1.0, but it doesn't seem to matter what values I specify for --remote_download_toplevel
or --experimental_remote_build_event_upload=minimal
. What's odder is that after the build completes, the baseline_coverage.dat
file that Bazel complains it couldn't find is present in the output tree.
I'm seeing this spam in rules_rust
builds on Bazel 6.2.1 for the following pull-request https://github.com/bazelbuild/rules_rust/pull/2001
@coeuvre does this provide you a sufficient repro?
https://buildkite.com/bazel/rules-rust-rustlang/builds/8715#0188aaec-43c1-4ad1-a9fe-62cdbf7d03e3
(14:48:09) WARNING: Uploading BEP referenced local files: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/ec321eb2cc2d0f8f91b676b6d4c66c29/execroot/rules_rust/bazel-out/k8-fastbuild/testlogs/test/unit/proc_macro/extern_flag_passed_when_compiling_macro_2018/baseline_coverage.dat (No such file or directory)
java.io.FileNotFoundException: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/ec321eb2cc2d0f8f91b676b6d4c66c29/execroot/rules_rust/bazel-out/k8-fastbuild/testlogs/test/unit/proc_macro/extern_flag_passed_when_compiling_macro_2018/baseline_coverage.dat (No such file or directory)
at com.google.devtools.build.lib.unix.NativePosixFiles.stat(Native Method)
at com.google.devtools.build.lib.unix.UnixFileSystem.statInternal(UnixFileSystem.java:210)
at com.google.devtools.build.lib.unix.UnixFileSystem.stat(UnixFileSystem.java:200)
at com.google.devtools.build.lib.unix.UnixFileSystem.getFileSize(UnixFileSystem.java:391)
at com.google.devtools.build.lib.vfs.Path.getFileSize(Path.java:559)
at com.google.devtools.build.lib.remote.util.DigestUtil.compute(DigestUtil.java:61)
at com.google.devtools.build.lib.remote.ByteStreamBuildEventArtifactUploader.readPathMetadata(ByteStreamBuildEventArtifactUploader.java:194)
at com.google.devtools.build.lib.remote.ByteStreamBuildEventArtifactUploader.lambda$upload$7(ByteStreamBuildEventArtifactUploader.java:330)
at io.reactivex.rxjava3.internal.operators.flowable.FlowableMap$MapSubscriber.onNext(FlowableMap.java:64)
at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.fastPath(FlowableFromIterable.java:185)
at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:129)
at io.reactivex.rxjava3.internal.subscribers.BasicFuseableSubscriber.request(BasicFuseableSubscriber.java:153)
at io.reactivex.rxjava3.internal.jdk8.FlowableCollectWithCollectorSingle$CollectorSingleObserver.onSubscribe(FlowableCollectWithCollectorSingle.java:102)
at io.reactivex.rxjava3.internal.subscribers.BasicFuseableSubscriber.onSubscribe(BasicFuseableSubscriber.java:67)
at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable.subscribe(FlowableFromIterable.java:69)
at io.reactivex.rxjava3.internal.operators.flowable.FlowableFromIterable.subscribeActual(FlowableFromIterable.java:47)
at io.reactivex.rxjava3.core.Flowable.subscribe(Flowable.java:15917)
at io.reactivex.rxjava3.internal.operators.flowable.FlowableMap.subscribeActual(FlowableMap.java:38)
at io.reactivex.rxjava3.core.Flowable.subscribe(Flowable.java:15917)
at io.reactivex.rxjava3.internal.jdk8.FlowableCollectWithCollectorSingle.subscribeActual(FlowableCollectWithCollectorSingle.java:71)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap.subscribeActual(SingleFlatMap.java:37)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleFlatMap.subscribeActual(SingleFlatMap.java:37)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleMap.subscribeActual(SingleMap.java:35)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleUsing.subscribeActual(SingleUsing.java:83)
at io.reactivex.rxjava3.core.Single.subscribe(Single.java:4855)
at io.reactivex.rxjava3.internal.operators.single.SingleSubscribeOn$SubscribeOnObserver.run(SingleSubscribeOn.java:89)
at io.reactivex.rxjava3.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:38)
at io.reactivex.rxjava3.internal.schedulers.ScheduledDirectTask.call(ScheduledDirectTask.java:25)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
@scentini do you know what would be needed to gain some traction here? These warnings are exploding logs in my projects.
Sorry for the delay. I will look into it.
I can reproduce this issue using rules_rust with Bazel 6.2.1, but can't reproduce it with Bazel@HEAD. The root cause is similar to the issue described in #17798. I will work on a fix for 6.3.0.
And this issue exists without using BwoB. It's just that the file baseline_coverage.dat
was not generated at the time when the BEP event was emitted.
cc @c-mita in case you are already aware of a fix for this issue.
I'm not aware of anything regarding this.
Does it complain about missing baseline_coverage.dat files for all transitive targets? Or just the top-level test targets?
Just the top-level targets. The missing baseline_coverage.dat
files are generated afterwards. My suspicion is there are missing deps between the BaselineCoverageAction
and CompletionFunction
so that the TargetCompletedEvent
was emitted without waiting for baseline_coverage.dat
to be generated. I am running a bisection to look which commit fixed this.
Bisection points me to 30cf799f58128d2922d3af59a6a907c4d8115952 which forces skyframe to use FJP. And if I rerun the repro with flag --experimental_use_fork_join_pool
using 6.2.1, guess what, the issue disappeared.
cc @meisterT
In conclusion, the issue is baseline_coverage.dat
hasn't been generated when TargetCompleteEvent
was emitted which references the file. Then the uploader tried to read it from local filesystem but failed. The baseline_coverage.dat
was generated afterwards.
The root cause is:
baseline_coverage.dat
is a toplevel artifact and is generated by BaselineCoverageAction
.TargetCompleteEvent
is created by CompletionFunction
which is executed after a test target had finished. The has a reference to baseline_coverage.dat
but the function doesn't check whether the file has been generated.BaselineCoverageAction
and CompletionFunction
currently, there is no guarantee for the existence of baseline_coverage.dat
when TargetCompleteEvent
is emitted.The fact that the issue cannot be reproduced at HEAD is because of flag --experimental_use_fork_join_pool
. I guess FJP silently changed how skyframe schedule actions and accidentally make all the BaselineCoverageAction
run before CompletionFunction
.
To mitigate the issue for 6.x users, we can simply cherry-pick commit 30cf799f58128d2922d3af59a6a907c4d8115952 into 6.3.0. A proper fix requires us somehow ask CompletionFunction
wait for the referenced baseline_coverage.dat
to be generated before emit the TargetCompleteEvent
.
Description of the bug:
We're seeing a log like:
For every single test target when we run them on CI with BES uploads and
--remote_download_toplevel --experimental_remote_build_event_upload=minimal
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I'm not sure, it's hard to debug with cache hits vs not
Which operating system are you running Bazel on?
macOS
What is the output of
bazel info release
?release 6.0.0rc4
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.No response
What's the output of
git remote get-url origin; git rev-parse master; git rev-parse HEAD
?No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response