Open fmeum opened 4 months ago
cc @tjgq @coeuvre
Can you repro with --noexperimental_merged_skyframe_analysis_execution?
so they are still missing on the next invocation.
Should these outputs be downloaded (e.g. toplevel outputs, or outputs specified by --remote_download_regex)?
If Bazel marks an action dirty to trigger a download but then doesn't download it because it shouldn't, then it's a performance issue. Otherwise, it's a correctness issue.
Can you repro with --noexperimental_merged_skyframe_analysis_execution?
I can't.
Should these outputs be downloaded (e.g. toplevel outputs, or outputs specified by --remote_download_regex)?
No, these outputs should not be downloaded (just hjars, jdeps files, ...), so I think this is just a performance issue.
This is the root cause: https://github.com/bazelbuild/bazel/blob/43ad74bec433c1923e2ce78605ea04cac0cdb324/src/main/java/com/google/devtools/build/lib/buildtool/ExecutionTool.java#L328-L337
Note the RemoteArtifactChecker.IGNORE_ALL, which means that all remote artifacts will be marked as dirty. Happy to work on this. Are there any ideas of what would need to be done to improve this? CC @joeleba
Skymeld calls skyframeExecutor.detectModifiedOutputFiles when the first toplevel target is analyzed. However, the RemoteArtifactChecker for BwoB needs the full analysis result to correctly determine which actions should be marked as dirty, because it records the paths of toplevel outputs in a trie: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/remote/RemoteOutputChecker.java;l=128;drc=6f48f1c2b3bb73768b9ff15ce6698e21eddc503a (there is some skymeld-only code, but it is for clean builds). Otherwise, actions for other toplevel targets will not be invalidated. So we decided to use RemoteArtifactChecker.IGNORE_ALL for now.
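To make the trade-off concrete, here is a minimal sketch of the two behaviours described above: a checker that trusts nothing remote (the effect attributed to IGNORE_ALL) and a checker backed by a trie of toplevel output paths. All names here (RemoteCheckerSketch, Checker, PathTrie, trustRemote, fromToplevelOutputs) are hypothetical and chosen for illustration only; this is not Bazel's actual RemoteArtifactChecker or RemoteOutputChecker code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these types are Bazel's real classes. */
final class RemoteCheckerSketch {
  /** Decides whether an output that exists only remotely can be left as-is. */
  interface Checker {
    boolean trustRemote(String execPath);
  }

  /** Mirrors the effect described for IGNORE_ALL: nothing remote is trusted. */
  static final Checker IGNORE_ALL = path -> false;

  /** Trie keyed by path segment, holding the outputs that must be downloaded. */
  static final class PathTrie {
    private final Map<String, PathTrie> children = new HashMap<>();
    private boolean terminal;

    void add(String path) {
      PathTrie node = this;
      for (String segment : path.split("/")) {
        node = node.children.computeIfAbsent(segment, s -> new PathTrie());
      }
      node.terminal = true;
    }

    /** True if the path itself or one of its parent directories was registered. */
    boolean covers(String path) {
      PathTrie node = this;
      for (String segment : path.split("/")) {
        node = node.children.get(segment);
        if (node == null) {
          return false;
        }
        if (node.terminal) {
          return true;
        }
      }
      return false;
    }
  }

  /** Remote-only outputs are trusted unless they are registered for download. */
  static Checker fromToplevelOutputs(PathTrie toDownload) {
    return path -> !toDownload.covers(path);
  }
}
```

With a trie that only knows the first toplevel target's outputs, fromToplevelOutputs would wrongly trust the remote-only outputs of every later target, which is the gap that falling back to IGNORE_ALL papers over at the cost of marking everything dirty.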
One potential solution I have discussed with @joeleba offline is to let skymeld call skyframeExecutor.detectModifiedOutputFiles for each toplevel target, so that RemoteArtifactChecker.IGNORE_ALL can be replaced with the one from BwoB.
This is the root cause:
Argh, I intended to post this link but somehow my clipboard is corrupted.
This is a known limitation of Skymeld + BwoB. It's essentially a tradeoff: having both Skymeld and BwoB improves clean build performance, but comes with a performance penalty for incremental builds. Luckily the penalty only comes from repeating the skyframe work, since we have other layers of caches for actions.
In the current state, detectModifiedOutputFiles is called once before any action execution in the build. At this point, it's not possible to construct the full picture of what's dirty or not in the RemoteArtifactChecker (that picture is only available after the full analysis).
To resolve this, we would need to run detectModifiedOutputFiles incrementally, as the information for each top level target/aspect becomes available.
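As a rough illustration of what "incrementally" could mean here, the sketch below re-runs the dirtiness check each time a toplevel target finishes analysis, using only the information known so far. The class and method names (IncrementalCheckSketch, onTargetAnalyzed) and the flat set of paths are assumptions made for the example; it does not reflect how detectModifiedOutputFiles is actually implemented or how the eventual fix works.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of per-target dirtiness checks; not Bazel's real code. */
final class IncrementalCheckSketch {
  // Outputs known so far that must exist locally (grows as targets are analyzed).
  private final Set<String> outputsToDownload = new HashSet<>();

  /**
   * Called once per toplevel target as soon as its analysis finishes.
   * Returns the remote-only outputs of that target that must be marked dirty
   * so that they get downloaded.
   */
  List<String> onTargetAnalyzed(Set<String> toplevelOutputs, List<String> remoteOnlyOutputs) {
    outputsToDownload.addAll(toplevelOutputs);
    List<String> dirty = new ArrayList<>();
    for (String path : remoteOnlyOutputs) {
      if (outputsToDownload.contains(path)) {
        dirty.add(path);
      }
    }
    return dirty;
  }
}
```

In this picture, intermediate outputs such as hjars and jdeps files are never registered as toplevel outputs, so they stay clean across invocations instead of being invalidated wholesale.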
We reproduced this with just the remote cache. Can this have a higher than P3 priority?
Raising it to P2, but we probably won't have time to work on the fix for the next month or two. In the meantime, PRs are welcome!
Friendly ping on this.
I have a proper fix and should be able to submit it in a few days.
@bazel-io fork 7.4.0
This is fixed in both the master and release-7.4.0 branches.
@brentleyjones Can you verify whether it improves your builds?
Thanks! I'll have to wait for an RC to start validation.
Description of the bug:
When using a disk cache (possibly also a remote cache), Bazel repeatedly marks output files obtained from the cache as dirty to trigger a download, but then doesn't download the files, so they are still missing on the next invocation.
Which category does this issue belong to?
No response
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
bazel --nosystem_rc --nohome_rc build --disk_cache=some_path //...
on some project with Java targets. See lines such as:
Which operating system are you running Bazel on?
Linux
What is the output of bazel info release?
No response
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD?
No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?