Open lberki opened 2 years ago
Yes, please. It is known to be particularly slow on macOS:
https://github.com/bazelbuild/bazel/issues/8230
In my (albeit limited) experience, the biggest barrier to integrating Bazel into local developer workflows (where quick feedback is paramount) is the sandboxing time, which in many cases far outstrips action exec time.
Using one symlink per large tree artifact instead of symlinking each file in it separately
This one would really help for Apple platform builds where they produce bundles of bundles.
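To make the tree artifact idea concrete, here is a minimal sketch that compares the two approaches on a temporary directory. It is not Bazel's sandbox code: `make_tree`, `link_per_file` and `link_whole_tree` are hypothetical helpers, and the 1000-file "bundle" is just a stand-in for a tree artifact.

```python
import os
import tempfile
from pathlib import Path


def make_tree(root: Path, n_files: int) -> None:
    """Create a stand-in for a tree artifact: a directory holding n_files files."""
    root.mkdir(parents=True)
    for i in range(n_files):
        (root / f"file_{i}.txt").write_text(str(i))


def link_per_file(tree: Path, dest: Path) -> int:
    """Mirror the tree by symlinking every file separately; returns the symlink count."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(tree):
        target_dir = dest / Path(dirpath).relative_to(tree)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            os.symlink(Path(dirpath) / name, target_dir / name)
            count += 1
    return count


def link_whole_tree(tree: Path, dest: Path) -> int:
    """Expose the tree through one directory symlink; returns the symlink count."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(tree, dest, target_is_directory=True)
    return 1


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    tree = base / "outputs" / "bundle"
    make_tree(tree, 1000)
    per_file = link_per_file(tree, base / "sandbox_a" / "bundle")
    whole = link_whole_tree(tree, base / "sandbox_b" / "bundle")
    # Both sandboxes expose the same file names to the action.
    visible_a = sorted(os.listdir(base / "sandbox_a" / "bundle"))
    visible_b = sorted(os.listdir(base / "sandbox_b" / "bundle"))
    print(per_file, whole, visible_a == visible_b)
```

One trade-off worth keeping in mind: with a single symlink the action sees the real directory, so anything it writes under that path lands in the original tree rather than in the sandbox.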
cc @larsrc-google
@lberki There is --experimental_reuse_sandbox_directories - have you tried that?
--experimental_reuse_sandbox_directories is essentially your point #5, and it helped a lot. #3 (tree artifacts) does sound reasonable. The others I'd like to have some measurements for first, to see how much we can actually save.
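For readers unfamiliar with the idea behind reusing sandbox directories, the general technique can be sketched as reconciling an existing directory against the next action's inputs instead of rebuilding it from scratch. This is only an illustration of that idea, not how Bazel implements the flag; `sync_symlinks` is a hypothetical helper, and it assumes a flat sandbox that contains nothing but symlinks.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def sync_symlinks(sandbox: Path, wanted: dict[str, Path]) -> tuple[int, int]:
    """Make `sandbox` contain exactly the symlinks in `wanted` (flat layout).

    Assumes every existing entry in `sandbox` is a symlink.
    Returns (created, removed) so the caller can see how much work was needed.
    """
    sandbox.mkdir(parents=True, exist_ok=True)
    existing = {name: os.readlink(sandbox / name) for name in os.listdir(sandbox)}
    created = removed = 0
    # Drop links that are stale or point at the wrong target.
    for name, target in existing.items():
        if name not in wanted or str(wanted[name]) != target:
            os.unlink(sandbox / name)
            removed += 1
    # Create only the links that are missing or were just dropped.
    for name, target in wanted.items():
        if existing.get(name) != str(target):
            os.symlink(target, sandbox / name)
            created += 1
    return created, removed


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    inputs = {}
    for i in range(100):
        f = base / f"in_{i}.txt"
        f.write_text("x")
        inputs[f.name] = f
    sandbox = base / "sandbox"
    first = sync_symlinks(sandbox, inputs)

    # A second action that shares 99 of the 100 inputs and adds one new file.
    second_inputs = dict(inputs)
    del second_inputs["in_0.txt"]
    extra = base / "extra.txt"
    extra.write_text("y")
    second_inputs["extra.txt"] = extra
    second = sync_symlinks(sandbox, second_inputs)
    print(first, second)
```

In this toy setup the first call creates all 100 links, while the second only touches the two entries that differ. How much this saves in a real build depends on how much consecutive actions overlap in their inputs.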
Learned today: https://github.com/ikorennoy/jasyncfio , io_uring in Java (I'm not sure if it's useful and it'd be an extra dependency, but I don't want this nugget of data to get lost)
We'll need some reproducible examples. I tried compiling Bazel itself with and without sandbox in various worker/non-worker configurations, and the difference was minimal.
Did you try synthetic loads? That would be a much easier avenue than testing on Chrome OS / Kleaf builds (AFAIU they have 300K/80K input files, but I don't know how many and how big TreeArtifacts there are in the former)
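A synthetic load along these lines could be generated with a small script. The sketch below writes a workspace with a single genrule that depends on many input files. The target name `many_inputs`, the directory layout and the file counts are all made up for illustration, and both an empty WORKSPACE and an empty MODULE.bazel are written because which one is needed depends on the Bazel version (an assumption on my part).

```python
import tempfile
from pathlib import Path

# A single action whose inputs are every generated file.
BUILD_TEMPLATE = '''\
genrule(
    name = "many_inputs",
    srcs = glob(["inputs/**"]),
    outs = ["out.txt"],
    cmd = "echo done > $@",
)
'''


def generate_workspace(root: Path, n_files: int, files_per_dir: int = 1000) -> int:
    """Write a throwaway workspace with n_files inputs; returns n_files."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "WORKSPACE").write_text("")
    (root / "MODULE.bazel").write_text("")
    (root / "BUILD.bazel").write_text(BUILD_TEMPLATE)
    for i in range(n_files):
        # Spread files over subdirectories so no single directory gets huge.
        d = root / "inputs" / f"d{i // files_per_dir}"
        d.mkdir(parents=True, exist_ok=True)
        (d / f"f{i}.txt").write_text(f"{i}\n")
    return n_files


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp) / "synthetic"
    generate_workspace(ws, n_files=2500)
    n_generated = sum(1 for _ in (ws / "inputs").rglob("*.txt"))
    n_dirs = len(list((ws / "inputs").iterdir()))
    print(n_generated, n_dirs)
```

Pointing the script at a persistent directory and raising `n_files` towards the 80K to 300K range mentioned above would then allow comparing `bazel build //:many_inputs` with the default sandboxed strategy against `--spawn_strategy=local`.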
@lberki Can you share the build that actually triggers the slowness? From there we can work towards a more minimal repro.
Plussed @larsrc-google into the pertinent threads (unfortunately, they are Google internal communications even though they are about the interaction between two Google open source projects...)
Kleaf uses sandbox builds by default (though we also encourage developers to disable the costly sandboxes for local development). This feature will greatly improve the build time for Kleaf.
I can provide some metrics for the time spent on sandbox creation for Kleaf builds on build bots (ci.android.com) upon request (the data is public but the dashboard is internal only).
Ack. Numbers would be really helpful to aid in our prioritization decisions.
I am currently looking into potential improvements for the SymlinkedSandbox. In particular, I am exploring pushing more work, batched together, to JNI and using io_uring for I/O.
I looked a bunch at io_uring, but dropped it again when I heard it had several bugs, including security-critical ones. Doing a batch API that can be implemented with JNI or io_uring or Loom threads would be good, though.
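The shape of such a batch API can be sketched independently of the backend. Below is a rough illustration in Python rather than Java, with a sequential backend and a thread-pool backend standing in for the JNI, io_uring or Loom options. `SymlinkBatcher` and its implementations are hypothetical names, not anything that exists in Bazel.

```python
from __future__ import annotations

import os
import tempfile
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class SymlinkBatcher(ABC):
    """Callers hand over a whole batch; the backend decides how to execute it."""

    @abstractmethod
    def create_all(self, links: list[tuple[Path, Path]]) -> None:
        """Create every (target, link_path) pair in the batch."""


class SequentialBatcher(SymlinkBatcher):
    def create_all(self, links: list[tuple[Path, Path]]) -> None:
        for target, link_path in links:
            os.symlink(target, link_path)


class ThreadedBatcher(SymlinkBatcher):
    def __init__(self, workers: int = 8, chunk_size: int = 256) -> None:
        self.workers = workers
        self.chunk_size = chunk_size

    def create_all(self, links: list[tuple[Path, Path]]) -> None:
        chunks = [
            links[i : i + self.chunk_size]
            for i in range(0, len(links), self.chunk_size)
        ]
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # list() drains the iterator so worker exceptions propagate here.
            list(pool.map(SequentialBatcher().create_all, chunks))


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    src = base / "src"
    src.mkdir()
    targets = []
    for i in range(1000):
        f = src / f"f{i}"
        f.write_text("")
        targets.append(f)

    results = {}
    backends = [("sequential", SequentialBatcher()), ("threaded", ThreadedBatcher())]
    for name, batcher in backends:
        dest = base / name
        dest.mkdir()
        batcher.create_all([(t, dest / t.name) for t in targets])
        results[name] = sorted(os.listdir(dest))
    print(len(results["sequential"]), results["sequential"] == results["threaded"])
```

Because callers only see `create_all`, a backend can be swapped without touching them. Whether the threaded variant is actually faster is something that would need measuring on the file systems in question.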
- Using one symlink per large tree artifact instead of symlinking each file in it separately
This one would have made the largest impact for ChromeOS.
😢
Description of the bug:
The symlinked sandbox is slow when an action has a large number of input files (I have seen reports of actions with up to 300K).
There are a number of ways one could improve this:
SandboxHelpers currently does this on one thread)
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
No response
Which operating system are you running Bazel on?
No response
What is the output of bazel info release?
No response
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response