Open ashi009 opened 2 months ago
I would love this feature as well. We want to use the docker sandbox to ensure local builds come out the same as remote builds. Unfortunately, the docker sandbox is currently too slow to be really viable.
This is already possible via the flag sandbox_add_mount_pair. This will use Docker's -v option, as seen here.
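To make that mapping concrete, here is a minimal Python sketch of how mount pairs in source:target form could be turned into docker run -v arguments. The helper name mount_pairs_to_docker_args and the read-only default are assumptions made for illustration; this is not Bazel's actual implementation.

```python
def mount_pairs_to_docker_args(pairs, read_only=True):
    """Turn 'source:target' (or bare 'path') pairs into docker -v arguments.

    Hypothetical helper: it only illustrates the flag-to-volume mapping.
    """
    args = []
    for pair in pairs:
        source, sep, target = pair.partition(":")
        if not sep:
            # A bare path is mounted at the same location inside the container.
            target = source
        volume = f"{source}:{target}"
        if read_only:
            # Docker's -v accepts a trailing ':ro' for read-only mounts.
            volume += ":ro"
        args += ["-v", volume]
    return args
```

With read_only=True, each pair becomes a read-only bind mount, which is the behaviour the feature request asks for on inputs.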
But docker sandbox is always copying the inputs, how does an extra mount point prevent the copy of inputs?
I thought the feature request was the following:
It should be possible to perform something like SymlinkedSandboxedSpawn, to mount inputs as readonly to the right location
What is this referring to exactly? Because this is what sandbox_add_mount_pair does.
Is the feature request maybe something that SymlinkedSandboxedSpawn does not currently have? If an input matches any of the paths to be mounted, then skip any symlinking (or copying)?
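The idea in that last question can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the function name split_inputs and the example paths are hypothetical, and nothing here reflects how Bazel's sandbox code is actually structured.

```python
from pathlib import PurePosixPath


def split_inputs(inputs, mounted_dirs):
    """Split inputs into those covered by a mount and those still to be staged.

    Hypothetical sketch: an input counts as covered when it is a mounted
    directory itself or lives somewhere below one.
    """
    mounted = [PurePosixPath(d) for d in mounted_dirs]
    covered, to_stage = [], []
    for path in inputs:
        p = PurePosixPath(path)
        if any(p == m or m in p.parents for m in mounted):
            covered.append(path)   # already visible through the mount
        else:
            to_stage.append(path)  # still needs a symlink or a copy
    return covered, to_stage


# Example paths are made up for illustration.
covered, to_stage = split_inputs(
    ["external/llvm/bin/clang", "src/main.cc"],
    ["external/llvm"],
)
```

Here the toolchain binary falls under the mounted directory and would be skipped, while src/main.cc would still be symlinked or copied.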
I'm not the OP, but to me it seemed that this request was to make docker sandbox use a faster method for getting inputs into the docker container. The reference to the symlinked sandbox seems to suggest that, since the standard Linux sandbox can do this, the docker sandbox should be able to do the same.
My use case may be a little different, but I am having similar pain with the time it takes to run docker sandbox jobs. I am trying to use docker sandbox to guarantee that local builds produce the same results as remote execution builds. This would make uploading results to the remote cache more reliable, and would also make dynamic scheduling more reliable by guaranteeing that local results match the remote results, for example with the flags:
--internal_spawn_scheduler --dynamic_local_strategy=docker --spawn_strategy=dynamic
Currently, docker sandbox jobs are too slow for me to take advantage of the above use cases. My build also has jobs with a large number and size of inputs. We are looking at moving the more static inputs (such as the toolchain) into the containers, but that means we need to rebuild the containers more often when these inputs change.
Looking at the docs for docker sandbox, it is clear that it is intended for debugging remote execution issues. However, I think my use case is a good use of docker sandbox outside of debugging.
Description of the feature request:
When heavy toolchains (e.g. toolchains_llvm) are involved, the docker sandbox suffers significant performance degradation because it copies too many files around. For example, running a cc_library compilation action copies 5GB of LLVM toolchain into the sandbox exec root.
It should be possible to perform something like SymlinkedSandboxedSpawn, to mount inputs as read-only at the right location, and let Docker manage the rest.
Which category does this issue belong to?
No response
What underlying problem are you trying to solve with this feature?
Improve the performance of docker sandbox.
Which operating system are you running Bazel on?
macOS
What is the output of bazel info release?
release 7.2.1
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
No response
What's the output of git remote get-url origin; git rev-parse HEAD?
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
Attaching profiles for building openssl with rules_foreign_cc:
The overhead for setting up a sandbox for docker is 7~10s for each action.
Also, when running Bazel directly inside Docker, the subprocess run time is about twice as fast as with the docker sandbox (60,234.442 ms vs 131,298.141 ms). I think this is because the docker sandbox mounts the execroot from the host into the container instead of letting it use the container's native filesystem, but I didn't go as far as profiling the actual run time.
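For readers who want the two timings as a ratio, the arithmetic can be sketched as below. The function name compare_runtimes is made up for this example; the only inputs are the two timings quoted above.

```python
def compare_runtimes(fast_ms, slow_ms):
    """Return (speedup factor, fraction of wall time saved) for two runtimes."""
    speedup = slow_ms / fast_ms
    time_saved = 1 - fast_ms / slow_ms
    return speedup, time_saved


# Timings from the report: Bazel run inside Docker vs. the docker sandbox.
speedup, time_saved = compare_runtimes(60_234.442, 131_298.141)
print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.1%}")
```

This works out to roughly a 2.2x speedup, or a little over half of the subprocess time saved.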