bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Run docker sandbox without copying inputs #23294

Open ashi009 opened 2 months ago

ashi009 commented 2 months ago

Description of the feature request:

When heavy toolchains (e.g. toolchains_llvm) are involved, the docker sandbox suffers significant performance degradation because it copies so many files around. For example, running a cc_library compilation action copies 5GB of the llvm toolchain into the sandbox exec root.

It should be possible to do something like SymlinkedSandboxedSpawn: mount the inputs read-only at the right location and let docker manage the rest.

Which category does this issue belong to?

No response

What underlying problem are you trying to solve with this feature?

Improve the performance of docker sandbox.

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

release 7.2.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

Attaching profiles for building openssl with rules_foreign_cc:

The overhead for setting up a sandbox for docker is 7~10s for each action.

Also, when running bazel in docker directly, the subprocess run time is less than half that of the docker sandbox (60,234.442 ms vs 131,298.141 ms). I think this is because the docker sandbox mounts the execroot from the host into the container instead of letting it use the native fs. But I didn't push far enough to profile the actual run time.

dmoody256 commented 1 month ago

I would love this feature as well. We want to use the docker sandbox to ensure local builds come out the same as remote builds. Unfortunately, the docker sandbox is currently too slow to be really viable.

oquenchil commented 3 weeks ago

This is already possible via the flag --sandbox_add_mount_pair.

This will use Docker's -v option as seen here.
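To make the mapping concrete, here is a minimal sketch of how a mount pair could translate into a `docker run -v` argument. It is an illustration only, not Bazel's actual code: the helper name `mount_pair_to_docker_args` and the example paths are hypothetical, and it assumes the pair is either a single path or a `source:target` string.

```python
def mount_pair_to_docker_args(pair: str) -> list[str]:
    """Turn a 'source:target' (or single-path) mount pair into `docker run -v` arguments.

    Hypothetical helper for illustration; not part of Bazel.
    """
    source, sep, target = pair.partition(":")
    if not sep:
        # A single path is assumed to be mounted at the same location in the container.
        target = source
    return ["-v", f"{source}:{target}"]


# Example with made-up paths:
print(mount_pair_to_docker_args("/host/llvm:/opt/llvm"))
```

Whether the mount ends up read-only depends on how the `-v` value is built; this sketch does not assume either way.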

dmoody256 commented 5 days ago

But docker sandbox is always copying the inputs, how does an extra mount point prevent the copy of inputs?

oquenchil commented 5 days ago

> But docker sandbox is always copying the inputs, how does an extra mount point prevent the copy of inputs?

I thought the feature request was the following:

> It should be possible to perform something like SymlinkedSandboxedSpawn, to mount inputs as readonly to the right location

What is this referring to exactly? Because this is what sandbox_add_mount_pair does.

Is the feature request maybe something that SymlinkedSandboxedSpawn does not currently have? If an input matches any of the paths to be mounted, then skip any symlinking (or copying)?
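As a rough sketch of that last idea (skip staging for any input already covered by a mounted path), something like the following could work. The function name `split_inputs` and the example paths are hypothetical, and this is not how Bazel currently behaves; it only illustrates the proposed check.

```python
from pathlib import PurePosixPath


def split_inputs(inputs, mount_targets):
    """Partition inputs into those covered by a mount and those still needing staging.

    Hypothetical helper: an input counts as covered if it equals a mount target
    or lives underneath one.
    """
    targets = [PurePosixPath(t) for t in mount_targets]
    mounted, to_stage = [], []
    for path in inputs:
        p = PurePosixPath(path)
        if any(p == t or t in p.parents for t in targets):
            mounted.append(path)   # already visible through the mount, no copy needed
        else:
            to_stage.append(path)  # still has to be symlinked or copied
    return mounted, to_stage


# Example with made-up paths:
mounted, to_stage = split_inputs(
    ["external/llvm/bin/clang", "src/main.cc"], ["external/llvm"]
)
```

Comparing whole path components rather than string prefixes avoids treating `external/llvm_extra` as if it were under `external/llvm`.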

dmoody256 commented 5 days ago

I'm not the OP, but to me it seemed that this request was to make the docker sandbox use a faster method for getting inputs into the docker container. The reference to the symlinked sandbox seems to suggest that, since the standard linux sandbox can do this, the docker sandbox should be able to do the same.

My use case may be a little different, but I am also feeling the pain of how long docker sandbox jobs take. I am trying to use the docker sandbox to guarantee that local builds produce the same results as remote execution builds. This would make uploading results to the remote cache more reliable, and would also make dynamic scheduling more reliable, since local results would be guaranteed to match remote ones, for example with the flags:

--internal_spawn_scheduler --dynamic_local_strategy=docker --spawn_strategy=dynamic

Currently, docker sandbox jobs are too slow for me to take advantage of the above use cases. My build also has jobs with a large number of inputs and large input sizes. We are looking at moving the more static inputs (such as the toolchain) into the containers, but that means rebuilding the containers more often when these inputs change.

Looking at the docs for the docker sandbox, it is clearly intended for debugging remote execution issues. However, I think my use case is a good use of the docker sandbox outside of debugging.