bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Feature request: store/load analysis cache on disk #11194

Open iphydf opened 4 years ago

iphydf commented 4 years ago

Description of the feature request:

I'd like to be able to persist the analysis cache (e.g. after running bazel aquery //...) to disk and load it back into the Bazel server on next startup.

Feature requests: what underlying problem are you trying to solve with this feature?

I have a lot of small git repos that have independent build systems (cmake, cabal, autotools, make, setup.py, ...), but are also part of a larger "stack" monorepo (except it's not mono, it's git submodules). This contains the WORKSPACE and third_party and various other bits.

I have a Docker image in the top level "stack" repo that downloads large third_party dependencies such as Android NDK/SDK, some .jar files, Qt, some external dependencies that aren't yet built with Bazel, and installs various other programs needed to run the build (mvn, make). Essentially it tries to do as much as possible of the build preparation right up to the point of building (which it doesn't do).

The idea here is that submodule repos can use this working snapshot (which is tested in its entirety on CI before the image is built) to run their own tests within the context of the large repo, so all their dependencies are there and at the most recent stable snapshot.

This is working pretty well, especially with remote caching, but now a large proportion of the time spent on submodule CI runs is in the analysis phase.

I'd like to be able to dump the analysis cache to disk and load it in the submodule CI builds. Of course the flags should be saved as well, so that if the submodule CI uses incompatible flags that require the analysis cache to be cleared, that will happen as normal. It'll be up to the submodule to ensure its flags are the same as in the Docker image.
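
The flag check described above could be sketched roughly as follows. This is only an illustration of the idea, not an existing Bazel API: `flags_fingerprint`, `save_snapshot`, `load_snapshot`, and the JSON snapshot layout are all hypothetical, and the `analysis` value is a placeholder string standing in for whatever serialized analysis state would look like.

```python
import hashlib
import json


def flags_fingerprint(flags):
    """Return a stable digest for a list of command-line flags.

    Order is kept as given, since a later flag may override an earlier one.
    """
    payload = json.dumps(list(flags)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def save_snapshot(path, flags, analysis_blob):
    """Write the (placeholder) analysis state together with the flag digest."""
    snapshot = {
        "flags_digest": flags_fingerprint(flags),
        "analysis": analysis_blob,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(snapshot, f)


def load_snapshot(path, flags):
    """Return the stored state if the flags match, otherwise None.

    A None result means the caller should discard the snapshot and
    run analysis from scratch, as happens today when flags change.
    """
    with open(path, encoding="utf-8") as f:
        snapshot = json.load(f)
    if snapshot["flags_digest"] != flags_fingerprint(flags):
        return None
    return snapshot["analysis"]
```

In this sketch the Docker image build would call `save_snapshot` once, and each submodule CI run would call `load_snapshot` with its own flags, falling back to a normal analysis phase when it returns `None`.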

What operating system are you running Bazel on?

GNU/Linux Ubuntu 16.04 (l.gcr.io/google/bazel:3.0.0).

What's the output of bazel info release?

release 3.0.0

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

github-iphy:iphydf/toktok-stack
017559a8ff5644dade5a3909417b2357133dd288
546bffa99618672417cb5106ff1147e974693755

Have you found anything relevant by searching the web?

Nothing useful.

janakdr commented 3 years ago

We have some functionality internally to serialize analysis objects like this, but performance and serialized size would probably not be good enough for this to be a useful feature at present, especially if the use case is for clean builds, where the full analysis graph would have to be materialized in memory. Our use case is focused on individual targets, which means that there's duplication of shared objects in the graph.
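
To illustrate the duplication concern in general terms, here is a small sketch that compares serializing each target's transitive closure independently against storing every node once. It does not reflect how Bazel's internal serialization works; the graph shape, labels, and function names are made up for the example.

```python
def transitive_closure(deps, root):
    """Return the set of nodes reachable from root, including root itself."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, ()))
    return seen


def per_target_entries(deps, roots):
    """Count nodes written when each root serializes its own full closure."""
    return sum(len(transitive_closure(deps, root)) for root in roots)


def deduplicated_entries(deps, roots):
    """Count nodes written when each node is stored once and shared."""
    stored = set()
    for root in roots:
        stored |= transitive_closure(deps, root)
    return len(stored)


# Hypothetical graph: two targets share one library.
example_deps = {
    "//a:a": ["//lib:base"],
    "//b:b": ["//lib:base"],
    "//lib:base": [],
}
```

With `example_deps`, the per-target approach writes `//lib:base` twice (4 entries in total) while the deduplicated approach writes it once (3 entries). The gap grows with the number of targets that share large common subgraphs.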

or-shachar commented 3 years ago

We (Wix) need this as well. We have ~50 big, interconnected repositories and want to implement a smart cross-repo trigger: e.g., given that a file was changed in repo x, which targets in all other repos depend on it (rdeps), and therefore how many and which builds to trigger in pre/post-submit.

We would like to persist the analysis cache into some kind of graph DB so we can run quick queries about it later on.
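
As a rough illustration of the kind of query meant here, the sketch below computes transitive reverse dependencies over an in-memory dependency map and groups the result by repository. The map format, the `@repo//pkg:target` labels, and the function names are assumptions for the example; a real setup would populate the map from exported build graph data or a graph DB.

```python
from collections import defaultdict, deque


def build_reverse_edges(deps):
    """Invert a {target: [direct deps]} map into {dep: {dependents}}."""
    reverse = defaultdict(set)
    for target, direct_deps in deps.items():
        for dep in direct_deps:
            reverse[dep].add(target)
    return reverse


def affected_targets(deps, changed):
    """Return every target that transitively depends on `changed`."""
    reverse = build_reverse_edges(deps)
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in reverse.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def group_by_repo(labels):
    """Group labels of the form '@repo//pkg:name' by their repo name."""
    grouped = defaultdict(set)
    for label in labels:
        repo = label.split("//", 1)[0].lstrip("@")
        grouped[repo].add(label)
    return dict(grouped)


# Hypothetical cross-repo graph.
example_deps = {
    "@repo_a//app:bin": ["@repo_x//lib:core"],
    "@repo_b//svc:server": ["@repo_a//app:bin"],
    "@repo_c//tool:cli": [],
}
```

For a change to `@repo_x//lib:core`, `affected_targets` returns the targets in `repo_a` and `repo_b` but not `repo_c`, and `group_by_repo` on that result gives the set of repos whose builds would need to be triggered.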

cc: @avgar @anchlovi @adiko @shays10

anchlovi commented 3 years ago

@or-shachar we already have this data: given a target, you can query all the targets to rebuild. If needed, this can be refined to extract only the repos.

github-actions[bot] commented 1 year ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

brentleyjones commented 1 year ago

Not stale. For some people analysis takes multiple minutes. It would be good to be able to reuse that analysis on restart of the server.

github-actions[bot] commented 1 month ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.