bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

allow repository_rules to run on stale state #23720

Open dgoldstein0 opened 3 weeks ago

dgoldstein0 commented 3 weeks ago

Description of the feature request:

Repository rules are a way to run arbitrary imperative operations during Bazel's loading phase. By default, Bazel clears a rule's output folder before each execution of the repository rule. That is a good default, but it makes some use cases challenging. I propose adding a parameter to repository_rule, clear_output_folder_before_fetch, which would be True by default (maintaining current behavior) but could be set to False for use cases that trust their rules to handle stale state properly.
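
To make the proposed semantics concrete, here is a minimal Python model. It is not Bazel code: `fetch`, `incremental_install`, and the `clear_output_folder_before_fetch` argument are hypothetical stand-ins for how a fetch might behave if the proposed parameter existed.

```python
import shutil
import tempfile
from pathlib import Path


def fetch(output_dir, implementation, clear_output_folder_before_fetch=True):
    """Hypothetical model of one repository-rule fetch (not Bazel's actual code)."""
    if clear_output_folder_before_fetch and output_dir.exists():
        # Models the current behavior: the rule starts from an empty folder.
        shutil.rmtree(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    implementation(output_dir)


def incremental_install(output_dir):
    """Stand-in for a tool such as yarn that can reuse existing output."""
    marker = output_dir / "installed.txt"
    runs = int(marker.read_text()) if marker.exists() else 0
    marker.write_text(str(runs + 1))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "repo"
        fetch(out, incremental_install)
        fetch(out, incremental_install)  # default: folder wiped, counter restarts
        cleared = int((out / "installed.txt").read_text())
        # Proposed opt-out: stale state survives and the tool builds on it.
        fetch(out, incremental_install, clear_output_folder_before_fetch=False)
        kept = int((out / "installed.txt").read_text())
        return cleared, kept
```

In this model the counter restarts whenever the folder is cleared and keeps growing only when the flag is False, which is the difference an incremental tool would see.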

Which category does this issue belong to?

Rules API

What underlying problem are you trying to solve with this feature?

I've written a repository rule that uses yarn to install node_modules/, which we've integrated with our custom JS library rules. Yarn itself has a lot of logic for incrementally updating the generated node_modules/, so that yarn install after a small change to yarn.lock can take a few seconds rather than a minute or more. Since Bazel clears the repository rule's output folder, we either end up with quite bad performance or need quite a bit of extra complexity: we put the outputs in a different folder and symlink them into Bazel's output folder for the repository rule, so that each repeat invocation only has to create and rewrite symlinks and lets yarn take care of the rest, as opposed to starting from scratch. (The symlink hacks cost us a few seconds of execution time and have also caused us to hit bugs in yarn, so they aren't free either, but they are much better than no workaround.)
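
The shape of that workaround can be sketched in plain Python. This is only a model under stated assumptions, not the actual rule: `fetch_with_symlink`, the `cache_dir` location, and the `install_count` file are hypothetical, and the counter stands in for whatever state yarn keeps between runs.

```python
import os
import shutil
import tempfile
from pathlib import Path


def fetch_with_symlink(output_dir, cache_dir):
    """Hypothetical model: real files live in cache_dir, output_dir holds only a link."""
    if output_dir.exists():
        # Models Bazel clearing the output folder. rmtree removes the
        # symlink itself and does not descend into its target.
        shutil.rmtree(output_dir)
    output_dir.mkdir(parents=True)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Stand-in for `yarn install`: an incremental tool updating its own state.
    stamp = cache_dir / "install_count"
    count = int(stamp.read_text()) if stamp.exists() else 0
    stamp.write_text(str(count + 1))

    # Only the symlink has to be recreated on each fetch.
    os.symlink(cache_dir, output_dir / "node_modules", target_is_directory=True)


def demo_symlink():
    with tempfile.TemporaryDirectory() as tmp:
        out, cache = Path(tmp) / "repo", Path(tmp) / "cache"
        fetch_with_symlink(out, cache)
        fetch_with_symlink(out, cache)  # output wiped, cache survives
        return (out / "node_modules" / "install_count").read_text()
```

The cache directory survives every wipe of the output folder, so the second run sees the state left by the first.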

Likely other "use a package manager or other build system in a repository rule" use cases have this problem as well.

Which operating system are you running Bazel on?

linux - ubuntu 22.04

What is the output of bazel info release?

release 6.1.0-1c2df03733503215490047c398d25fe5b5553a0e

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

mwoehlke-kitware commented 1 week ago

> Likely other "use a package manager or other build system in a repository rule" use cases have this problem as well.

Can confirm; ran into this same problem with a repository that creates a Python virtual environment.

Also, repository rules sometimes run even when there has been no change. Why? (In particular, bazel shutdown triggers a re-run.)

Wyverald commented 1 week ago

> Also, repository rules sometimes run even when there has been no change. Why? (In particular, bazel shutdown triggers a re-run.)

This sounds like a bug. If you could pin it down in a minimal repro case and file an issue, we'd appreciate it!

jwnimmer-tri commented 1 week ago

In @mwoehlke-kitware's case we were purposefully setting local = True for other reasons, in which case I believe re-running after a shutdown is the desired / specified behavior and there is no bug.

mwoehlke-kitware commented 1 week ago

> In @mwoehlke-kitware's case we were purposefully setting local = True ...

Ah, I missed that note in the docs. Regardless, something like the proposed feature would still be useful for us!

> I propose adding a parameter to repository_rule: clear_output_folder_before_fetch ...

I'd like to propose an alternative: allow specific files and/or trees to be marked so that they aren't purged when a refetch needs to occur. (Naturally, bazel clean or at least bazel clean --expunge should still nuke these.) This would be more flexible than the original proposal, and marking the whole output tree this way would be equivalent to the original proposal.
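
As a rough illustration of this alternative, the following Python model clears an output folder while skipping entries marked as preserved. The names `clear_except` and `preserved` are hypothetical and do not correspond to any existing Bazel API, and the sketch only looks at top-level entries.

```python
import shutil
import tempfile
from pathlib import Path


def clear_except(output_dir, preserved):
    """Delete top-level entries of output_dir unless their name is in `preserved`."""
    # Snapshot the listing first so deletion does not disturb iteration.
    for entry in list(output_dir.iterdir()):
        if entry.name in preserved:
            continue  # marked entry: left in place for the next fetch
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def demo_preserve():
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "repo"
        (out / "node_modules").mkdir(parents=True)
        (out / "node_modules" / "pkg.js").write_text("cached")
        (out / "BUILD.bazel").write_text("# generated")
        clear_except(out, preserved={"node_modules"})
        return sorted(p.name for p in out.iterdir())
```

In this model, marking every top-level entry as preserved leaves the folder untouched, which matches the claim that marking the whole tree is equivalent to the original proposal.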