Open meteorcloudy opened 1 month ago
Since rules_java and rules_python both register toolchains , I don't see a way to remove them short of getting rid of the dependency from bazel_tools
alltogether.
I think we can probably get rid of rules_python in MODULE.tools, I don't see any obvious reason why it should be there.
I think that if we remove it, projects that use py_binary
without a load
or dependency on rules_python
would no longer find a Python toolchain.
Ah, I see, thanks for pointing that out. Then, we can probably only do that in Bazel 8.
@katre @Wyverald As you can see, many dependencies are required only because they register toolchains. Should we consider introducing a way for the root module to exclude toolchains registered by modules in the dependency graph?
Maybe a simple
toolchain_override(
exclude = "*",
)
?
If the toolchains aren't needed, why is rules_java a dependency? Is it somehow referred to from @bazel_tools
? Do you have a code link I can check?
yes, rules_java is in MODULE.tools: https://cs.opensource.google/bazel/bazel/+/master:src/MODULE.tools;l=8?q=MODULE.tools&ss=bazel
And the toolchain is registered in rules_java's MODULE.bazel:
Okay, did a bit of testing.
The @rules_java
dependency is added in MODULE.tools, and referenced from a number of places, but most importantly from tools/jdk/BUILD.tools, which creates a number of aliases for legacy targets.
Of these, the most important seems to be @bazel_tools//tools/jdk:current_java_runtime
, which is itself referenced from a number of places, mostly from the native and builtin Java semantics (and also from the Android rules).
I see a lot of test failures by simply removing these, but it's very unclear which are actual failures, and which are tests that implicitly require Java to work but don't actually add a rules_java
dependency.
Trying to follow this further will take a lot of effort and cleanup (if nothing else, to make our integration tests be honest about what languages they require).
Oof. Do I understand correctly that the root cause is that Python/Java rules require the Python/Java toolchains which are registered in @rules_python
/ @rules_java
, therefore, if we don't depend on them by default, Python and Java rules cease working. The fix for that is to explicitly require a reference to the rules one needs but that can't happen before Bazel 8 because it would be a very incompatible change?
Owie, then this will require quite a bit of thought to figure out a reasonable (=can be done quickly and doesn't cause too much collateral damage) fix for...
And it is an entirely reasonable request that builds that don't use Python or Java don't depend on their respective rule sets, in fact, it's quite embarrassing that they currently do so...
It's possible that if the java rules are completely unused, then there is no need for rules_java, it's just very very hard to test that given the current state of things.
Setting up a small toy project and trying to build it with a suitably hacked up bazel is probably the best way to test this.
That, I hope, but it doesn't help by itself because there is no way to tell Bazel "I don't need @rules_java
, pinkie promise, please don't ever look at it", is there?
It's possible that if the java rules are completely unused, then there is no need for rules_java, it's just very very hard to test that given the current state of things.
You can use a local_path_override to override rules_java to an empty module, if the build still works, then rules_java isn't needed.
cc @comius -- this discussion might have some bearing on the eventual approach to decoupling rule implementations from the Bazel binary.
That's essentially what @meteorcloudy was asking for in https://github.com/bazelbuild/bazel/issues/22684#issuecomment-2160601261
Maybe a simple
toolchain_override( exclude = "*", )
I'm very ambivalent about this: I don't think it's a huge amount of work to update RegisteredToolchainsFunction
to handle it, but plumbing the new data through will be tricky (check the implementation of register_toolchains
for a comparison).
But it's very inelegant: it's much better to simply not depend on something you don't intend to use. If @bazel_tools
current depends on @rules_java
we're better off trying to break the dependency than just hiding it and papering over.
Relatedly, are we shipping any source tools as java_binary
targets?
IIRC, the fundamental problem is that repos defining those toolchains are not fetched lazily because we'll need to fetch the repo to know which toolchain type it is. If this problem is hard to solve in Bazel, then letting users decide which toolchains to be excluded would help avoid unnecessary external dependencies.
Relatedly, are we shipping any source tools as java_binary targets?
Probably not? /cc @hvadehra
I don't think there is much of a point in artifically cutting the dependency of a trivial C++ hello world on the rules_java repo. When the Java rules have been moved out of Bazel, this dependency will naturally be gone, but any non-trivial Bazel repo will depend on rules_java anyway.
For example, as soon as you run bazel coverage
on a trivial C++ example, a Java runtime will be downloaded to run the coverage merger, at which point the actual rules_java
repo is the least of your worries in terms of vendoring :-) Since ruleset repos are tiny and don't contain binaries, I don't think we need to worry about them too much.
I agree with Fabian here -- I think the natural solution of "just let Starlarkification run its course" is the best. When all java_*
rules are loaded from rules_java
, we can simply remove rules_java
from bazel_tools
, which would get rid of the toolchain registration for projects that don't need it. Trying to work around it earlier will likely just add to our tech debt.
Just as an aside: in case this wasn't clear before, we'll likely never be able to completely empty out bazel_tools
. Stuff like the built-in repo rules (http_archive
and co.), the shell runfiles library, and the default host platform (@bazel_tools//tools:host_platform
) will always need to remain bundled with Bazel, even after we've completed Starlarkification.
The point of putting in the legwork to make sure that a C++ hello world doesn't depend on Java is that then we know we don't have unexpected dependencies and that it's existence proof that it's possible to run Bazel builds that don't depend on @rules_java
/ @rules_python
.
I'm not too worried about that one unexpected dependency edge, but I am worried about the fact that vendoring @rules_java
is now pretty much a requirement for builds without Internet access. Firstly, it's a design smell, secondly, I could imagine some trouble brewing if @rules_java
suddenly decides have a mandatory dependncy on some large Java tool (or a JDK, etc.)
I do realize that @bazel_tools
will never be completely empty; I wanted to express myself concisely instead of precisely. I'm sure we all agree, though, that right now it has way more things in it than it should have.
We do already have tests that verify that cc_test
doesn't depend on certain unexpected tools: https://cs.opensource.google/bazel/bazel/+/0dbfaccaf5bee5ea7f11c01db1fc0cd1ca7f3810:src/test/shell/bazel/cc_integration_test.sh;l=1683.
We could extend this to ensure that there are no unexpected rules_java deps without getting rid of the toolchain registration edge.
Doing that via a bazel vendor
test would be both more effort and less granular (repo vs target level). If we also add a test to rules_java that detects undesired edges to other repos, we could even catch any accidentally eager fetches of toolchains early.
I think the problem we want to solve is this: For users who want to vendor external dependencies to achieve offline build, they don't have to vendor repos they don't need just because of toolchain registrations.
This is important for projects like Kleaf which has a strict policy on vendoring 3rd party dependency.
And it's still a problem after we starlarkifying rules_java and move it out of bazel_tools. For example, if kleaf depends on rules_foo, and rules_foo depends on rules_java because a binary tool needs it to provide some optional feature. However, Kleaf knows it doesn't want the optional feature and doesn't need Java toolchains anywhere else. Currently, there is no nice way to avoid vendoring rules_java and potentially other repos introudced during toolchain registration (e.g. bazel_features).
I think the solution might be to offer a way for the root module to exclude toolchain registrations from given modules. Something like
toolchain_exclude(
module_names = ["rules_java"],
# maybe also: exclude_all = True,
)
We do already have tests that verify that cc_test doesn't depend on certain unexpected tools: https://cs.opensource.google/bazel/bazel/+/0dbfaccaf5bee5ea7f11c01db1fc0cd1ca7f3810:src/test/shell/bazel/cc_integration_test.sh;l=1683.
This test doesn't guard what repos need to be fetched for building the target. For two reasons: 1) toolchain registration 2) irrelevant load statements. That's why we had to traverse the skyframe graph to discover all repos to vendor for given targets. Therefore, I think bazel vendor
is the only way to discover full repo dependencies for targets so far.
we'll likely never be able to completely empty out bazel_tools
I think the actual goal is to just make MODULE.tools empty instead of making the whole bazel_tools empty? So far, I only see bazel_dep(name = "buildozer", version = "7.1.2")
to be a problem, but it has little negative impact.
Yeah, I'm supportive of the idea of being able to say "to the best of knowledge, my builds don't need Java and please error out if it touches anything that touches Java".
Practically speaking, I could imagine a knob in MODULE.bazel that says "depend on these builtin language families" and if the user doesn't say anything, they get all of them. This would provide a convenient way to cut off rule implementations from Bazel if we eventually switch the default to "nothing".
I haven't thought the mechanics through, though. Maybe @comius knows with certainty whether this is a good idea?
Yeah, I'm supportive of the idea of being able to say "to the best of knowledge, my builds don't need Java and please error out if it touches anything that touches Java".
That should already be possible today: Add third_party/rules_java_stub/MODULE.bazel
with content module(name = "rules_java")
to your repo and then either add --override_module=%workspace%/third_party/rules_java_stub
to .bazelrc
or add a local_path_override
to MODULE.bazel
. If bazel vendor
doesn't skip overriden repos today, we could make it do so. Since the module file is empty, it won't register any toolchains and since there are no other files in the module, any references to @rules_java
that were missed will result in a "thing not found" error.
@fmeum Ah, that's a good point and it's basically what I suggested at https://github.com/bazelbuild/bazel/issues/22684#issuecomment-2160961838. I can confirm local_path_override
repo is already excluded from vendor.
Then toolchain_exclude
may not be that necessary. I'll suggest the Kleaf team to try out this first.
This could work if we always use --override_module in the command-line.
But unfortunately we can't put it as a default in .bazelrc or in MODULE.bazel local_path_override. This is because we also have stardoc() targets, contained within a package. When building these targets, we allow Internet access because stardoc pulls a bunch of maven dependencies, and it is a big hassle to pull them to the Android tree. So we only build these targets on developer's machines and regularly check-in the generated markdown files to the source tree. We do not build these stardoc() targets or even visit that package when in a contained environment (CI build machines without Internet access).
So, if we always override rules_java with a fake module, is there a way for us to offset the effect of --override_module when building stardoc? For example, can we make --override_module=rules_java= clear any previous values of --override_module for rules_java set in .bazelrc or in MODULE.bazel?
(on behalf of the Kleaf team)
Description of the bug:
C++ hello-world
Using C++ binary as an example.
To build a C++ hello-world binary offline
we'll have to vendor the following repositories with Bazel built at 96b90baf08172bb55844c2366111a0c39b6f9e14 or later.
While, the first four as acceptable, rules_java and rules_python should not be required by default for building a simple C++ hello world program.
The same can be apply to building hello-world program of other languages.
Java hello-world
Suggested solution
Which category does this issue belong to?
No response
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
https://github.com/meteorcloudy/minimal_vendor_test/tree/main
Which operating system are you running Bazel on?
macOS, Linux, Windows
What is the output of
bazel info release
?96b90baf08172bb55844c2366111a0c39b6f9e14 or later
If
bazel info release
returnsdevelopment version
or(@non-git)
, tell us how you built Bazel.No response
What's the output of
git remote get-url origin; git rev-parse HEAD
?No response
If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.
No response
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response