Automatic generation of `skylark_library` targets in `BUILD` files

bazelbuild / bazel-skylib

Common useful functions and rules for Bazel

https://bazel.build/

Apache License 2.0

394 stars 180 forks source link

Automatic generation of `skylark_library` targets in `BUILD` files #250

Closed achew22 closed 1 year ago

achew22 commented 4 years ago

Hey @aiuto and @laurentlb (who appear to be OWNERS on this repo),

First, thanks so much for all you're doing in the Bazel ecosystem. I hugely appreciate your efforts. One of the things that I have been thinking about creating for a while was a bazel-gazelle plugin for skylib that generated skylark_library targets. I think this would be generally useful. In my repo's I've found it helpful to have skylark_library targets to feed into stardoc and manually creating the targets is kind of a drag.

Fortunately, Gazelle has a really nice plugin system that can be used to add generation for languages beyond golang. If I were to write a skylib plugin for Gazelle, would this repo be a good place to host it? Since the repo is packaged by rules_pkg I could iterate on it and not have it released as part of the distribution bundle until it was deemed to be good enough. Additionally I would be happy to update the distribution scripts to support both a gazelle version and a non-gazelle version to keep down dependencies if that's a concern.

Please let me know what you think. Thanks so much!

aiuto commented 4 years ago

cc: @jayconrod
I really don't know where that would belong. Do gazelle plugins go with a gazelle contrib folder or where they seem used?

Then we have the question of how to bundle libraries for the optimal stardoc generation. One bzl_library per file is probably not wrong. If startdoc is going to do something for integrated doc sets, that could be structured at the stardoc rule level, rather than having a big library.

jayconrod commented 4 years ago

Ideally I think Gazelle extensions should live in the same repos where the rules are defined. That way, the extension can be coupled with the rules being used. Rule authors are also in a much better position to maintain those extensions than I am.

From the not-following-my-own-advice department, the Go extension lives in the Gazelle repository (Gazelle was originally Go-only). There have been some issues with Gazelle and rules_go versions drifting apart.

achew22 commented 4 years ago

In order to push the discussion along, I decided to implement a strawman proposal. Please see #251 for that proposal. I have a few questions in there for both of you. If you would be so kind as to review and respond to them I would greatly appreciate it.

aiuto commented 4 years ago

To be honest, I am not going to have the time to look at this for a while. Nor can I commit to approving, because I am not directly a skylib maintainer. @laurentlb @c-parsons Can either of you comment on the overall concept of doing this.

achew22 commented 4 years ago

Friendly ping

Wyverald commented 3 years ago

Hi @achew22, I'm many months late to the party, but I wanted to ask if it's possible to move this to a separate repo? As you alluded to yourself, the added dependency on rules_go and bazel_gazelle could be problematic. It becomes much more conspicuous in the external deps overhaul (tl;dr doc) since a module needs to declare its direct dependencies and all of those propagate, which means that anyone depending on Skylib would necessarily bring in rules_go and bazel_gazelle. (In fact, that's why I'm only just asking now -- I'm trying to kickstart the Bazel central registry described in the overhaul proposal.)

@jayconrod re "Gazelle extensions should live in the same repos where the rules are defined", I would agree except that it brings a dependency on Go/Gazelle to any rules_X module for which such an extension exists. Is it feasible to always have such extensions in a standalone repo (read: standalone module, as described in the overhaul proposal)?

Wyverald commented 3 years ago

I also wanted to say for the record that I'm not entirely clear on the details yet, so please don't treat this as an eviction notice :)

How exactly does one use the gazelle plugin in this repo? I haven't used Gazelle before and the readme file here doesn't contain instructions. Depending on the answer to that question, we might be able to avoid this move by simply re-introducing the concept of dev-dependencies into the external deps overhaul proposal.

achew22 commented 3 years ago

Hi @achew22, I'm many months late to the party,

@Wyverald, it's okay, we're just happy to see someone excited about the project.

but I wanted to ask if it's possible to move this to a separate repo? As you alluded to yourself, the added dependency on rules_go and bazel_gazelle could be problematic.

"Could be problematic" is an interesting turn of phrase. I would expect that by now it would have already become problematic given that 6 months and a number of releases have happened. It's already been integrated into most repos. Maybe you point me to some complaints from the past few months about adding this dependency? Looking at the data, my conclusion has to be that it is not problematic since not a single soul has complained in any forum that I can witness (with the notable exception of you). I'm open to rebuttal on this one.

It becomes much more conspicuous in the external deps overhaul (tl;dr doc) since a module needs to declare its direct dependencies and all of those propagate, which means that anyone depending on Skylib would necessarily bring in rules_go and bazel_gazelle. (In fact, that's why I'm only just asking now -- I'm trying to kickstart the Bazel central registry described in the overhaul proposal.)

Since this is the first instance of someone being concerned about this in the real world. Let's take this as an opportunity to discuss what it is you're doing and why the situation that's been in place for half a year is insufficient.

I also wanted to say for the record that I'm not entirely clear on the details yet, so please don't treat this as an eviction notice :)

Uh, I guess that's reassuring.

How exactly does one use the gazelle plugin in this repo? I haven't used Gazelle before and the readme file here doesn't contain instructions.

A usage example is actually included in this repo:

https://github.com/bazelbuild/bazel-skylib/blob/e19d391ac984cf941abfba9ea0506c49293eb934/gazelle/bzl/BUILD#L38-L48

The languages attribute points to a go_library target and is included in a built version of the resulting Gazelle binary. When you invoke that gazelle_binary, it will walk your filesystem and generate BUILD files for you. There is 0 overhead if you do not depend on the go_library.

Depending on the answer to that question, we might be able to avoid this move by simply re-introducing the concept of dev-dependencies into the external deps overhaul proposal.

I didn't realize the external deps overhaul had dropped the idea of dev-dependencies. I am concerned that is going to leave a lot of projects out in the cold. Most projects already depend on the ability to have deps that aren't a part of their distribution. For example:

having rules_go will force everyone to depend on https://github.com/grailbio/bazel-toolchain, as well as an RBE executor through the bazel-toolchains dep even though neither of those are necessary to use rules_go.
stardoc would require bringing in rules_pkg in its entirety which includes a system for both building and extracting Debian packages.
loading rules_nodejs will force everyone to depend on stardoc, rules_go, angular2 (which weighs 300MB), cypress (which weighs about 100MB) and an executable version of bazel itself in 15 different versions

This means that if you don't support the idea of having a differentiation between the distribution version of a repo and the developer version of a repo, a hello world in TypeScript is going to require you to download 1500MB of data.

Wyverald commented 3 years ago

what I'm trying to do

As described in the overhaul proposal, modules will specify their dependencies using MODULE.bazel files instead of WORKSPACE files, and deps will be pulled from registries (mostly the Bazel Central Registry or BCR) instead of raw URLs. I'm trying to put some initial modules into BCR, and that entails writing a MODULE.bazel file for them.

Skylib is the first one I tried. It being such a fundamental standard library, I expected it to have minimal dependencies, and was surprised that its WORKSPACE file contained references to rules_go and bazel_gazelle. They obviously need to go into the MODULE.bazel file as dependencies, which means that they'd propagate to basically all Bazel modules. Hence the original question. (see more discussion about dev-deps below)

As such, this is new territory and thus it's largely irrelevant whether anyone has complained in the past few months.

dev-dependencies

We dropped the idea of dev-dependencies because we couldn't cleanly support it. Imagine your module foo has a dev-dependency on the module fancy_test which you only use for testing. In one of your BUILD files (say foo/BUILD), you could write

load("@fancy_test//:defs.bzl", "fancy_test")

cc_library(name = "lib")
fancy_test(name = "lib_test", deps = [":lib"])

Now, if another module bar depends on foo (specifically @foo//:lib), they would get a build error at the load statement. This is because fancy_test is a dev-dep of foo, so bar doesn't transitively depend on fancy_test through foo. This error happens despite the fact that bar technically never used any test target in foo; it's simply because the package loaded something from the dev-dep. (see also https://github.com/bazelbuild/bazel/issues/12835)

So we ended up removing dev-deps because:

We thought this failure mode was rather tricky to explain; the user (bar) could get an error out of the blue, and it's hard for the library author (foo) to notice such potential breakages ahead of time (since they do get dev-deps).
Although one could work around this by putting all targets requiring a load from dev-deps into separate packages, it can cause quite a bit of churn as it goes against what most people are doing already (ie. put test targets next to the libraries under test).
The cost of not having dev-deps is, as you said, an explosion of unnecessary deps; however, this does not incur a crazy ton of extra downloads because Bazel doesn't immediately download all dependencies. A repo is only fetched when something is required from it. So things like rules_pkg would never be downloaded even without dev-deps, as long as the user of stardoc does not refer to any target in the stardoc package that loads rules_pkg. Similarly, a hello world in TypeScript does not require you to download 1.5GB of data.

With all that said, there's a convincing argument to bring dev-deps back, and that is the de-cluttering of the dependency graph. We essentially have to weigh that against the drawback of potentially tricky errors (bullet point 1 above).

why rules_go is not a dev-dependency in this case

Thanks for providing the usage example. Judging from it, I don't think rules_go is a dev-dep of bazel_skylib (even if we supported dev-deps), because you are expecting a user of bazel_skylib to use the @bazel_skylib//gazelle/bzl:bzl target, which is a go_library coming from the rules_go module. A dev-dep of bazel_skylib would be one that's only used for development of bazel_skylib itself (rules_pkg is a perfect example). Here rules_go is more of an "optional" dependency: if a user of bazel_skylib wants to use this part of the module, they should get this dep; if they don't, they shouldn't. In this case, it's much cleaner to separate out the part of bazel_skylib that brings in an extra dependency into a separate module.

It's worth noting that, we can make bazel_skylib flat out depend on rules_go, and someone who doesn't use the gazelle part of skylib still won't have to download rules_go. (Just like the no-dev-deps situation above.) The problem, of course, is that it clutters the dependency graph -- we can make skylib depend on 1000 other random modules without any users of skylib having to download anything extra (as long as they don't use those parts), but that doesn't make it justifiable.

cc @meteorcloudy

brandjon commented 3 years ago

IIUC, it sounds like the resolution here from skylib's POV is that we likely will want to put the gazelle dir in another workspace by giving it its own WORKSPACE file?

achew22 commented 3 years ago

@brandjon,

IIUC, it sounds like the resolution here from skylib's POV is that we likely will want to put the gazelle dir in another workspace by giving it its own WORKSPACE file?

One of the major benefits of having it in the same WORKSPACE is that when you upgrade your dependency on bazel-skylib you're implicitly upgrading your gazelle plugin. If the bazel-skylib were to add a feature that required a mechanical fix, the next run of gazelle would fix it right up. As far as I can tell, the only other strategy for helping people upgrade is to have them to use a buildozer fix. Which, of course, also has a dep on rules_go.

@Wyverald,

Does this imply that if you bring in bazelbuild/rules_proto, which I would argue is even more fundamental to the Bazel ecosystem than bazel-skylib, you're going to be bringing in rules_{cc,python,java} since it provides rules in each of those languages?

Is it possible there is an additional kind of dependency, and I'm throwing out a straw man here, an "optional" dependency. Let's pick a worse and maybe even pathological case, stackb/rules_proto which produces android, closure, cpp, csharp, d, dart, go, java, node, objc, php, python, ruby, rust, scala, swift, gogo, closure, commonjs, and typescript. If I'm a user of that repo am I required to include 20 dependencies? Possibly more since I'm bad at both transcription and counting. What if a repo could define a dep as "optional" and it's recorded as a possible dep and used in version compatibility calculations, but not automatically added into the deps section by projects that depend on it? This way you can have a repo that potentially provides a great number of resources but you pay only for the things that you need.

The cost of not having dev-deps is, as you said, an explosion of unnecessary deps; however, this does not incur a crazy ton of extra downloads because Bazel doesn't immediately download all dependencies. A repo is only fetched when something is required from it. So things like rules_pkg would never be downloaded even without dev-deps, as long as the user of stardoc does not refer to any target in the stardoc package that loads rules_pkg. Similarly, a hello world in TypeScript does not require you to download 1.5GB of data.

I guess I'm confused by this section. If you're saying that not having an edge upon these dependencies means that you don't actually download them, then what is the major concern? Slowing things down? Polluting the dependency graph? Can you help me understand what this a little better?

Wyverald commented 3 years ago

Sorry for missing the updates, I really need to learn how to use GitHub notifications.

IIUC, it sounds like the resolution here from skylib's POV is that we likely will want to put the gazelle dir in another workspace by giving it its own WORKSPACE file?

That's not enough, as we're talking about MODULE.bazel files here. The gazelle dir needs to be its own module, which means that it needs its own name, versions, and it needs to be able to be shipped separately (ie. I need to be able to download an archive containing its source, and preferably only its source).

I can see how that is a usability hurdle. Every Bazel module needing to be its own GitHub repo can be quite taxing. Maybe we can provide explicit support for using part of a VCS repo as a Bazel module; I can see how that could be possible (for example, the entire Git repo can be packed into one archive, but we can then use strip_prefix in source.json to say that only a certain subdirectory is the source for a module). Then, we'd only need a MODULE.bazel file under the gazelle directory, and it can comfortably be a separate module from the root directory (which would have its own MODULE.bazel file).

Does this imply that if you bring in bazelbuild/rules_proto, which I would argue is even more fundamental to the Bazel ecosystem than bazel-skylib, you're going to be bringing in rules_{cc,python,java} since it provides rules in each of those languages?

Yes indeed. Ideally the cc, python, and java parts would be separated out into distinct modules, but again, I can see how that can be annoying (see above for a potential solution).

Is it possible there is an additional kind of dependency, and I'm throwing out a straw man here, an "optional" dependency. Let's pick a worse and maybe even pathological case, stackb/rules_proto which produces android, closure, cpp, csharp, d, dart, go, java, node, objc, php, python, ruby, rust, scala, swift, gogo, closure, commonjs, and typescript. If I'm a user of that repo am I required to include 20 dependencies? Possibly more since I'm bad at both transcription and counting. What if a repo could define a dep as "optional" and it's recorded as a possible dep and used in version compatibility calculations, but not automatically added into the deps section by projects that depend on it? This way you can have a repo that potentially provides a great number of resources but you pay only for the things that you need.

Thanks for bringing up this example. I would definitely consider stackb/rules_proto a pathological case.

We have discussed optional dependencies for bzlmod before. There is prior art in Cargo, but it's not quite as simple as just marking certain dependencies as optional. A Cargo package A can have optional dependencies on B, C, and D, and they are grouped into "features" (say F1 which requires B and F2 which requires C and D), which dependents of A can then selectively ask for. On top of that, features are exposed as a Rust language concept, so they can enforce that code that uses something from B must be guarded under the feature F1.

As you can see this is something that warrants very careful design, in particular because it would need changes in Bazel (right now, there's no way for you to tell me in code that certain targets in your gazelle directory needs an optional dependency). Plus, it's the type of feature that can be added later on, and there is always the much cleaner solution of using separate modules. For those reasons, we decided to not introduce optional dependencies in v1.

The cost of not having dev-deps is, as you said, an explosion of unnecessary deps; however, this does not incur a crazy ton of extra downloads because Bazel doesn't immediately download all dependencies. A repo is only fetched when something is required from it. So things like rules_pkg would never be downloaded even without dev-deps, as long as the user of stardoc does not refer to any target in the stardoc package that loads rules_pkg. Similarly, a hello world in TypeScript does not require you to download 1.5GB of data.

I guess I'm confused by this section. If you're saying that not having an edge upon these dependencies means that you don't actually download them, then what is the major concern? Slowing things down? Polluting the dependency graph? Can you help me understand what this a little better?

First, to clarify the first sentence in the quoted section: by "the cost of not having dev-deps", I meant "the cost of not having the concept of dev-deps -- i.e. specifying all otherwise dev-only dependencies as normal dependencies". Having a normal dependnecy edge on rules_pkg does not cause you to fetch rules_pkg until something is needed from it. The downside of not having the dev-deps concept is then exactly as you named them: the pollution of the dependency graph and, consequently, the slowdown.

srikrsna-buf commented 2 years ago

Slightly off topic, I was trying to use the gazelle skylib extension using the following rule

load("@bazel_skylib//:bzl_library.bzl", "bzl_library")
load("@bazel_gazelle//:def.bzl", "DEFAULT_LANGUAGES", "gazelle", "gazelle_binary")

gazelle_binary(
    name = "gazelle-skylib",
    languages = DEFAULT_LANGUAGES + [
        "@bazel_skylib//gazelle/bzl:bzl",
    ],
    visibility = ["//:__pkg__"],
)

gazelle(
    name = "gazelle",
    gazelle = ":gazelle-skylib",
)

I get this error when I run bazel run //:gazelle,

no such package '@bazel_skylib//gazelle/bzl': BUILD file not found in directory 'gazelle/bzl' of external repository @bazel_skylib. Add a BUILD file to a directory to mark it as a package. and referenced by '//:gazelle-skylib'

I am using skylib version 1.1.1. Anything I am doing wrong?

achew22 commented 2 years ago

I'm not aware of any reason why that would not work. Rather than hijacking this thread, can you make a small repo that demonstrates the issue and file a new bug which includes a link to it? That will make it a lot easier to understand what's gone wrong. Thanks!

alexeagle commented 2 years ago

@srikrsna-buf is just hitting the same issue being discussed in this thread, which is that the distribution of bazel-skylib ships without the gazelle/ directory. (see target //:distribution in this repo)

As a workaround you can fetch bazel-skylib from source instead of using the distribution, but it doesn't work under bzlmod where you must trust the central registry (and MODULE.bazel happens before WORKSPACE so you cannot override).

@Wyverald any progress thinking through the problem of "optional deps"? As it stands, the rules authors SIG is recommending @achew22 's gazelle extension for new rules repos https://github.com/bazel-contrib/rules-template/blob/main/BUILD.bazel#L6 so this is making a bit of a mess for rules authors when they want to introduce bzlmod.

alexeagle commented 2 years ago

An approach I don't see on the thread above is to have the gazelle/ folder be a nested workspace and a separate Bazel module, but have it be released together with bazel-skylib. This is a pretty common approach in npm for example, where projects often release a bunch of packages together.

Granted, nothing enforces that users take the same version of bazel_skylib as bazel_skylib_gazelle (picking a name for the module for the sake of discussion) But they could extract a constant in either WORKSPACE or MODULE.bazel to get that assurance

versions = {"skylib": "1.1.1"}

bazel_dep(name = "bazel_skylib", version = versions["skylib"])
bazel_dep(name = "bazel_skylib_gazelle", version = versions["skylib"])

and we could document this as the suggested way to install.

Since it's a nested workspace, we still get the atomicity of CI covering the two together, avoiding drift and ensuring test coverage.

I'm adding a gazelle extension to our aspect-build/rules_ts right now and am planning to try this approach.

shs96c commented 2 years ago

The approach advocated by @alexeagle seems sensible to me. In contrib_rules_jvm we use the Gazelle plugin to help us keep our build files up to date. Because the plugin isn't available in the modularised skylib, we can't fully transition the project to bzlmod yet.

shs96c commented 2 years ago

Also, I note that the gazelle plugin is already in its own directory. That's what we do in contrib_rules_jvm, since we also have our own gazelle plugin (for generating build file entries from java source). To handle that case, we provide a different macro for loading all our dependencies. That means that in the common case, people don't need to pull in the dependency on rules_go, but when they want to use Gazelle it's an option.

I'm hoping that bzlmod will be smart enough to only load the repos that are actually used in a build, rather than everything.

meteorcloudy commented 2 years ago

I'm hoping that bzlmod will be smart enough to only load the repos that are actually used in a build, rather than everything.

That's already the case, if a repo is not required to build your target, it won't be fetched. Just like a repo defined in the WORKSPACE file. Just watch out this issue to avoid accidentally making rules_go a hard dependency.

shs96c commented 2 years ago

Alright. So it sounds like a module with a dependency on rules_go is fine. In which case, I think the "extra deps macro" provides the path of least resistance to getting the gazelle plugin in skylib shipped. Anyone fancy a PR?

meteorcloudy commented 2 years ago

So it sounds like a module with a dependency on rules_go is fine

Yes, but it does introduce the transitive dependencies of rules_go and affects the final resolved dependency graph though. But all dependencies are lazily fetched.

shs96c commented 2 years ago

I've opened #400 with the macro exposed in the distribution and the changes required to use it added to the README.

Wyverald commented 1 year ago

Closing; the gazelle subdirectory has been available as a separate module since 1.4.0.