bazelbuild / bazel-skylib

Common useful functions and rules for Bazel
https://bazel.build/
Apache License 2.0
376 stars 179 forks source link

Add globbing to directory rules #511

Closed matts1 closed 2 months ago

matts1 commented 2 months ago

Please only review the second commit in this stack. The first commit will be submitted via a seperate PR (#510)

matts1 commented 2 months ago

@comius Continuing the discussion in https://github.com/bazelbuild/bazel-skylib/pull/510#discussion_r1601250531

It might be prone to more rebuilding (change one file and all the targets could be affected), but changes in prebuilt binary tools are probably unlikely.

I believe you mean re-analyzing, not rebuilding, but just double-checking for confirmation that this is correct (assuming, of course, that you filter out things you don't depend on properly, but that same filtering applies to the method you're proposing).

You could also address differences in directory structures among different platforms without introducing new selects.

I'm not aware of any easy way to do this. The mechanism you proposed above has one single BUILD.sysroot.bazel file, which means that if we wrote:

http_archive(name = "amd64_sysroot", build_file = "BUILD.sysroot.bazel")
http_archive(name = "ard64_sysroot", build_file = "BUILD.sysroot.bazel")

Then BUILD.bazel have the same contents in both sysroots, which means we can't vary the file contents (and thus can't easily address the differences in directory structure).

Of course, we could have different files for different sysroots, but then we're back at square 1 where we're now hand-maintaining a bunch of different configuration files and attempting to keep them roughly equivalent. There's also the middle ground where we define some things in each sysroot and some things in the global config file, but that's even harder to maintain because some cc_args are specific to certain sysroots, and some across sysroots.

If you can think of an easy way to maintain configuration files across a bunch of different platforms without the globbing I proposed, I'd be interested, but I'm not aware of any such solution.

Also, (tangentially related) I'm strongly of the belief that having multiple toolchains overcomplicates everything. Without the rule based toolchain, it's pretty much mandatory, since otherwise your toolchain would be more code than configuration. However, with a rule based toolchain, we can use select, which I believe serves the purpose much better for most (but not all) use cases.

Maintenance becomes so much easier when you have to only maintain a single toolchain, and toolchain resolution can confuse even experienced bazel users. One of the objectives I have here is to make all of ChromeOS use a single toolchain, which then uses selects judiciously.

I'm concerned about out of package globs

Could you elaborate on the specific concerns you had with it? You implied that we used to have a bunch of problems with them, with Filesets, and that I'd dealt with some, but not all of the concerns you had with them.

The vibe I'm getting is that your concern is about maintainence and simplicity. My belief is that that this will create an additional maintainence cost, but a relatively small one, and I believe that it will significantly simplify toolchain definitions downstream, so I think it's worth it.

comius commented 2 months ago

@comius Continuing the discussion in #510 (comment)

It might be prone to more rebuilding (change one file and all the targets could be affected), but changes in prebuilt binary tools are probably unlikely.

I believe you mean re-analyzing, not rebuilding, but just double-checking for confirmation that this is correct (assuming, of course, that you filter out things you don't depend on properly, but that same filtering applies to the method you're proposing).

Correct: re-analyzing. The analysis graph gets recomputed. Actions that stay the same are reused.

You could also address differences in directory structures among different platforms without introducing new selects.

I'm not aware of any easy way to do this. The mechanism you proposed above has one single BUILD.sysroot.bazel file, which means that if we wrote:

http_archive(name = "amd64_sysroot", build_file = "BUILD.sysroot.bazel")
http_archive(name = "ard64_sysroot", build_file = "BUILD.sysroot.bazel")

Then BUILD.bazel have the same contents in both sysroots, which means we can't vary the file contents (and thus can't easily address the differences in directory structure).

Of course, we could have different files for different sysroots, but then we're back at square 1 where we're now hand-maintaining a bunch of different configuration files and attempting to keep them roughly equivalent. There's also the middle ground where we define some things in each sysroot and some things in the global config file, but that's even harder to maintain because some cc_args are specific to certain sysroots, and some across sysroots.

If you can think of an easy way to maintain configuration files across a bunch of different platforms without the globbing I proposed, I'd be interested, but I'm not aware of any such solution.

I'm now convinced that you need some kind of globbing. I'd still consider a simpler design. Can you get away with filegroup and a new rule subfilegroup that does globbing?

# BUILD.sysroot.bazel
filegroup(name = "sysroot", srcs = glob(["**/*"], exclude=["BUILD.bazel"])
# toolchain/BUILD.bazel

alias(
    name = "sysroot",
    actual = select({
        "@platforms//arch:amd64": "@amd64_sysroot//:sysroot",
        "@platforms//arch:arm64": "@arm64_sysroot//:sysroot"
        # Other sysroots (eg. RISK-V)
    })
)

subfilegroup(
    name = "include_dir",
    filegroup = ":sysroot",
    srcs = ["usr/include/**"],
)

subfilegroup(
    name = "shared_libraries",
    filegroup = ":sysroot",
    srcs = ["lib/*.so", "lib64/*.so"]
)

Also, (tangentially related) I'm strongly of the belief that having multiple toolchains overcomplicates everything. Without the rule based toolchain, it's pretty much mandatory, since otherwise your toolchain would be more code than configuration. However, with a rule based toolchain, we can use select, which I believe serves the purpose much better for most (but not all) use cases.

I can see how this makes your life simpler. Bazel/Blaze rely a on the --platforms flag. I'm guessing you're still setting the flag, and doing do selects on the platforms (and/or constraint values)?

Maintenance becomes so much easier when you have to only maintain a single toolchain, and toolchain resolution can confuse even experienced bazel users. One of the objectives I have here is to make all of ChromeOS use a single toolchain, which then uses selects judiciously.

I wish we could fix that, so that users are not confused. Maybe we should print toolchain configuration as one big select?

If you have a monorepo-like control over all toolchains, then both mechanism are identical, select vs resolution.

I'm concerned about out of package globs

Could you elaborate on the specific concerns you had with it? You implied that we used to have a bunch of problems with them, with Filesets, and that I'd dealt with some, but not all of the concerns you had with them.

The vibe I'm getting is that your concern is about maintainence and simplicity. My belief is that that this will create an additional maintainence cost, but a relatively small one, and I believe that it will significantly simplify toolchain definitions downstream, so I think it's worth it.

Yes, that's right. I'd like to have the simplest design, that still simplifies toolchain definitions.