bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Allow cross-package globbing #17790

Open tbaing opened 1 year ago

tbaing commented 1 year ago

Description of the feature request:

Please allow globs to descend into directories that contain Bazel BUILD files instead of stopping at them. In our specific context, we want globs to include files in directories that contain BUILD files, and the current stop-at-BUILD behavior prevents that.

What underlying problem are you trying to solve with this feature?

ChromeOS currently creates a shadow hierarchy of symlinks with altered names to work around several categories of files that Bazel does not currently accept in the source tree it operates on. These cases are:

We'd like to eliminate the need for this shadow hierarchy. That will require several changes, which are probably best tracked independently since each one is self-contained in its function and implementation.

Which operating system are you running Bazel on?

gLinux

What is the output of bazel info release?

release 6.0.0-pre.20221012.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

lberki commented 1 year ago

Do I understand correctly that you want to filter out existing files that control Bazel from code you vendored so that you can provide your own?

Does "cross-repository" mean:

  1. Reaching across WORKSPACE.bazel files, so that, e.g., if a/BUILD.bazel contains glob(["**"]), it matches a/b/c even if a/WORKSPACE.bazel exists?
  2. Some sort of glob that affects multiple Bazel repositories (e.g. @x and @y)?
lberki commented 1 year ago

Would it help if you could customize the names of the files Bazel reads (e.g., you could somehow make it so that BUILD.bazel files are called BUILD.kitten in your source)?

This issue comes up every once in a while (see https://github.com/bazelbuild/bazel/issues/16707 ) but it has never quite crossed our pain threshold.

brandjon commented 1 year ago

Please clarify what cross-repository globbing would mean.

Is it just globbing as normal, but not stopping at a WORKSPACE.bazel subdirectory boundary? If so, that seems like odd behavior, given that globbing stops even at BUILD.bazel subdirectory boundaries.

Maybe there's a bazelignore feature request somewhere in here, like the ability to ignore a sub-WORKSPACE at a known location?

lberki commented 1 year ago

Gentle ping @tbaing

tbaing commented 1 year ago

Sorry for the slow responses on this.

My original request was inaccurate because I had misunderstood the problem we were hitting. We don't need globbing to work across repositories; we only need globs not to stop at BUILD.bazel subdirectory boundaries within a single repository. (Not stopping at WORKSPACE.bazel would probably be good too, but BUILD.bazel is the primary need.) Sorry for the confusion; I'll edit the issue to correct it.

lberki@ is partially correct: we do want to filter out existing Bazel control files from code we vendored so that we can provide our own, but we also have ChromeOS-maintained code that may include Bazel build files, so it's not all vendored third-party code. We generate our own Bazel wrappers around each package and then invoke the existing package-level build (which may or may not eventually launch a child Bazel invocation). When applying a glob() over that code, we don't want the BUILD files that might be present in the package-level code to be taken into account.

brandjon@'s idea of being able to specify patterns for which locations' build files are (or aren't) treated as globbing boundaries seems closer to what I think would work best. Partitioning our "outer" BUILD files from our "inner" BUILD files by filename might work, but controlling this through file naming seems awkward.
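brandjon's idea, as described here, can be modeled in a few lines of Python. This is a toy model, not Bazel's implementation, and the `ignore_patterns` option is hypothetical (Bazel has no such setting): a glob(["**"]) from a package skips any subdirectory that is itself a package, except that BUILD files in directories matching an ignore pattern no longer count as boundaries. The paths are made up for illustration.

```python
import fnmatch
import posixpath

# File names that mark a directory as a package in this model.
BUILD_NAMES = ("BUILD", "BUILD.bazel")


def boundary_dirs(files, ignore_patterns=()):
    """Package directories, minus any whose path matches an ignore pattern.

    Note: fnmatch's "*" also matches "/", so "third_party/*/src" matches
    "third_party/zlib/src" and deeper paths such as "third_party/x/y/src".
    """
    dirs = {posixpath.dirname(f) for f in files
            if posixpath.basename(f) in BUILD_NAMES}
    return {d for d in dirs
            if not any(fnmatch.fnmatchcase(d, p) for p in ignore_patterns)}


def glob_all(files, pkg, ignore_patterns=()):
    """Approximate glob(["**"]) in `pkg`, crossing only ignored BUILD files.

    `files` is a set of repo-relative POSIX paths; results are relative to
    `pkg`, as glob() results are.
    """
    boundaries = boundary_dirs(files, ignore_patterns)
    prefix = pkg + "/" if pkg else ""
    matched = []
    for f in sorted(files):
        if not f.startswith(prefix):
            continue
        rel = f[len(prefix):]
        rel_dir = posixpath.dirname(rel)
        parts = rel_dir.split("/") if rel_dir else []
        # Skip the file if any directory between `pkg` and it is a boundary.
        if not any(prefix + "/".join(parts[:i + 1]) in boundaries
                   for i in range(len(parts))):
            matched.append(rel)
    return matched


files = {
    "third_party/zlib/BUILD.bazel",  # generated "outer" wrapper
    "third_party/zlib/src/BUILD",    # upstream's own "inner" BUILD file
    "third_party/zlib/src/zlib.c",
}
print(glob_all(files, "third_party/zlib"))
# ['BUILD.bazel']
print(glob_all(files, "third_party/zlib", ["third_party/*/src"]))
# ['BUILD.bazel', 'src/BUILD', 'src/zlib.c']
```

With no patterns the result mirrors what glob() does today; with the pattern, the inner BUILD file stops acting as a boundary and the vendored sources become visible to the outer glob.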

lberki commented 1 year ago

I hear you! Globbing across packages (and workspaces) seems to be something a lot of people want. Technically, it's easy to cook up cross-package globbing using a macro, glob() and subpackages(), but that requires changing the intermediate BUILD.bazel files, which, if I understand correctly, is not on the table this time.
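To illustrate lberki's point, here is a Python model of the macro approach (not Starlark and not Bazel's implementation). Each package's aggregate is its own glob(["**"]) plus the aggregate target of every direct subpackage, the way a hypothetical macro combining glob() and subpackages() would chain filegroups. The `participating` set is an assumption standing in for which BUILD files call that macro; it shows why every intermediate BUILD.bazel file has to be edited.

```python
import posixpath

BUILD_NAMES = ("BUILD", "BUILD.bazel")


def packages(files):
    """Every directory that contains a BUILD or BUILD.bazel file."""
    return {posixpath.dirname(f) for f in files
            if posixpath.basename(f) in BUILD_NAMES}


def nearest_package(d, stop, pkgs):
    """Climb from directory `d` until reaching `stop` or another package."""
    while d != stop and d not in pkgs:
        d = posixpath.dirname(d)
    return d


def local_glob(files, pkg, pkgs):
    """Repo-relative files that glob(["**"]) in `pkg` would match."""
    prefix = pkg + "/" if pkg else ""
    return {f for f in files
            if f.startswith(prefix)
            and nearest_package(posixpath.dirname(f), pkg, pkgs) == pkg}


def direct_subpackages(pkg, pkgs):
    """Packages under `pkg` with no other package in between (cf. subpackages())."""
    prefix = pkg + "/" if pkg else ""
    return {q for q in pkgs
            if q != pkg and q.startswith(prefix)
            and nearest_package(posixpath.dirname(q), pkg, pkgs) == pkg}


def recursive_files(files, pkg, participating):
    """glob(["**"]) in `pkg` plus the aggregate of each participating subpackage.

    `participating` stands for the packages whose BUILD file calls the
    hypothetical macro; a subpackage that does not call it exports nothing.
    """
    pkgs = packages(files)
    result = local_glob(files, pkg, pkgs)
    for sub in direct_subpackages(pkg, pkgs):
        if sub in participating:
            result |= recursive_files(files, sub, participating)
    return result


files = {"a/BUILD.bazel", "a/x.c", "a/b/BUILD.bazel", "a/b/y.c",
         "a/b/c/BUILD", "a/b/c/z.c"}
print(len(recursive_files(files, "a", {"a/b", "a/b/c"})))  # 6: every file
print(sorted(files - recursive_files(files, "a", {"a/b"})))
# ['a/b/c/BUILD', 'a/b/c/z.c']: a/b/c never called the macro
```

If any intermediate package (here a/b/c) does not call the macro, its files silently drop out, which is why this workaround depends on editing the inner BUILD files.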

I have been toying with the idea of making the name of BUILD.bazel files configurable for a good while; it would be a big change, but it would be a pretty thorough solution to the problem of vendoring projects that have their own BUILD.bazel files which don't quite work in the context of the outer project.
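The configurable-name idea boils down to changing which file names mark a package boundary. A tiny Python sketch under that assumption (the `build_file_names` parameter is hypothetical, Bazel has no such option, and BUILD.kitten is lberki's example name from earlier in the thread):

```python
import posixpath


def package_dirs(files, build_file_names=("BUILD", "BUILD.bazel")):
    """Directories treated as packages, given which file names mark a package."""
    return {posixpath.dirname(f) for f in files
            if posixpath.basename(f) in build_file_names}


files = {
    "outer/BUILD.kitten",          # the outer project's build file, renamed
    "outer/vendored/BUILD.bazel",  # the vendored project's own build file
    "outer/vendored/lib.c",
}
print(package_dirs(files))                     # {'outer/vendored'}
print(package_dirs(files, ("BUILD.kitten",)))  # {'outer'}
```

With the default names, the vendored BUILD.bazel makes outer/vendored a separate package; if only BUILD.kitten counted, outer/vendored/lib.c would belong to the outer package and a glob there could reach it.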

The best current workaround is to create a symlink tree with the undesired BUILD.bazel / WORKSPACE.bazel files edited out; from what I hear, that's what you are already doing? (that's what Android is doing, too)
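For reference, here is a minimal Python sketch of a symlink shadow tree. It is an illustration under assumptions, not the ChromeOS or Android tooling: it recreates directories, symlinks each remaining file back to its original, and drops a fixed set of control-file names (`SKIP_NAMES` is an assumption). The one-symlink-per-file cost is where the extra I/O comes from.

```python
import os
import tempfile

# Assumption: the control-file names to drop from the shadow tree.
SKIP_NAMES = {"BUILD", "BUILD.bazel", "WORKSPACE", "WORKSPACE.bazel"}


def make_shadow_tree(src_root, dst_root, skip_names=SKIP_NAMES):
    """Mirror src_root under dst_root, omitting Bazel control files.

    Directories are recreated as real directories; every other file becomes
    a symlink to the original, so no file contents are copied.
    """
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            if name in skip_names:
                continue
            os.symlink(os.path.abspath(os.path.join(dirpath, name)),
                       os.path.join(target_dir, name))


def demo():
    """Shadow a tiny tree and return the paths (relative to the shadow root) it contains."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "shadow")
        os.makedirs(os.path.join(src, "pkg", "sub"))
        for rel in ("pkg/BUILD.bazel", "pkg/a.c", "pkg/sub/BUILD", "pkg/sub/b.c"):
            with open(os.path.join(src, *rel.split("/")), "w"):
                pass
        make_shadow_tree(src, dst)
        found = set()
        for dirpath, _dirnames, filenames in os.walk(dst):
            for name in filenames:
                path = os.path.join(dirpath, name)
                found.add(os.path.relpath(path, dst).replace(os.sep, "/"))
        return found


print(sorted(demo()))  # ['pkg/a.c', 'pkg/sub/b.c']
```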

lberki commented 1 year ago

cc @brandjon and @comius

lberki commented 1 year ago

Re: UTF-8, what isn't supported by Bazel? I realize that, embarrassingly, we don't support a number of characters in file names, but those are all below 0x80. UTF-8 should work.

tbaing commented 1 year ago

Yep, the symlink tree approach is exactly how we're currently handling this. It works, but involves I/O it would be nice to avoid.

Re: UTF-8, my understanding is that a small number (tens to hundreds) of files in third-party packages we control have names containing characters that aren't valid in Bazel, so we create a shadow hierarchy in which those characters are escaped. At one point I wrote some Bash commands to find out which invalid characters we actually use, and I could recreate them if you need specifics.
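The escaping could look roughly like the following. ChromeOS's actual scheme isn't described in the thread, so this is a hypothetical percent-encoding sketch; the `INVALID` set is taken from lberki's description of Bazel's file-name rules later in this thread and should be treated as an assumption. It is meant to be applied to each path component (the '/' separator is left untouched).

```python
import urllib.parse

# Assumption: the characters Bazel rejects, per lberki's description later in
# this thread (':', '\\', DEL and control characters 0x00-0x1F).
INVALID = {":", "\\", "\x7f"} | {chr(c) for c in range(0x20)}


def escape_name(name):
    """Percent-encode invalid characters, plus '%' itself so the mapping is reversible."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in INVALID or ch == "%" else ch
        for ch in name
    )


def unescape_name(escaped):
    """Invert escape_name(); unquote() decodes the %XX sequences it produced."""
    return urllib.parse.unquote(escaped)


print(escape_name("notes:v2.txt"))  # notes%3Av2.txt
```

Escaping '%' as well is what keeps the mapping one-to-one, so the original name can always be recovered from the shadow name.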

lberki commented 1 year ago

Can you find out which files these are? My understanding was that Bazel supports every character in file names except : (colon), \ (backslash), 0x7f (DEL) and control characters 0x00 - 0x1F inclusive.

tbaing commented 1 year ago

Here are the non-[0-9a-zA-Z] characters we have in our file paths: !%&'()*+,-./:;=?@[\]_~🌐😀$áíőú

The list covers directory names as well as file names, and includes the '/' path separator.
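Applying lberki's stated rule to this list makes the outcome explicit. The Python check below encodes the rule exactly as lberki describes it above; it is not taken from Bazel's source.

```python
def is_allowed(ch):
    """lberki's stated rule: anything except ':', '\\', DEL (0x7f) and 0x00-0x1F."""
    return ch not in (":", "\\", "\x7f") and ord(ch) >= 0x20


def disallowed(chars):
    """Return the characters in `chars` that the rule rejects."""
    return {ch for ch in chars if not is_allowed(ch)}


# tbaing's list of non-alphanumeric characters seen in ChromeOS paths
# ('/' appears only as the path separator, never inside a name).
reported = "!%&'()*+,-./:;=?@[\\]_~🌐😀$áíőú"
print(sorted(disallowed(reported)))  # [':', '\\']
```

Only : and \ are rejected; the emoji and accented letters pass, consistent with lberki's point that the unsupported characters are all below 0x80.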

lberki commented 1 year ago

Ow. I was hoping that : and \ wouldn't be there, but they are, which means that we'll have to put our brains into gear and come up with something :(

I'm also completely disappointed by the lack of poop emojis there.

brandjon commented 1 year ago

@lberki I'll triage this to P4 for now but let me know if you'd like to prioritize.

lberki commented 1 year ago

Provided @tbaing has a reasonable path forward, it's fine to keep this at P4. It's mighty embarrassing that Bazel cannot handle files with colons in their names, but as long as he can live with the status quo, there is no pressure to do something about this right now.

In the longer term, I expect that once "proper" Unicode support lands, we'll be able to start thinking about how to represent files with colons etc. in their names; @haxorz had a vague plan, but I didn't spend too much time evaluating it because, IIRC, it depended on proper Unicode support.

Then there will still be the problem of ignoring existing BUILD.bazel files, which is orthogonal to the question of supporting colons, so let's tackle them separately?

github-actions[bot] commented 2 weeks ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.