Indirect incompatible target skipping can have highly non-local silent effects

fmeum commented 1 year ago

Description of the bug:

During a bazel test //..., Bazel skips all targets that transitively depend on a target that is incompatible, as determined by target_compatible_with.

This can have far reaching implications: If any target in the transitive dependencies of a test has target_compatible_with added to it, this causes the test to be skipped on that platform. Even if a Bazel project doesn't use target_compatible_with itself, it may be affected by this without even being aware of the concept if an external dependency adds this specification.

Over at Jazzer, we recently accidentally and silently disabled our entire test suite on Windows by merging a change that added target_compatible_with somewhere far down in the dependency tree.

I could see guarding indirect incompatible target skipping behind a flag as a potential solution to this. Direct incompatible target skipping works very well and without surprises in all contexts I have used it so far.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

Any

What is the output of `bazel info release`?

0573eee38d7d3b695267dfd125ab8e08d83a2640

If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.

No response

What's the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD` ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

fmeum commented 1 year ago

CC @gregestren @philsc

philsc commented 1 year ago

I could see guarding indirect incompatible target skipping behind a flag as a potential solution to this.

Could you elaborate on what you imagine the workflow to look like such a flag? I.e. what would your Windows breakage scenario have looked like if you had such a flag?

pcjanzen commented 1 year ago

Indirect skipping is super useful though: Without it, a repo would have to have target_compatible_with from top-to-bottom, and then it can be very challenging to figure out why a target is not being built, or to enable a top-level target when a low-level dependency becomes available.

Another potential solution might be to tag your top-level targets to indicate that they must build in a config, and then do

bazel build --nobuild $(bazel query 'attr(tags, windows_test, //...)')

as a separate step in your CI.

fmeum commented 1 year ago

Could you elaborate on what you imagine the workflow to look like such a flag? I.e. what would your Windows breakage scenario have looked like if you had such a flag?

If indirect incompatible target skipping wasn't enabled by default or could be disabled by a flag, then the PR that added the target_compatible_with deep down would have failed CI with an error pointing to the target that depended on the now incompatible target. Instead, that target and all its dependents became incompatible implicitly.

Indirect skipping is super useful though: Without it, a repo would have to have target_compatible_with from top-to-bottom, and then it can be very challenging to figure out why a target is not being built, or to enable a top-level target when a low-level dependency becomes available.

I also find this feature really useful, but only if 1) the user is aware of it and 2) a large number of targets is incompatible. In our case most tests run on all platforms, only a few (Android-specific) components and tests don't.

Another potential solution might be to tag your top-level targets to indicate that they must build in a config, and then do bazel build --nobuild $(bazel query 'attr(tags, windows_test, //...)') as a separate step in your CI.

This is a good idea, but it's not feasible in our situation: We would have to tag essentially all tests except for a very small subset. I would also consider having to use tags rather than the more descriptive target_compatible_with a step back.

Especially now that the BCR exists and is turning into a central collection of build files for common C++ projects, I can see target_compatible_with becoming more popular in external dependencies. But this would make the current default of implicitly marking dependents incompatible a more common pitfall.

Given that users benefitting from the current default behavior

1) likely already use target_compatible_with in their own code base and thus are aware of its existence and 2) probably rely on custom tagging anyway to handle the cases of top-level targets that should always be compatible,

I wonder whether flipping the default would improve the situation: Everyone relying on the implicit skipping would just need to add the flag to their .bazelrc, while everyone else could benefit from the nice error messages provided by explicit target_compatible_with annotations on external deps without the risk of wildcards not behaving as expected.

@pcjanzen @philsc What do you think of having such a flag (and flipping it)?

gregestren commented 1 year ago

FYI @trybka

gregestren commented 1 year ago

I agree with the above.

It's ultimately a user education issue. Bazel is capable of sending the user signal: "you have an incompatibility for reason X". Users have to intelligently interpret those signals. It seems safer to start with a stronger signal you can't ignore (error), then let users downgrade to warnings / silents skips once they've processed and acknowledged the issue.

I wish this could be totally automated but I don't see how.

pcjanzen commented 1 year ago

This is a good idea, but it's not feasible in our situation: We would have to tag essentially all tests except for a very small subset. I would also consider having to use tags rather than the more descriptive target_compatible_with a step back.

Or you can tag the very small subset. If you have some discipline about how you write your target_compatible_with attribute value, you might be able to query or cquery it directly, instead of using a tag.

What do you think of having such a flag (and flipping it)?

I am but a mere bazel user, so my opinion is worth exactly what you paid for it. However I will say that I'm quite leery of flags that change the semantics of BUILD files across the entire repo: the probability that none of my third party dependencies will be strongly-opinionated about how I set these flags, rapidly approaches zero as the number of third-party dependencies increases.

fmeum commented 1 year ago

I am but a mere bazel user, so my opinion is worth exactly what you paid for it. However I will say that I'm quite leery of flags that change the semantics of BUILD files across the entire repo: the probability that none of my third party dependencies will be strongly-opinionated about how I set these flags, rapidly approaches zero as the number of third-party dependencies increases.

That's a great point: The behavior would be even more useful if it weren't an "all or nothing" switch. In fact, it sounds like an excellent fit for a package parameter, which would still allow for a repo-wide default via @Wyverald's REPO.bazel effort.

More concretely, we could have a skip_implicitly_incompatible_targets parameter on package and repo, with the global default being False. A target in package P would be skipped as implicitly incompatible if the effective value of that setting is True for P and it (transitively) depends on an explicitly incompatible target.

A repo default of True coupled with a few packages defaulting to False could potentially even replace the manual tagging mechanism.

@gregestren Would you say that adding such functionality to package could be acceptable?

katre commented 1 year ago

There are a lot of complexities with adding new attributes like this to package. I'm not convinced that it would be worth the effort, but I'll let @gregestren weigh in on that.

fmeum commented 1 year ago

@katre Could you elaborate a bit on these complexities? Having this configurable on repo only could be good enough if adding anything to package isn't feasible.

Now that I have given this knob more thought, I think that it would be more useful to configure whether targets in a given repo/package inherit incompatibility from their direct dependencies (and toolchains, following https://github.com/bazelbuild/bazel/issues/12419) rather than directly configuring whether implicitly incompatible targets are skipped. This would allow every repo to use its own style of applying target_compatible_with internally while keeping the implications for external users consistent.

katre commented 1 year ago

The specific problem is with package's default_foo attributes (which may not be what you're suggesting, but also may). Consider this BUILD file:

cc_library(name = "before")
package(default_copts = "-DWHATEVER")
cc_library(name = "after")

The question is, what is the value of copts for both :before and :after?

The answer is that Bazel (and other google-internal tools based on Bazel) have different answers at different times. Many google-internal users of a similar pattern assume that :before would have empty copts, and only:afterwould be affected bypackage`. Some of our tools (including Bazel) treat the package as affecting everything in the package (originally due to lazy evaluation, but this is also a principled approach).

@sdtwigg is working on forcing package to always be the first statement in the BUILD file, to avoid this, but it's still potentially confusing to users, and I'm not sure @sdtwigg's work will be submittable anytime soon.

Using the same analysis of your suggestion, would a user assume that only targets defined after a package that changes how incompatible targets are handled would be affected?

fmeum commented 1 year ago

I have to say that I have never seen a package call that isn't at the top of a BUILD file "in the wild". I would personally expect package to affect all targets in the file regardless of position, but I would also consider targets interleaved with package calls confusing no matter which interpretation is applied.

With the situation being this complex (at least until @sdtwigg's cleanup lands), only supporting this attribute on repo for now seems sensible.

philsc commented 1 year ago

As long as it's possible for projects to co-exist, I think a flag would work. E.g. project A has the flag flipped one way and also depends on project B. Meanwhile, project B has the flag flipped the other way and also depends on project C. Project C has the flag flipped again.... etc. etc.

I.e. as long as all the projects get to decide which way they want the flag to work without forcing downstream projects to adapt/change, then that sounds fine.

fmeum commented 1 year ago

Sounds good. I will give this a try once REPO.bazel is available.

stagnation commented 1 year ago

Thanks for raising this, we have seen the same problem where entire platform tests are silently dropped. And while a separate cquery can be used to look for incompatible targets, and verify that all important targets are still run, the net effect is that we avoid tagging anything except tools and toolchains with incompatibility.

philsc commented 1 year ago

Going back to the command line flag idea: I believe projects with the flag set to different values should be able to coexist. Or at least I don't see why they couldn't.

E.g. If project A's dependencies (say, B & C) have the flag set to use indirectly incompatible targets, but A doesn't, then you still have to mark all of A's targets. Just like you would have to if A didn't have any dependencies on B and C.

On the other hand, if B & C don't use indirectly incompatible targets (and mark all the targets), then project A can still use indirectly incompatible targets because any incompatible target will just be skipped.

But it does mean that you wouldn't be able to do something like bazel build @B//... @C//.... Hmm. I guess that means that a command line flag won't really work. A REPO.bazel approach sounds more appropriate.

Apologies for the noise. I found it useful to write that out.

gregestren commented 1 year ago

I think this issue deserves a design proposal (with clear proposed syntax).

There are a lot of subtle and important points being made and IMO the core themes are getting scattered through a long comment thread.

I'm open-minded but not yet convinced we need this per-project tuning. I don't understand well enough the specific scenarios that protects against. And I worry about the conceptual complexity that adds. I get the idea that third-party deps can be strongly opinionated in unknown ways. But I don't clearly see how this manifests for this particular functionality.

Another value I've heard particularly from Google's CI is the CI system wanting to understand and make decisions about platform compatibility without having to traverse the build graph. The idea is the CI system can cheaply figure out what resources it needs to allocate based on expected load of each target. I don't know if this would apply to other CI systems or other Bazel consumers, but it's a value I'd at least like a reference to in this discussion.

fmeum commented 1 year ago

I drafted a design doc: https://docs.google.com/document/d/127pgkR610qaz1k3-OVgt3tbEeMKAm_jFNeRQcUc99d0/edit?usp=sharing

I will submit it soon, with @gregestren and @philsc as reviewers. Looking forward to feedback, also from anyone else in this thread!

fmeum commented 9 months ago

One more idea I had: Make it so that a target with target_compatible_with set explicitly fails analysis if it is compatible according to this attribute but transitively depends on a target that is incompatible. This makes it possible to mark arbitrary, especially top-level targets with what their compatibility should be.

pcjanzen commented 9 months ago

That's essentially what I was getting at here: https://docs.google.com/document/d/127pgkR610qaz1k3-OVgt3tbEeMKAm_jFNeRQcUc99d0/edit?disco=AAAA2Q-lZF8

although I stopped short of proposing an implementation.

One way could be

    tags = ["no-transitive-incompatible"],

which would cause analysis to fail if this target's target_compatible_with matches the current configuration, but one or more of its deps did not. This tag could be set on "top-level" targets.

Or you could consider two tags: "allow-transitive-incompatible" and "disallow-transitive-incompatible", and then introduce some command line switch to set the default for targets which have neither tag.

(I don't especially love magical tags, but I can't think of another way to accomplish this in a backwards-compatible way.)

fmeum commented 9 months ago

Sorry for not noticing that connection, it does amount to the same semantics.

I would find a backwards incompatible change in Bazel 8, guarded behind a flag cherry-picked into Bazel 7, preferable to a solution relying on "magic tags". Do you expect migration with such a flag to be painful?

stagnation commented 9 months ago

Nice, I like that we could just disallow it entirely, and tag the targets that should allow propagation. The risk of silently dropping big parts of the build tree makes me uneasy. I would prefer to just do a big rdeps and buildozer command to also tag upstream targets with the incompatibility introduced by a base library (if it was warranted). To set the allow-transitive-incompatible tag, or just set the same platform directly, and comment as to why.

In my experience this has more often been mistakes that should be avoided rather than a core part of the dependency tree structure. So it is good that the proposed flag can be opt-in or opt-out! This would save the hassle of a big cquery/ies to verify that all targets are buildable on the desired platforms, and that the target incompatible providers does not suddenly pop-up where it is not expected.

I marked a library in my example repo as windows-only, and this buildozer invocation can mark everything explicitly, which would make the full scope and consequences evident in code review.

    $ buildozer '
          set target_compatible_with @platforms//os:windows
      ' '
          comment target_compatible_with @platforms//os:windows from\ //Library:Library
      ' \
              $(bazel query --output=label \
              'rdeps(//Library:Library, //...)
               except //Library:Library' )

gregestren commented 9 months ago

This reminds me that we're starting to explore the idea of a more first-class concept of "project". Maybe oriented around bzl modules? Maybe separate? Some of the transitive logic discussion here reminds me of discussions about that. Lukacs, for example, wrote up an interesting doc about stronger reasoning about transitive deps & rdeps based around projects (not sure if he's shared it).

fmeum commented 3 months ago

I am planning to reboot this effort with a new approach based on @pcjanzen's idea in https://github.com/bazelbuild/bazel/issues/18707#issuecomment-1830951804. Please take a look at https://github.com/bazelbuild/bazel/discussions/19200#discussioncomment-9688601.

bazelbuild / bazel