Treat failure to resolve a toolchain as if it were an incompatible target for skipping in ... expansion.

aiuto commented 4 years ago

Incompatible target skipping will skip targets that can not be built for the specified destination platform. There is also a need to be able to skip targets which can not be built on the execution platform. For example, because a test might depend on an OS native tool which makes no sense on any other platform, or a compiler is only licensed to a few users in an organization.

It would be convenient if failure to resolve a toolchain (usually because one was not registered) as a skippable target.

TBD: Bring in more context from the mail thread which sprouted this idea.

cc: @gregestren @AustinSchuh @philsc @katre

philsc commented 3 years ago

I'm admittedly more of a toolchains newbie than I'd like to be. I'll do some reading to better understand what it means to have missing toolchain etc.

At a high level I can definitely see that being a desirable feature for the reasons you listed.

Quoting @aiuto from the e-mail chain:

Proposal: For ... resolution, failure to find a toolchain would be skip just like target_compatible_with. You would, however, get the warning that the toolchain was not found. If you try to build/test a specific target, however, it still fails with an error.

Pro:

easy to understand.

easier for the rule author to get the toolchain specs correct

no overloading target_compatible_with with failure to have the tools at the executable side.

Con:

if you had tests which required their own unique toolchain you may skip the tests on some platforms.

aiuto commented 3 years ago

The ' easier for the rule author to get the toolchain specs correct' requires some explanation.

In https://github.com/bazelbuild/rules_pkg/pull/254, I am wrapping a linux only tool (rpmbuild) in a toolchain. In order to get ... skipping to work on platforms where rpmbuild can not be found, I had to create a no-op toolchain which resolved, but contained an attribute saying it was invalid. I exposed the validity as a constraint which we could select() on in a target_compatible_with clause.

It is not a huge amount of work, but you have to see the "trick" to doing it that way, To me, that means it is too hard.

AustinSchuh commented 3 years ago

There's something to the idea to explore for sure. Here is another use cases that exists today:

Bazel can't guarentee that the android SDK/NDK is installed when it is being built. It has workspace entries which the user un-comments to enable the toolchains. If the user doesn't un-comment those toolchains, there is a build failure. -> This proposal would skip the android targets if there is no compiler available for android.

Honestly, that sounds pretty helpful. Likely that should have the same set of constraints that Phil added to incompatible target skipping. If the user requests it through //..., no error and print skipped. If the user explicitly requests an android target in that case, error out and complain.

aiuto commented 3 years ago

That's a perfect example. We see this all the time with mobile apps and a shared code base.

the iOS developers don't have any android SDKs installed
the Android developers don't have xcode installed
they both want to be able to say bazel test ... and see the tests for their platform.

philsc commented 3 years ago

While not helpful for resolving this issue, I did mention this issue in the Filtering incompatible targets in Starlark proposal. My naive thought is that skipping targets with a missing toolchain can also be expressed via a Starlark-accessible provider. That becomes relevant in that proposal.

gregestren commented 3 years ago

Would this work as a current workaround?

Define a default "empty" toolchain that's compatible with everything.
For rule implementations using that toolchain:

if toolchain="empty toolchain":
   return [IncompatiblePlatformProvider()]

philsc commented 3 years ago

@gregestren , the IncompatiblePlatformProvider class is currently not exposed to Starlark so that approach unfortunately doesn't work at the moment. However, I think that's a really neat, simple idea!

github-actions[bot] commented 1 year ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

aiuto commented 1 year ago

Not really stale. We should probably get this effect somehow.

philsc commented 1 year ago

@gregestren , do you have any cool ideas on a good way to accomplish this? I assumed it couldn't be too bad. I think I almost got there with this branch: https://github.com/bazelbuild/bazel/compare/master...philsc:bazel:unreviewed/philsc/fix-12419

But the test encounters this error:

$ bazel test -c opt //src/test/shell/integration:target_compatible_with_test --test_output=streamed --test_filter=test_incompatible_with_missing_toolchain
...
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.IllegalStateException: Unexpected exception: dep Dependency{label=//target_skipping:compiler_flag, configuration=1882c65bed561e30d863f6ecf00a6c3357d15ac54eca64e00aa2a71ff790ba77, aspects=AspectCollection{[]}, transitionKeys=[], executionPlatformLabel=null} had null value, even though there were no values missing in the initial fetch. That means it had an unexpected exception type (not ConfiguredValueCreationException)
        at com.google.devtools.build.lib.bugreport.BugReport.sendBugReport(BugReport.java:183)
        at com.google.devtools.build.lib.bugreport.BugReport.logUnexpected(BugReport.java:154)
        at com.google.devtools.build.lib.skyframe.PrerequisiteProducer.resolveConfiguredTargetDependencies(PrerequisiteProducer.java:949)
        at com.google.devtools.build.lib.skyframe.PrerequisiteProducer.computeDependencies(PrerequisiteProducer.java:740)
        at com.google.devtools.build.lib.skyframe.PrerequisiteProducer.evaluate(PrerequisiteProducer.java:349)
        at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.compute(ConfiguredTargetFunction.java:203)
        at com.google.devtools.build.skyframe.ParallelEvaluator.bubbleErrorUp(ParallelEvaluator.java:422)
        at com.google.devtools.build.skyframe.ParallelEvaluator.waitForCompletionAndConstructResult(ParallelEvaluator.java:211)
        at com.google.devtools.build.skyframe.ParallelEvaluator.doMutatingEvaluation(ParallelEvaluator.java:177)
        at com.google.devtools.build.skyframe.ParallelEvaluator.eval(ParallelEvaluator.java:672)
        at com.google.devtools.build.skyframe.InMemoryMemoizingEvaluator.evaluate(InMemoryMemoizingEvaluator.java:177)
        at com.google.devtools.build.lib.skyframe.SkyframeExecutor.configureTargets(SkyframeExecutor.java:2276)
        at com.google.devtools.build.lib.skyframe.SkyframeBuildView.configureTargets(SkyframeBuildView.java:343)
        at com.google.devtools.build.lib.analysis.BuildView.update(BuildView.java:440)
        at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.runAnalysisPhase(AnalysisPhaseRunner.java:242)
        at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.execute(AnalysisPhaseRunner.java:140)
        at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:182)
        at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:529)
        at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:497)
        at com.google.devtools.build.lib.runtime.commands.BuildCommand.exec(BuildCommand.java:103)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:625)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:240)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
        at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:614)
        at io.grpc.Context$1.run(Context.java:566)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
------------------------------------------------------------------------
test_incompatible_with_missing_toolchain FAILED: Bazel build failed unexpectedly..

Something is unhappy about me trying to propagate the ToolchainException up higher. It's looking for a ConfiguredValueCreationException instead.

Anyway, if you have any ideas, please let me know :)

gregestren commented 1 year ago

I'll need to refresh my memory on this. I'll schedule time for me and @katre and @aiuto to review the issue again.

katre commented 1 year ago

We chatted about this: the underlying idea that targets which fail toolchain resolution should be marked incompatible instead of failing the entire build makes sense. It's unclear to me now if that should always be true (I'd really prefer to not add Yet Another Flag), but it makes sense as a baseline.

@philsc Your change looks reasonable at a glance. You need to either wrap the ToolchainException in a ConfiguredValueCreationException (see an example) or handle it directly and not re-throw anything (possibly by returning the incompatible configured target).

I haven't done a deep review of the code, let me know when you have a PR ready.

philsc commented 1 year ago

Sounds good. Thanks @katre . I'm going to try a few things.

gregestren commented 1 year ago

Debugging more, I'm still unsure what's triggering that error. Two followup thoughts:

Note the expressed failing target is //target_skipping:compiler_flag. What happens if you modify the test to build that directly?
Does the error still occur if you catch the ToolchainException as your code already is and comment out the code that returns the incompatible provider? i.e. catch the exception and re-throw it just as the pre-existing code does, would that still trigger this error?

fmeum commented 1 year ago

Since the planned behavior is another situation in which an error (toolchain resolution failed) is effectively silenced (target is skipped), I would prefer to have this configurable.

@katre I formulated a concept for a similar setting that doesn't involve yet another flag here: https://github.com/bazelbuild/bazel/issues/18707#issuecomment-1609257988. I would be interested in hearing your thoughts on this.

philsc commented 1 year ago

I believe that whatever solution manifests for #18707 should solve the same concern for this, correct @fmeum ?

fmeum commented 1 year ago

@philsc Yes, I think so. The would ideally also be configurable on a per-repo level.

github-actions[bot] commented 2 months ago

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

AustinSchuh commented 2 months ago

Feels not stale to me.

bazelbuild / bazel

Treat failure to resolve a toolchain as if it were an incompatible target for skipping in ... expansion. #12419