Open nathaniel-brough opened 5 years ago
Re-assigning to team-rules-cpp for triage: this is strictly part of the setup of cc toolchains, and doesn't impact toolchain resolution at all.
cc/ @hlopko
I think with Crosstool in Starlark you cat get get quite far, and the last step can be simplified. I played with this a bit. I have a cc_toolchain_config
rule:
cc_toolchain_config = rule(
implementation = _impl,
attrs = {
"cpu": attr.string(mandatory = True),
"compiler": attr.string(),
"header": attr.label(allow_single_file = True, mandatory = True,),
"gcc": attr.label(allow_single_file = True, mandatory = True,),
},
provides = [CcToolchainConfigInfo],
)
And I instantiate this rule like this:
cc_toolchain_config(
name = "local",
cpu = "k8",
compiler = "compiler",
header = "@tools//:my/cool/header/header.h",
gcc = "@tools//:my/cool/gcc.sh",
)
Now this allows me for example to get the path of the package where header is for isystem:
include_directories = [ctx.file.header.owner.package]
This also allows me to get the path to the gcc no matter which external repository it is by:
ctx.file.gcc.path
The problem is that path are assumed to be relative to the package in which cc_toolchain_config
target is, as you correctly pointed out. Another problem is that paths must be normalized (https://github.com/bazelbuild/bazel/pull/5809) so you cannot walk up from the package to a different repository.
Is my understanding correct that solving this last problem is what this issue is about?
The problem is that path are assumed to be relative to the package in which cc_toolchain_config target is, as you correctly pointed out.
I believe that is the problem
Another problem is that paths must be normalized (#5809) so you cannot walk up from the package to a different repository.
I did try using un-normalized paths and failed running into the same problem as in the pull request you mentioned
To answer your question; yes, I do believe your understanding of the problem is correct
The Asylo project (github.com/google/asylo) has a similar use case. We have a toolchain in a repository @com_google_asylo_toolchain that inserts include paths to the @com_google_asylo workspace by duplicating the -I flags across asylo/path and external/com_google_asylo/asylo/path prefixes. The includes are to supplemental POSIX and system header files that trampoline into our runtime for actions like secure I/O and crypographically-secure randomness from /dev/urandom.
I ran into the relative path problem a year ago, and resolved it with a repository_rule that takes the absolute path to the toolchain installation directory as an attribute and then creates a symlink so that paths can stay relative https://github.com/google/asylo/blob/master/asylo/bazel/asylo_deps.bzl#L102
Splitting a toolchain across repos is still preferable to extra machinery around the POSIX and system includes to break the cyclic dependency since the solutions I've come up with all involve having to reinstall the toolchain whenever we add more POSIX support.
@deeglaze if it's only for include paths, cannot you use something like include_directories = [ctx.file.header.owner.package]
in your C++ toolchain? That way you might be able to decouple @com_google_asylo from knowing where include directories from @com_google_asylo_toolchain are. But I admit I don't think I understand exactly what is the problem.
The problem is that path are assumed to be relative to the package in which
cc_toolchain_config
target is, as you correctly pointed out. Another problem is that paths must be normalized (#5809) so you cannot walk up from the package to a different repository.Is my understanding correct that solving this last problem is what this issue is about?
@hlopko So, where do we go from here? :)
~I noticed that CcToolchainProviderHelper.resolveIncludeDir
has special support for path prefixes like %workspace%/
and %package(...)/
. Would it make sense for CppToolchainInfo
to support the same? I haven't tried it yet, but I think if CppToolchainInfo
handled %workspace%/
similarly, then "%workspace%/" + ctx.file.gcc.path
could do the trick.~
Update: I happened upon 473ea10c0dc76f3760f4f4afd3adb3b40b40abdb and realized it's a bit more involved than I thought.
Is there a particular reason why relative paths are used, or is it more of a side effect of the original CROSSTOOL configurations being in proto rather than starlark?
The behaviour seems a little odd and from my perspective doesn't feel consistent with the rest of bazel's starlark API.
I would be happy to contribute towards a fix. However I have limited understanding of bazel internals and rudimentary experience with java so it may be a long time before I get to it.
Any ideas on a general approach for a fix, that might minimise breaking changes and avoid introducing other issues?
Relative paths are just a side effect of the history.
I also think this is odd, but so far we didn't find enough motivation to fix it. So there are 2 use cases:
1) we need to be able to express absolute paths somehow for non hermetic toolchains (unless we decide we don't want to support non hermetic toolchains, which is too deep question for a github issue reply :)
2) we need to be able to express relative paths - for these using Artifacts (e.g. ctx.file.foo
) makes a lot of sense. But it will be confusing on another level. Now you have to pass your 'gcc' artifact to CcToolchainConfigInfo
rule for the command line, and also to 'cc_toolchain' for action inputs. And it's hard to unify those 2 uses - we need to be able to configure tool paths using features, and we definitely don't want to teach cc_toolchain
about all the intricacies of CcToolchainConfigInfo
.
If @silvergasp you want to contribute smth, then adding file
attribute to https://source.bazel.build/bazel/+/8bd7e4822ed96615ccac77cc40ca96eddc74c510:tools/cpp/cc_toolchain_config_lib.bzl;l=421 and using that in C++ rules might make sense. @scentini wdyt?
I have finally had a bit of time to look into this a little further. Mostly just trying to familiarise myself with the source code. I would just like to consolidate my current understanding before I start putting any significant amount of time into this.
ToolInfo
provider that you suggested I add a file
attribute to, is used for only for the ActionConfigInfo
provider (Can you confirm this?). cc_common.get_tool_for_action()
(here), and executed using ctx.actions.run()
(here). cc_common.get_tool_for_action()
to handle both file
and path
attributes, with some form of conflict resolution (If the user decides to define both). Would I be on the right track here or am I missing something significant?I admit I did look through the current java implementations for the cc rules and found it too difficult to follow in the time I have available for this.
Is the plan to support both native cc_rules rules and skylark cc_rules, or to move exclusively to skylark?
Are there any other updates on this so far? I would like to avoid duplicating effort :)
Cc @scentini since @hlopko is still out.
There is no ongoing effort on this just yet, so feel free to go ahead. Long term plan is to implement the native rules in Starlark, but that is O(years) :) Note that currently (close to) all of the cc rules are implemented in Java.
cc_common.get_tool_for_action()
is exposed to Starlark, but it is actually implemented in Java (here) and can be used in custom Starlark code, like the example you found, but is also extensively used by the native rules. So, you can't really touch only the Starlark part.
However, looks like you don't really have to modify that method. Instead you can modify the constructor for the ToolInfo
provider, which is here, to take into account a new, file
attribute, and to obtain the path from that attribute, if specified.I took a look into implementing this. I have it working locally, but I believe my implementation is incomplete. The part I'm unsure about is the interaction between the CcToolchainFeatures.Tool class and the CToolChain.Tool protobuf. I've added an Artifact member to CcToolchainFeatures.Tool, but I'm not sure how to serialize/deserialize that into and out of the protobuf. I can guess how you might get a string out of it to serialize, but when it comes to deserializing via the private constructor, I really have no idea. I'm assuming that the protobuf code is for sending the toolchain config across the wire for remote execution, but... I'm just guessing there. Obviously this is going to be incomplete without that bit.
Any suggestions on how to proceed? I'm happy to cook up a pull request so you can see what I've done, but obviously it wouldn't be a final product yet.
Is there any update on this?
It appears that rules_foreign_cc does not work with non-hermetic toolchains (and wrapper scripts provided to the tool_paths
required by cc_common.create_cc_toolchain_config_info
). See https://github.com/bazelbuild/bazel/issues/8438#issuecomment-810168376
New year, new check for updates ;) It would be great to be able to reference tools from external packages without wrapper scripts.
This has mostly been addressed in #10967.
I think there is a little bit more work to do before closing this;
I'd be happy to create a PR updating the docs/tutorials if that is something that would be considered useful?
An additional point would be that https://github.com/bazelbuild/rules_cc/pull/72 would probably need to be merged to match the main Bazel repository as well.
I am confused how to use this. My understanding is, now you can use tool(tool = ...)
instead of tool_path()
. I tried something like:
cc_common.create_cc_toolchain_config_info(
...
compiler = tool(tool = ...),
)
But when do this I get:
Error in create_cc_toolchain_config_info: in call to create_cc_toolchain_config_info(), parameter 'compiler' got value of type 'struct', want 'string'
You need to use the tool constructor with an action_config.
I answered a StackOverflow question earlier today on this, if you are looking for an example.
Personally I think the tool constructor could probably entirely replace the tool_path constructor. I think having both just gets confusing, with very little added value.
I agree. The architecture of toolchain config is unduly affected by the gnu/binutils way which in turn was set due to the Unix way having a single separate executable for every singular task. Other toolchains, like zig, which is a very good match for Bazel uses zig
The recently created design doc at https://docs.google.com/document/d/1-etGNsPneQ8W7MBMxtLloEq-Jj9ng1G-Pip-cWtTg_Y/edit?usp=sharing by the author of the issue should probably be mentioned here.
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.
Between https://stackoverflow.com/questions/73504780/bazel-reference-binaries-from-packages-in-custom-toolchain-definition/73505313#73505313 and https://github.com/bazelbuild/rules_cc/blob/main/cc/toolchains/README.md, this appears to be mostly resolved (although still under-documented).
Description of the problem / feature request:
Currently the workflow to define a cc_toolchain is a good fit for a monorepo. However it is difficult to have a toolchain configuration span across multiple repos.
Why would I like to be able to have a toolchain span multiple repos? I would like to share a configurable toolchain as an open source project. Therefore I would like to be able to download and use a "CcToolchainConfigInfo" provider rule from an external repository (e.g. using http_archive).
Feature requests: what underlying problem are you trying to solve with this feature?
To my knowledge this is not currently possible to span a toolchain config across multiple repositories. This is due to the fact that
tool_path
from"@bazel_tools//tools/cpp:cc_toolchain_config_lib.bzl"
is relative to the package directory. The problem with this is that when you load a toolchain config rule from an external repository the relative paths are relative to where the rule is instantiated rather than where the rule is defined.Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Follow the tutorial on configuring c++ toolchains. Then attempt to load and use cc_toolchain_config from a different workspace. Doing so will result in an error where the given toolpath cannot be found.
What operating system are you running Bazel on?
Manjaro Linux Rolling Release
What's the output of
bazel info release
?release 0.23.0
If
bazel info release
returns "development version" or "(@non-git)", tell us how you built Bazel.N/A
What's the output of
git remote get-url origin ; git rev-parse master ; git rev-parse HEAD
?N/A
Have you found anything relevant by searching the web?
I have found that repository rules are a potential workaround. This works by passing through the attrs from the repository_rule through to a toolchain config rule. This bypasses the problem of relative paths by instantiating the config rule in the given repository. For our particular use case this is a fairly heavy weight solution as it involves creating a new repository for each of our targets. We currently have about 10 different configurations for different cc_toolchain rules, as we target a number of different embedded devices.
Any other information, logs, or outputs that you want to share?
I think that a potential solution would be to include a new rule that can reference a target rather than a relative path. e.g.
@toolchain_workspace//:BUILD
@toolchain_workspace//:cc_toolchain_config.bzl
This should support the following usage; //:BUILD