Open matzipan opened 9 months ago
Right now rules_python does not strictly support cross-compilation, and it seems that rules_pycross can use our toolchain for that.
I wonder how your cross-compilation toolchain achieves that. What other modifications did you have to make to get it to work?
I think there may be some confusion in terminology. Just to make sure we are using the same terms: in my environment, Python only ever runs on one host/execution platform, which is my workstation. There is no cross aspect to it.
But in the build chain, I have compilation targets which are configured for other target platforms. Let's say a binary that is compiled for a different OS or architecture that I then manipulate with a python tool. Or a binary that I then run inside a QEMU target launched from python.
The only modification I needed to make this work was this one above.
Thanks for the explanation. I think your patch makes sense, would you like to submit a PR for it?
Hol up. TCW is the correct setting. I'll post again shortly to explain why.
See #704 and #747. We already switched from exec to target once.
Ah, thank you for the context. I forgot about that PR, but now I remember. And sorry for the confusion here, everybody.
I guess then the solution to the problem would have to be different. I think the fact that we have the toolchain as one monolithic thing could be an issue here. We may want to look at how the toolchain and the runtime are split in rules_jvm, so that the right runtime is included in py_binary targets, but we could still run an interpreter that works on the host platform when targeting other platforms.
EDIT: I'll let Richard explain it better than I can.
TCW is correct because the files in the toolchain are configured with cfg=target, not cfg=exec. This means that, when a rule looks up the toolchain, things like PyRuntimeInfo.files aren't guaranteed to be compatible with whatever execution platform was selected during toolchain resolution. In order to get an exec-compatible artifact, the files in the toolchain need to come from a cfg=exec source.
If TCW were changed to ECW, things would break quite badly, as targeting the mac platform might suddenly try to use a linux interpreter.
Also, remember that "target platform" means "the current configuration's target platform", which changes depending on whether a target is in the target or exec configuration, i.e. a build tool's target platform is the platform selected by the exec config. Stated another way: if you want to run a Python program as a build action, creating a py_binary and putting it in an attribute with cfg=exec will cause the py_binary's toolchain lookups to use whatever the exec platform is.
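As a concrete sketch of that last point, here is what consuming a py_binary as a build tool can look like. The rule, attribute, and label names are hypothetical (not part of rules_python); only the cfg="exec" mechanism is the real Bazel behavior being described:

```starlark
def _generate_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")
    # The tool runs on the execution platform during the build.
    ctx.actions.run(
        executable = ctx.executable._tool,
        arguments = [out.path],
        outputs = [out],
    )
    return [DefaultInfo(files = depset([out]))]

generate = rule(
    implementation = _generate_impl,
    attrs = {
        # cfg = "exec" is the key: the py_binary (and therefore its
        # toolchain lookup) is configured for whatever execution
        # platform Bazel selects, so the bundled interpreter can
        # actually run as part of a build action.
        "_tool": attr.label(
            default = "//tools:my_py_tool",  # hypothetical py_binary
            executable = True,
            cfg = "exec",
        ),
    },
)
```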
I think the fact that we have the toolchain as one monolithic thing could be an issue here.
Yes.
The short version is: if you want an exec-mode interpreter, then you have two options:
It seems likely people are fond of directly using the toolchain in actions? I don't really understand why, instead of doing (1), but in any case, I'd be +1 on adding (2).
To understand the "why" of some of this:
With toolchains, the ECW part basically means, "This tool must run on $ECW and can create output for $TCW" (if TCW is empty, it means "any platform"). When both TCW and ECW are set, it tightens the statement to "This tool must run on $ECW and can only generate output for $TCW".
For the interpreter itself, the latter statement doesn't make much sense -- the interpreter doesn't do anything on its own; it could generate output for any platform.
You might think: what if we just set both to the same value? Then running the exec-config interpreter produces the same values. This sort of works, but it is still strictly incorrect (it mixes config types), it makes using multiple other toolchains more problematic, and cross builds need a combinatorial explosion of toolchain definitions. Part of the way toolchain resolution works is that a single (target, exec) tuple compatible with all the toolchains a rule needs is sought. By setting e.g. (tcw=linux, ecw=linux), only a linux host can use that toolchain, and every other toolchain must also have a linux-host-capable variant. This creates a combinatorial explosion of toolchains for cross builds: to cross-build target=linux, host=mac, there needs to be a (tcw=linux, ecw=mac) combination, etc.
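To make the (tcw, ecw) notation concrete, a fully pinned registration might look like the sketch below. The target names are hypothetical; the toolchain_type label is rules_python's real one:

```starlark
# Pinned to (tcw = linux, ecw = linux): toolchain resolution can only
# select this toolchain when BOTH the target platform and the chosen
# execution platform are linux.
toolchain(
    name = "py_linux_on_linux",            # hypothetical
    toolchain = ":py_runtime_pair_linux",  # hypothetical runtime pair
    toolchain_type = "@rules_python//python:toolchain_type",
    target_compatible_with = ["@platforms//os:linux"],
    exec_compatible_with = ["@platforms//os:linux"],
)
```

Leaving exec_compatible_with empty instead expresses "this can target linux from any execution platform", which is what avoids needing one toolchain() per (target, exec) pair.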
Thank you all for the in-depth response. I now understand the philosophy of it.
It sort of makes sense. I was not aware of the cfg param. Will give it a shot and report back.
Thanks for all the nice work.
Hm, okay, so cfg="exec" is only accessible at rule definition time, which means one would have to reimplement all of this just to change that flag.
The second option of having two toolchain definitions, perhaps configurable in some way, then becomes more interesting.
reimplement all of this just to change that flag.
I don't follow. I don't see any actions in that file? Is something doing one elsewhere?
If you want to run a rules_py py_binary as an action, then you just need to put cfg=exec on the attribute that is consuming the target. This isn't a special behavior of rules_python or rules_py; it's just how attributes for a rule are defined for any Bazel rule.
I know rules_py tries to setup a venv, but I don't know much about venv setup or how closely coupled venvs are to the target configuration (i.e. interpreter); I'm assuming fairly closely coupled, as that is rather common with Python tooling. What I can say is that you can't assume the exec and target configuration are the same, and there isn't really a way to force them to be.
Ideally, you want actions to be independent of the target config and to take arguments that tell the action about the target config so it generates the correct output, e.g. pass --max-int as an argument to a tool instead of using sys.maxint at runtime.
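That pattern of passing target-config facts as explicit action arguments might look like this in a rule implementation. The flag names and attributes here are made up for illustration; only the ctx.actions.args mechanics are real Bazel API:

```starlark
def _codegen_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".h")
    args = ctx.actions.args()
    args.add("--output", out)
    # Pass target-configuration facts as flags instead of letting the
    # tool introspect the machine it happens to be running on.
    args.add("--target-cpu", ctx.attr.cpu)            # hypothetical flag
    args.add("--int-width", str(ctx.attr.int_width))  # hypothetical flag
    ctx.actions.run(
        executable = ctx.executable._tool,
        arguments = [args],
        outputs = [out],
    )
    return [DefaultInfo(files = depset([out]))]
```

The tool itself can then be a cfg=exec dependency: it runs on the execution platform but still emits output correct for the target platform.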
About the closest you can get to forcing them to be the same is to have a toolchain with the same TCW and ECW settings. That will force the two configs to be compatible, insofar as those constraints are concerned (but that doesn't mean the two configs are the same; they could vary in other ways). There are also exec_groups, which can be used to force actions into predefined constraints, but that isn't the same as using the target config. Ah, I almost forgot: there are also auto exec groups (which I haven't used, but the docs indicate they may be of use for such a case).
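For completeness, an exec_groups declaration looks roughly like this fragment (the group name and constraints are hypothetical, and _my_rule_impl is assumed to exist elsewhere):

```starlark
my_rule = rule(
    implementation = _my_rule_impl,
    exec_groups = {
        # Actions registered with exec_group = "runner" resolve their
        # execution platform against these constraints, independently
        # of the rule's default execution platform.
        "runner": exec_group(
            exec_compatible_with = ["@platforms//cpu:arm64"],
        ),
    },
)
```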
If you want to run a rules_py py_binary as an action, then you just need to put cfg=exec on the attribute that is consuming the target
Above, I mentioned two examples:
- py_binary (and py_test) are leaf targets I invoke
- py_test which takes a binary image generated for a different platform as a deps

What I can say is that you can't assume the exec and target configuration are the same, and there isn't really a way to force them to be.
We're all in agreement here. The current platform implementation is a simple concept, but this simplicity brings a lack of expressivity which we seem to be hitting in this case.
Yeah, chiming in here, a side-effect of this is demonstrated in this repo (distilled down from a larger internal setup): https://github.com/novas0x2a/rules-python-host (Below examples are from commit id https://github.com/novas0x2a/rules-python-host/commit/9fbc1fdd2f9447bfe8cf53a2acac43bd2be4e307 in case I modify the repo)
Context: I'm on my linux-amd64 laptop. I'm trying to target arm64:
```
$ bazel test ...
INFO: Analyzed 4 targets (0 packages loaded, 4041 targets configured).
INFO: Found 2 targets and 2 test targets...
INFO: Elapsed time: 2.897s, Critical Path: 2.61s
INFO: 15 processes: 11 internal, 2 linux-sandbox, 2 local.
INFO: Build completed successfully, 15 total actions
//:pip_test PASSED in 2.3s
//:test PASSED in 0.7s
Executed 2 out of 2 tests: 2 tests pass.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
```

```
$ bazel test --platforms :arm64 ...
WARNING: Build option --platforms has changed, discarding analysis cache (this can be expensive, see https://bazel.build/advanced/performance/iteration-speed).
INFO: Analyzed 4 targets (0 packages loaded, 4033 targets configured).
FAIL: //:pip_test (see /home/mlundy/.cache/bazel/_bazel_mlundy/49f7ad76792f58b5e5139cb5fb9829d8/execroot/__main__/bazel-out/k8-fastbuild/testlogs/pip_test/test.log)
INFO: From Testing //:pip_test:
==================== Test output for //:pip_test:
aarch64-binfmt-P: Could not open '/lib/ld-linux-aarch64.so.1': No such file or directory
================================================================================
FAIL: //:test (see /home/mlundy/.cache/bazel/_bazel_mlundy/49f7ad76792f58b5e5139cb5fb9829d8/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/test.log)
INFO: From Testing //:test:
==================== Test output for //:test:
aarch64-binfmt-P: Could not open '/lib/ld-linux-aarch64.so.1': No such file or directory
================================================================================
INFO: Found 2 targets and 2 test targets...
INFO: Elapsed time: 0.646s, Critical Path: 0.45s
INFO: 15 processes: 11 internal, 2 linux-sandbox, 2 local.
INFO: Build completed, 2 tests FAILED, 15 total actions
//:pip_test FAILED in 0.1s
/home/mlundy/.cache/bazel/_bazel_mlundy/49f7ad76792f58b5e5139cb5fb9829d8/execroot/__main__/bazel-out/k8-fastbuild/testlogs/pip_test/test.log
//:test FAILED in 0.0s
/home/mlundy/.cache/bazel/_bazel_mlundy/49f7ad76792f58b5e5139cb5fb9829d8/execroot/__main__/bazel-out/k8-fastbuild/testlogs/test/test.log
Executed 2 out of 2 tests: 2 fail locally.
```
Due to the lack of an exec_compatible_with, it selects a python toolchain that's incompatible with the current exec platform.
This is working as intended, you need to have a test executor that supports arm64 when executing the bazel test command, which may require an RBE setup.
If you still want to build arm64 binaries, but test it on an amd64 machine, you would need to use transitions to transition your binaries to arm64 configuration and test on amd64.
This is WAI
Yeah. It looks like it's building an arm64 binary, as requested by the --platforms=arm64 flag, but then running it on the local host, which is x86.
If you still want to build arm64 binaries, but test it on an amd64 machine, you would need to use transitions to transition your binaries to arm64 configuration and test on amd64.
That probably would be the easier way, yes. RBE would be the other way.
You could try to invert the relationship (e.g. a cfg=exec (x86) action runs an emulator tool that takes a cfg=target binary (the actual test, in arm64, set by --platforms) to run under emulation, then figure out how to handle the output of running the emulator).
But note that this doesn't have much to do with the toolchain TCW/ECW settings. I think it might matter if a custom rule is using multiple toolchains whose TCW/ECW settings don't get along (or deceitful things like the autodetecting toolchain), but the issue there is in how the custom rule is consuming toolchains, not in a toolchain's TCW/ECW settings.
Note that this is being triggered by py_test on arch-independent code; if it selected the correct toolchain it should work.
I think it could be possible to tell Bazel that your py_test should always transition to the host platform, and @platforms//host might help here. In general, py_test is not necessarily arch-independent; it sometimes depends on deps that are arch-dependent. So although in your case things would/could work, in the generic case, if you wanted to build a Docker container of a py_test target so that you can run the test inside docker as part of your bazel test (i.e. an sh_binary calling the dockerized py_test), then you would be in trouble.
I think it could be possible to tell Bazel that your py_test should always transition to the host platform, and @platforms//host might help here. In general py_test is not necessarily arch-independent and it sometimes depends on deps that are arch-dependent
Yeah, in my above example I oversimplified, but in the original setup it's derived from, we do have both types of test. Currently I have to mark all of the py_tests with target_compatible_with = HOST_CONSTRAINTS, but that basically makes cross-compiling useless. Ideally:
"arch independent" is a bit of a hard concept for Bazel to model. About the closest you can do is @platforms//os:any
(or is it "all", or "cpu:any"? I can't remember exactly). Note that, when using the hermetic runtimes, py_test isn't arch independent because it includes a prebuilt runtime for a particular platform.
For an example of writing a rule that performs a transition, look at python/config_settings/transition.bzl. It's fairly simple: perform a transition, then symlink some of the outputs of the wrapped target.
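A hedged sketch of what such a rule can look like, loosely following that "transition, then symlink" shape. The rule and attribute names are hypothetical and this is a simplification, not the actual rules_python implementation; the allowlist label is the standard Bazel one:

```starlark
def _platform_transition_impl(settings, attr):
    # Rewrite --platforms for the wrapped target.
    return {"//command_line_option:platforms": str(attr.target_platform)}

_platform_transition = transition(
    implementation = _platform_transition_impl,
    inputs = [],
    outputs = ["//command_line_option:platforms"],
)

def _transitioned_binary_impl(ctx):
    # An attribute with a transition yields a list of targets, even 1:1.
    target = ctx.attr.binary[0]
    executable = target[DefaultInfo].files_to_run.executable
    # Symlink the wrapped target's executable as this rule's output.
    out = ctx.actions.declare_file(ctx.label.name)
    ctx.actions.symlink(output = out, target_file = executable, is_executable = True)
    return [DefaultInfo(
        executable = out,
        runfiles = target[DefaultInfo].default_runfiles,
    )]

transitioned_binary = rule(
    implementation = _transitioned_binary_impl,
    attrs = {
        "binary": attr.label(cfg = _platform_transition),
        "target_platform": attr.label(),
        "_allowlist_function_transition": attr.label(
            default = "@bazel_tools//tools/allowlists/function_transition_allowlist",
        ),
    },
    executable = True,
)
```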
Note that, when using the hermetic runtimes, py_test isn't arch independent because it includes a prebuilt runtime for a particular platform
Sorry, I was unclear, I meant that all of the explicit inputs were arch-independent, the only arch-dependent bits are the toolchain / runtime. (I understand that this isn't true all the time, but it's true about half the time for us).
For an example of writing a rule that performs a transition, look at python/config_settings/transition.bzl. It's fairly simple: perform a transition, then symlink some of the outputs of the wrapped target.
Sorry to be a pest, but I don't understand how to do this in practice. The transition isn't the hard part (I've written transitions before; also, aspect's rules for binaries and filegroups work for those providers; it's just painful/fragile that one needs to write a new transition rule for every provider combination one might want to pass on...).
The issue is the rest of it: what do I transition, what might that look like when applied to my toy repo, and how do I fix both the py_test and the compile_pip_requirements? That's the part I don't understand.
Does this post resonate with you, @novas0x2a? https://github.com/bazelbuild/bazel/issues/19645
One idea for how to achieve a py_host_test would be to:
- import native_test from skylib
- import transition_filegroup from aspect

```starlark
def py_host_test(name, **kwargs):
    py_binary(name + ".input", testonly = True, **kwargs)
    # aspect's transition rule, e.g. transition_filegroup:
    transition_filegroup(name + ".transitioned", src = name + ".input", target_platform = "@platforms//host")
    native_test(name, src = name + ".transitioned")
```
However, maybe it would be better to lift the example from rules_platform and not use --platforms in bazelrc:

```starlark
cc_binary(name = "foo")

platform_data(
    name = "foo_embedded",
    target = ":foo",
    platform = "//my/new:platform",
)

py_test(
    name = "foo_test",
    srcs = ...,
    data = [
        ":foo_embedded",
    ],
)
```
(sorry, was away from this for a few days) @groodt yeah, it does. I can see why rules_python chose to do it this way, even if it's personally annoying for me that it causes bazel to choose a toolchain that doesn't work on my exec platform :) It doesn't seem like there's a Correct Answer right now.
@aignas thanks a ton, I get it now. For those who stumble upon this later, here's the diff I made to my sample project that made it work. Be careful about just copying it, I didn't ensure that I was handling all possible args correctly (just enough to prove it worked), so there are probably some that would get passed to the wrong rule.
Re: platform_data vs platform_transition_binary, I think they both use the same //command_line_option:platforms-based transition (platform_data / aspect-lib).
Doesn't it just need to register both target and exec constraints to the same thing (unlike before this change, which only had exec constraints)? I'm running into an issue where an exec tool is incorrectly bundling a target interpreter in a cross-platform build.
Edit: Doing that didn't fix my issue; something else is preventing the exec platform from being correct for me. Can ignore for now.
🐞 bug report
Affected Rule
python_toolchain_build_file_content
Is this a regression?
Unsure
Description
When it configures py_toolchain_suite, it uses target_compatible_with to configure the platform compatibilities. It should instead use exec_compatible_with. Yes, on most systems where rules_python runs these are one and the same. But if I want to use rules_python in my cross-compilation toolchain, I can't do it: target_compatible_with would say it's incompatible. See here. The following change allows me to do that:
🔬 Minimal Reproduction
Define a platform which doesn't match your host platform and try to use it, as if for cross-compilation.
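For instance, on an x86_64 Linux host, a non-matching platform could be defined like this (hypothetical BUILD snippet; the constraint labels are the standard @platforms ones):

```starlark
platform(
    name = "linux_arm64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:arm64",
    ],
)
```

Building with `bazel build --platforms=//:linux_arm64 //...` then exercises the toolchain-resolution path described in this report.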
🔥 Exception or Error
No matching toolchain.
🌍 Your Environment
Operating System: Linux, x86_64
Output of bazel version: 6.3.2
Rules_python version: 0.29.0
Anything else relevant?