Open alexeagle opened 1 year ago
Studied this with @gregmagolan
Let's look at the output of bazel aquery //some:build_smoke_test --config=rbe --config=aarch64
where those config flags enable the cross-platform RBE behavior:
action 'Expanding template some/build_smoke_test.sh'
Mnemonic: TemplateExpand
Configuration: k8-fastbuild-aarch64
Execution platform: //tools/platforms:linux_x86_jetpack5
...
Substitutions: [
{{{node}}: my_workspace/../nodejs_linux_amd64/bin/nodejs/bin/node}
...
runfiles for //some:build_smoke_test
Mnemonic: Middleman
Target: //some/aerial/frontend:build_smoke_test
Configuration: k8-fastbuild-aarch64
Execution platform: //tools/platforms:linux_x86_jetpack5
ActionKey: 709e80c88487a2411e1ee4dfb9f22a861492d20c4765150c0c794abd70f8147c
Inputs: [..., external/nodejs_linux_amd64/bin/nodejs/bin/node]
action 'Testing //some:build_smoke_test'
Mnemonic: TestRunner
Target: //some:build_smoke_test
Configuration: k8-fastbuild-aarch64
Execution platform: //tools/platforms:linux_aarch64_jetpack5
Command Line: (exec external/bazel_tools/tools/test/test-setup.sh \
What we see here is that even if we fixed the {{node}} template variable we put in the launcher, we would still have the wrong nodejs executable in the runfiles for the test, because the "middleman" action which generates the runfiles has an x86 exec platform. That makes this seem like a Bazel limitation with cross-platform RBE.
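To repeat this check on a larger aquery dump, the reading above can be sketched as a small script. This is only a rough sketch: it handles just the `Mnemonic:` and `Execution platform:` lines shown in the output above, the sample text is abbreviated from that output with assumed indentation, and `summarize_actions` is a hypothetical helper name, not part of Bazel.

```python
# Abbreviated copy of the aquery text output quoted above (indentation assumed).
SAMPLE = """\
action 'Expanding template some/build_smoke_test.sh'
  Mnemonic: TemplateExpand
  Execution platform: //tools/platforms:linux_x86_jetpack5
  Substitutions: [
    {{{node}}: my_workspace/../nodejs_linux_amd64/bin/nodejs/bin/node}
runfiles for //some:build_smoke_test
  Mnemonic: Middleman
  Execution platform: //tools/platforms:linux_x86_jetpack5
  Inputs: [..., external/nodejs_linux_amd64/bin/nodejs/bin/node]
action 'Testing //some:build_smoke_test'
  Mnemonic: TestRunner
  Execution platform: //tools/platforms:linux_aarch64_jetpack5
"""


def summarize_actions(aquery_text, needle="nodejs_linux_amd64"):
    """Return (mnemonic, exec_platform, mentions_needle) for each action block.

    A block is assumed to start at its 'Mnemonic:' line; every following line
    up to the next 'Mnemonic:' line is attributed to that block.
    """
    blocks = []
    current = None
    for raw in aquery_text.splitlines():
        line = raw.strip()
        if line.startswith("Mnemonic:"):
            current = {
                "mnemonic": line.split(":", 1)[1].strip(),
                "exec_platform": None,
                "mentions": False,
            }
            blocks.append(current)
        elif current is not None and line.startswith("Execution platform:"):
            current["exec_platform"] = line.split(":", 1)[1].strip()
        if current is not None and needle in line:
            current["mentions"] = True
    return [(b["mnemonic"], b["exec_platform"], b["mentions"]) for b in blocks]


if __name__ == "__main__":
    for mnemonic, platform, mentions in summarize_actions(SAMPLE):
        print(mnemonic, platform, "references amd64 node" if mentions else "")
```

On the sample, the TemplateExpand and Middleman blocks both reference the amd64 node, while the TestRunner block is the one scheduled on the aarch64 platform.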
I think it's just a general problem that cross-platform RBE doesn't work with platform-specific inputs that come from runfiles.
I don't think the execution platform for the middleman action matters, since copying runfiles into the final location is a completely platform-independent action. At first glance this looks like https://bazelbuild.slack.com/archives/CA31HN1T3/p1690184400360329?thread_ts=1690176577.746239&cid=CA31HN1T3: you may need to define an additional toolchain that matches on the target platform, not the exec platform.
How is the target platform relevant here? This is a script used in a build action.
It's a script that references a binary obtained from the toolchain, both by substituting in its path and adding its files to runfiles. But as far as I can tell, there are always two Node toolchains of the same type, one with a target constraint and one with an exec constraint: https://github.com/bazelbuild/rules_nodejs/blob/cc742d3b02c95eb56fce241c8fff6605d9e9c315/nodejs/private/toolchains_repo.bzl#L105-L116
This can cause this problem if the exec platform is linux_arm64 and a js_binary is built in the exec configuration (that is, for linux_arm64), as then the linux_amd64 toolchain with the exec constraint for linux_amd64 can end up being selected.
This could be solved by having a second, distinct toolchain type for Node runtimes with target constraints, similar to what the native Java toolchains do.
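To make the selection problem concrete, here is a deliberately simplified model of toolchain resolution, written in Python rather than Starlark so it can be run directly. It is a sketch under assumptions, not Bazel's actual algorithm: platforms are reduced to sets of constraint values, the first registered toolchain whose constraints all match wins, and the registration order is assumed. The name `hypothetical_node_runtime_toolchain_type` stands in for the proposed second toolchain type and does not exist in rules_nodejs.

```python
from dataclasses import dataclass

# Platforms modeled as sets of constraint values (a simplification).
LINUX_X86 = frozenset({"linux", "x86_64"})
LINUX_ARM64 = frozenset({"linux", "aarch64"})

NODE = "nodejs_toolchain_type"
# Hypothetical second toolchain type carrying only target constraints.
NODE_RUNTIME = "hypothetical_node_runtime_toolchain_type"


@dataclass(frozen=True)
class Toolchain:
    name: str
    toolchain_type: str
    exec_compatible_with: frozenset = frozenset()
    target_compatible_with: frozenset = frozenset()


def resolve(registered, toolchain_type, target, exec_platform):
    """Return the first toolchain of the requested type whose constraints match.

    An empty constraint set matches every platform, which is why an
    exec-constrained toolchain is accepted for any target.
    """
    for tc in registered:
        if tc.toolchain_type != toolchain_type:
            continue
        if (tc.exec_compatible_with <= exec_platform
                and tc.target_compatible_with <= target):
            return tc.name
    return None


def node_toolchains(arch, constraints):
    # Mirrors the pair described above: same type, one exec-constrained
    # toolchain and one target-constrained toolchain per platform.
    return [
        Toolchain(f"{arch}_exec", NODE, exec_compatible_with=constraints),
        Toolchain(f"{arch}_target_only", NODE, target_compatible_with=constraints),
    ]


# Registration order is an assumption of this sketch.
REGISTERED = (node_toolchains("linux_amd64", LINUX_X86)
              + node_toolchains("linux_arm64", LINUX_ARM64))

RUNTIME_ONLY = [
    Toolchain("linux_amd64_runtime", NODE_RUNTIME, target_compatible_with=LINUX_X86),
    Toolchain("linux_arm64_runtime", NODE_RUNTIME, target_compatible_with=LINUX_ARM64),
]

# A binary targeting arm64 whose actions run on an x86 exec platform.
current = resolve(REGISTERED, NODE, target=LINUX_ARM64, exec_platform=LINUX_X86)
proposed = resolve(REGISTERED + RUNTIME_ONLY, NODE_RUNTIME,
                   target=LINUX_ARM64, exec_platform=LINUX_X86)

print(current)   # linux_amd64_exec
print(proposed)  # linux_arm64_runtime
```

In this model, the exec-constrained amd64 toolchain matches because it places no constraint on the target, so the arm64 binary ends up with an amd64 node. A type whose toolchains carry only target constraints cannot be matched that way.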
What happened?
When RBE is used for cross-compilation, the host platform may be different from the exec platform.
In my case I have a linux_x86 host platform, so the launcher created by ctx.actions.expand_template here https://github.com/aspect-build/rules_js/blob/92a36f314b7841475e12a68dcd018c088f373bc2/js/private/js_binary.bzl#L481-L486 will create a file with a node path pointing to the host-resolved toolchain, with linux_x86 arch.
Now, I enable RBE and the exec platform is linux_arm64. The launcher script is copied to the remote and tries to spawn node for the wrong arch, which of course fails with an executable format error:
cannot execute binary file ...nodejs_linux_amd64...
Version
Bazel 6.2.1, latest of rules_js
How to reproduce
Any other information?
No response