envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

ppc64le CI build always fails #25090

Open phlax opened 1 year ago

phlax commented 1 year ago

Currently, we have a build triggered here: https://powerci.osuosl.org/job/build-envoy-static-master/

It always fails with:

Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Reading rc options for 'build' from /home/alfred/jenkins/workspace/build-envoy-static-master/.bazelrc:
  'build' options: --color=yes --workspace_status_command=bash bazel/get_workspace_status --incompatible_strict_action_env --host_force_python=PY3 --java_runtime_version=remotejdk_11 --tool_java_runtime_version=remotejdk_11 --platform_mappings=bazel/platform_mappings --copt=-DABSL_MIN_LOG_LEVEL=4 --action_env=CC --host_action_env=CC --action_env=CXX --host_action_env=CXX --action_env=LLVM_CONFIG --host_action_env=LLVM_CONFIG --action_env=PATH --host_action_env=PATH --enable_platform_specific_config --test_summary=terse --incompatible_config_setting_private_default_visibility --incompatible_enforce_config_setting_visibility --define absl=1 --@com_googlesource_googleurl//build_config:system_icu=0 --test_env=HEAPCHECK=normal --test_env=PPROF_PATH
ERROR: --host_action_env=CC :: Unrecognized option: --host_action_env=CC
Build step 'Execute shell' marked build as failure
Stopping Docker container after build completion
Finished: FAILURE

I'm guessing it's a Bazel version issue or similar, but I'm not sure.

Apart from the wasted cycles, it means we always have a failing status badge. We should either fix it, or remove the badge if it's not expected to pass and no one intends to fix it.

phlax commented 1 year ago

cc @cmluciano @clnperez

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

phlax commented 1 year ago

As no one seems to be following up on this, I will open a PR to remove the badge, and potentially disable the CI.

cmluciano commented 1 year ago

Hi @phlax, I'm no longer at the company where I supported the ppc64le builds. I will try to ping the people I handed this work over to, but given that it's been failing for a while, it's probably safe to remove.

phlax commented 1 year ago

Thanks for the update, @cmluciano. I will hold off for a little while in that case, and I'd be happy to help get it working again if I can. But yep, it probably makes sense to remove it until it can be fixed.

clnperez commented 1 year ago

Thank you for this! My mailbox has become very overloaded, but @cmluciano found me and let me know. I'd definitely like to get this fixed so that these builds remain. I submitted some Bazel fixes years ago to get it working in the first place, so hopefully it's not too big of a deal to fix.

phlax commented 1 year ago

If my reading is correct (it may not be!), it's using a pinned Bazel version that no longer works.

If that is the case, I'd recommend switching to bazelisk, with bazel aliased to it, so that it respects the repo's Bazel version.
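A rough model of what "respects the repo's Bazel version" means: bazelisk checks the USE_BAZEL_VERSION environment variable first and otherwise reads the nearest .bazelversion file, walking up from the current directory. The Python sketch below models only those two sources (real bazelisk has further settings, such as .bazeliskrc, and falls back to the latest release when nothing is pinned); resolve_bazel_version is a hypothetical name, not part of bazelisk.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_bazel_version(start_dir: Path, env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical, simplified model of how bazelisk picks a Bazel version.

    Models only USE_BAZEL_VERSION and .bazelversion; returns None when neither
    is set, where real bazelisk would fall back to the latest release.
    """
    # 1. An explicit environment override wins.
    override = env.get("USE_BAZEL_VERSION", "").strip()
    if override:
        return override
    # 2. Otherwise walk up from start_dir to the nearest non-empty .bazelversion.
    start = start_dir.resolve()
    for directory in (start, *start.parents):
        pin_file = directory / ".bazelversion"
        if pin_file.is_file():
            tokens = pin_file.read_text().split()
            if tokens:
                return tokens[0]
    return None


if __name__ == "__main__":
    version = resolve_bazel_version(Path.cwd(), os.environ)
    print(version or "no pin found: bazelisk would use the latest release")
```

With bazelisk on PATH under the name bazel (a symlink is enough), the CI job would then run whatever version the repository pins instead of a fixed one.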

clnperez commented 1 year ago

> if my reading is correct (may not be!) its using a fixed bazel version that is now not working

Your reading of what I said? :D Or something else?

phlax commented 1 year ago

The failure logs.

clnperez commented 1 year ago

@phlax, I'm retooling the container this build uses, and will see if I can get bazelisk into this workflow. But first I'd like to double-check the Java and C++ config. We're using clang, and I do see mention of a setup_clang.sh script in the Envoy docs. However, I can't find that script. Is there any other documentation on the required Java version, how to configure clang, etc.?

phlax commented 1 year ago

For clang, the reference (at the moment, at least) is the build container, which serves as the environment for Bazel in Envoy's CI.

The most relevant bit is probably here:

https://github.com/envoyproxy/envoy-build-tools/blob/b0452fc4dfb1b02357bd4ce2d55a1056b53e2ffd/build_container/build_container_ubuntu.sh#L90-L107

phlax commented 1 year ago

You shouldn't need to worry about Java versions or most other things, but whatever is installed in that container should cover anything you need.

phlax commented 1 year ago

setup_clang.sh is in the ci/ directory.

clnperez commented 1 year ago

I updated the Bazel version to see if that got rid of the error, but it looks like I'll now need to look at the rules_python repo: https://powerci.osuosl.org/job/build-envoy-static-master/8292/console

It doesn't seem to support ppc64le at the moment:

ERROR: Error computing the main repository mapping: no such package '@python3_10//': No platform declared for host OS linux on arch ppc64le

But maybe there's a way to use a native Python instead.
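The error comes from rules_python's get_host_platform (see the traceback later in this thread), which maps the host OS and CPU to one of the platform triples that have a prebuilt interpreter, and fails when nothing matches. The Python sketch below models that lookup; the KNOWN_PLATFORMS table is an illustrative subset, not the actual rules_python source.

```python
from typing import Dict, Tuple

# Illustrative subset of (os, arch) -> prebuilt-interpreter triple.
# Not the real rules_python table; note that there is no ppc64le entry.
KNOWN_PLATFORMS: Dict[Tuple[str, str], str] = {
    ("linux", "x86_64"): "x86_64-unknown-linux-gnu",
    ("linux", "aarch64"): "aarch64-unknown-linux-gnu",
    ("osx", "x86_64"): "x86_64-apple-darwin",
    ("osx", "aarch64"): "aarch64-apple-darwin",
}


def get_host_platform(os_name: str, arch: str) -> str:
    """Return the triple for a host, failing the same way as the CI log."""
    try:
        return KNOWN_PLATFORMS[(os_name, arch)]
    except KeyError:
        raise ValueError(
            "No platform declared for host OS {} on arch {}".format(os_name, arch)
        ) from None


if __name__ == "__main__":
    print(get_host_platform("linux", "x86_64"))
    try:
        get_host_platform("linux", "ppc64le")
    except ValueError as err:
        print(err)
```

Adding an entry along the lines of ("linux", "ppc64le") is the kind of new triple rules_python needs, but it only helps once a matching ppc64le interpreter build actually exists to download.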

phlax commented 1 year ago

The prebuilt Python interpreters come from here: https://github.com/indygreg/python-build-standalone/releases/tag/20230116

It doesn't look very hopeful.

phlax commented 1 year ago

Kind of related: https://github.com/bazelbuild/rules_python/issues/877

I think you can (theoretically) work around this by using a custom py_runtime, but you may need the patch mentioned in that issue.
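As a rough illustration of the custom py_runtime route, the hypothetical Python helper below renders a BUILD snippet that points Bazel at the host's own python3 instead of a downloaded interpreter. py_runtime, py_runtime_pair and toolchain are standard Bazel rules, but the target names, the interpreter lookup and where the snippet lives are assumptions. It would still need to be registered (for example with register_toolchains in the WORKSPACE), and the rules_python patch from that issue may still be required.

```python
import shutil
import sys
from typing import Optional

# Hypothetical BUILD snippet; target names are placeholders.
BUILD_TEMPLATE = '''load("@rules_python//python:defs.bzl", "py_runtime", "py_runtime_pair")

# Local toolchain using the host interpreter instead of a downloaded one.
py_runtime(
    name = "host_python3_runtime",
    interpreter_path = "{interpreter}",
    python_version = "PY3",
)

py_runtime_pair(
    name = "host_python_runtimes",
    py3_runtime = ":host_python3_runtime",
)

toolchain(
    name = "host_python_toolchain",
    toolchain = ":host_python_runtimes",
    toolchain_type = "@bazel_tools//tools/python:toolchain_type",
)
'''


def render_local_py_runtime(interpreter: Optional[str] = None) -> str:
    """Render the BUILD snippet for a native interpreter (defaults to python3 on PATH)."""
    path = interpreter or shutil.which("python3") or sys.executable
    return BUILD_TEMPLATE.format(interpreter=path)


if __name__ == "__main__":
    print(render_local_py_runtime())
```

Using an absolute interpreter_path keeps the build tied to the host's Python, which is the point of the workaround here but makes it non-hermetic.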

clnperez commented 1 year ago

Ah, thanks for that.

Maaaaybe not hopeless! I submitted https://github.com/indygreg/python-build-standalone/pull/165 and I'll try to use that here. They do have s390x support, and they cross-compile everything. Hoping it's simple and that I won't need to chase down too many of these projects. I think I'll have to add a new triple in the rules_python repo at the very least.

ninehills commented 1 year ago

Today I manually built Python for ppc64le and modified the rules_python package. With the manual patch below, the Python dependency now downloads normally (for Envoy 1.23.3, as used by Istio 1.15.4, with Bazel 5.2).

Other Envoy versions will need to patch the specific rules_python version they use.

diff --git a/bazel/repositories_extra.bzl b/bazel/repositories_extra.bzl
index 885b41dec6..5649dc9e36 100644
--- a/bazel/repositories_extra.bzl
+++ b/bazel/repositories_extra.bzl
@@ -5,7 +5,7 @@ load("//bazel/external/cargo:crates.bzl", "raze_fetch_remote_crates")
 load("@aspect_bazel_lib//lib:repositories.bzl", "aspect_bazel_lib_dependencies")

 # Python version for `rules_python`
-PYTHON_VERSION = "3.10.2"
+PYTHON_VERSION = "3.10.9"

 # Envoy deps that rely on a first stage of dependency loading in envoy_dependencies().
 def envoy_dependencies_extra(python_version = PYTHON_VERSION):
diff --git a/bazel/repository_locations.bzl b/bazel/repository_locations.bzl
index 0992ec2559..2d5c3a8913 100644
--- a/bazel/repository_locations.bzl
+++ b/bazel/repository_locations.bzl
@@ -794,11 +794,11 @@ REPOSITORY_LOCATIONS_SPEC = dict(
         project_name = "Python rules for Bazel",
         project_desc = "Bazel rules for the Python language",
         project_url = "https://github.com/bazelbuild/rules_python",
-        version = "0.9.0",
-        sha256 = "5fa3c738d33acca3b97622a13a741129f67ef43f5fdfcec63b29374cc0574c29",
-        release_date = "2022-06-12",
+        version = "0.9.0-ppc64le-dirty-fix",
+        sha256 = "7127f9aaad346950a63e346b53a29d562ac56805d74e05dcd9ded3cf54591fcc",
+        release_date = "2023-03-08",
         strip_prefix = "rules_python-{version}",
-        urls = ["https://github.com/bazelbuild/rules_python/archive/{version}.tar.gz"],
+        urls = ["https://github.com/ninehills/rules_python/archive/{version}.tar.gz"],
         use_category = ["build"],
     ),
     rules_pkg = dict(
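When pointing rules_python at a fork, as in the diff above, the sha256 field has to match the new archive exactly. A minimal standard-library sketch for computing it, assuming the tarball has already been downloaded locally; the archive name in the usage comment is a placeholder.

```python
import hashlib
import sys


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage: python sha256_of_file.py rules_python-<version>.tar.gz
    for archive in sys.argv[1:]:
        print(archive, sha256_of_file(archive))
```

If {version} names a branch rather than a tag or commit, the GitHub archive changes whenever the branch moves, and the pinned hash will stop matching.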

clnperez commented 1 year ago

Still waiting to hear back on the Python PR.

scheruku-in commented 1 year ago

Hi, I am facing the same error while trying to build Envoy for ppc64. Could someone please let me know whether there is a workaround or fix, and how I can get a working Envoy binary for ppc64? I followed the documentation at https://github.com/envoyproxy/envoy/blob/main/bazel/README.md#quick-start-bazel-build-for-developers but am stuck with this error:

ERROR: An error occurred during the fetch of repository 'python3_10':
   Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/9cb33e04803417c878f91275e3e1bfb1/external/rules_python/python/private/toolchains_repo.bzl", line 88, column 38, in _toolchain_aliases_impl
                host_platform = get_host_platform(os_name, arch)
        File "/root/.cache/bazel/_bazel_root/9cb33e04803417c878f91275e3e1bfb1/external/rules_python/python/private/toolchains_repo.bzl", line 249, column 13, in get_host_platform
                fail("No platform declared for host OS {} on arch {}".format(os_name, arch))
Error in fail: No platform declared for host OS linux on arch ppc64le
ERROR: /root/sobha/envoy-proxy/WORKSPACE:17:25: fetching toolchain_aliases rule //external:python3_10: Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/9cb33e04803417c878f91275e3e1bfb1/external/rules_python/python/private/toolchains_repo.bzl", line 88, column 38, in _toolchain_aliases_impl
                host_platform = get_host_platform(os_name, arch)
        File "/root/.cache/bazel/_bazel_root/9cb33e04803417c878f91275e3e1bfb1/external/rules_python/python/private/toolchains_repo.bzl", line 249, column 13, in get_host_platform
                fail("No platform declared for host OS {} on arch {}".format(os_name, arch))
Error in fail: No platform declared for host OS linux on arch ppc64le
ERROR: Error computing the main repository mapping: no such package '@python3_10//': No platform declared for host OS linux on arch ppc64le

clnperez commented 1 year ago

@scheruku-in The issue is that there just isn't a Python build for POWER yet, so you have to build your own. You can build the Python interpreter using the python-build-standalone PR linked above, and then you'll need to do what @ninehills posted above to pull in what you built. Once the PR is merged, Envoy can pick up the POWER builds as it does for all the other architectures.

clnperez commented 1 year ago

A new build was released and I started putting together a PR to rules_python today, but it looks like there's a problem with that release, so I'll have to wait for a new one. Thanks for everyone's patience while we get this back!

clnperez commented 1 year ago

Making more progress. The rules_python version now includes the one that picks up the POWER PR, but of course there are a couple of other things that also need to change. I updated the CI job, so you can see the newest failure. We do have a patch that fixes the build failure and will submit it ASAP.

clnperez commented 1 year ago

I also tested the patch via the Jenkins CI we have attached to this project, and it passes: https://powerci.osuosl.org/job/build-envoy-static-master/9594/console

@sumitd2, have you submitted that patch yet?

clnperez commented 1 year ago

I submitted https://github.com/envoyproxy/envoy/pull/28363 as the final part, but it was requested that we upstream the changes instead of patching BoringSSL. Waiting on that now.

clnperez commented 1 year ago

Also, there's now a new CPU platform type, so we'll need to change from ppc to ppc64le everywhere. When I initially removed the LuaJIT build 5 years ago, I used ppc everywhere. IIRC, no one was doing much with big-endian POWER in open source.
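The rename itself is largely mechanical. A hypothetical sketch, assuming the old constraint is spelled @platforms//cpu:ppc in BUILD and .bzl files: the negative lookahead leaves labels that already read ppc64le (or ppc32) alone. ppc may also appear in config_setting names or other strings that this does not touch, so the result still needs review.

```python
import re
import sys
from pathlib import Path
from typing import List

# Matches the old generic POWER constraint, but not ppc64le, ppc32, ppc_foo, ...
OLD_PPC_LABEL = re.compile(r"@platforms//cpu:ppc(?![0-9A-Za-z_])")
NEW_PPC_LABEL = "@platforms//cpu:ppc64le"


def migrate_ppc_labels(text: str) -> str:
    """Rewrite @platforms//cpu:ppc references to @platforms//cpu:ppc64le."""
    return OLD_PPC_LABEL.sub(NEW_PPC_LABEL, text)


def migrate_tree(root: Path) -> List[Path]:
    """Apply the rewrite to BUILD, BUILD.bazel and .bzl files; return changed paths."""
    changed = []
    for path in root.rglob("*"):
        is_build_file = path.name in ("BUILD", "BUILD.bazel") or path.suffix == ".bzl"
        if path.is_file() and is_build_file:
            original = path.read_text()
            updated = migrate_ppc_labels(original)
            if updated != original:
                path.write_text(updated)
                changed.append(path)
    return changed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only touches files when a directory is passed explicitly.
    for changed_path in migrate_tree(Path(sys.argv[1])):
        print("updated", changed_path)
```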

phlax commented 2 months ago

cc @clnperez: the webhook for this CI stopped working yesterday. I'm wondering if we should remove it.

clnperez commented 4 days ago

@phlax, sorry, I missed your comment. The CI was paused a while ago since the builds aren't fixed yet. We've made (I really hope) some good progress privately there, but if you want to remove it for now, that's fine. Can you just link the PR here?