conda-forge / ray-packages-feedstock

A conda-smithy repository for ray-packages.
BSD 3-Clause "New" or "Revised" License

Unvendor grpc/protobuf #90

Open h-vetinari opened 1 year ago

h-vetinari commented 1 year ago

I didn't realize that ray is building grpc as a vendored project. This would be a pretty obvious candidate for effing something up when using a newer grpcio.

The biggest problem with this is how hard it is (for me at least) to tell bazel to use "foreign" libraries.

Originally posted by @h-vetinari in https://github.com/conda-forge/ray-packages-feedstock/issues/87#issuecomment-1376923344

ngam commented 1 year ago

Noticed the following in the logs (on main, passing builds, https://github.com/conda-forge/ray-packages-feedstock/runs/10863439621). Pretty interesting stuff!

2023-01-24T21:20:07.3483964Z INFO: Analyzed 2 targets (153 packages loaded, 21326 targets configured).
2023-01-24T21:20:07.3519661Z INFO: Found 2 targets...
2023-01-24T21:20:07.4415336Z [0 / 9] [Prepa] BazelWorkspaceStatusAction stable-status.txt
2023-01-24T21:20:16.3821713Z [14 / 1,961] Compiling src/google/protobuf/compiler/cpp/cpp_field.cc; 1s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:20:27.6915280Z [21 / 1,961] Compiling src/google/protobuf/compiler/cpp/cpp_message.cc; 6s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:20:37.1167440Z INFO: From Compiling src/google/protobuf/message_lite.cc:
2023-01-24T21:20:37.1191266Z In file included from /home/conda/feedstock_root/build_artifacts/ray-packages_1674594838659/_build_env/bin/../x86_64-conda-linux-gnu/sysroot/usr/include/string.h:638,
2023-01-24T21:20:37.1192104Z                  from external/com_google_protobuf/src/google/protobuf/stubs/port.h:39,
2023-01-24T21:20:37.1192619Z                  from external/com_google_protobuf/src/google/protobuf/stubs/common.h:48,
2023-01-24T21:20:37.1193112Z                  from external/com_google_protobuf/src/google/protobuf/message_lite.h:45,
2023-01-24T21:20:37.1193612Z                  from external/com_google_protobuf/src/google/protobuf/message_lite.cc:36:
2023-01-24T21:20:37.1194141Z In function 'void* memcpy(void*, const void*, size_t)',
2023-01-24T21:20:37.1194953Z     inlined from 'uint8_t* google::protobuf::io::EpsCopyOutputStream::WriteRaw(const void*, int, uint8_t*)' at external/com_google_protobuf/src/google/protobuf/io/coded_stream.h:706:16,
2023-01-24T21:20:37.1196208Z     inlined from 'virtual uint8_t* google::protobuf::internal::ImplicitWeakMessage::_InternalSerialize(uint8_t*, google::protobuf::io::EpsCopyOutputStream*) const' at external/com_google_protobuf/src/google/protobuf/implicit_weak_message.h:84:28,
2023-01-24T21:20:37.1197520Z     inlined from 'bool google::protobuf::MessageLite::SerializePartialToZeroCopyStream(google::protobuf::io::ZeroCopyOutputStream*) const' at external/com_google_protobuf/src/google/protobuf/message_lite.cc:412:30:
2023-01-24T21:20:37.1199102Z /home/conda/feedstock_root/build_artifacts/ray-packages_1674594838659/_build_env/bin/../x86_64-conda-linux-gnu/sysroot/usr/include/bits/string3.h:51:33: warning: 'void* __builtin___memcpy_chk(void*, const void*, long unsigned int, long unsigned int)' specified size between 18446744071562067968 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
2023-01-24T21:20:37.1200145Z    51 |   return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
2023-01-24T21:20:37.1200619Z       |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-01-24T21:20:40.0947346Z [37 / 1,961] Compiling src/google/protobuf/compiler/csharp/csharp_reflection_class.cc; 0s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:20:53.8654923Z [55 / 1,961] Compiling src/google/protobuf/compiler/java/java_map_field.cc; 0s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:21:09.8994672Z [74 / 1,961] Compiling src/google/protobuf/compiler/plugin.pb.cc; 0s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:21:27.9441159Z [95 / 1,961] Compiling src/google/protobuf/io/printer.cc; 0s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:21:48.7495496Z [111 / 1,961] Compiling src/google/protobuf/compiler/objectivec/objectivec_primitive_field.cc; 1s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:22:13.6192243Z [138 / 1,961] Compiling src/google/protobuf/extension_set.cc; 2s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:22:41.9635614Z [186 / 2,150] Compiling src/compiler/node_generator.cc [for host]; 1s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:23:14.0442354Z [231 / 2,150] Compiling src/google/protobuf/generated_message_tctable_lite.cc [for host]; 1s processwrapper-sandbox ... (2 actions running)
2023-01-24T21:23:50.5721139Z [268 / 2,150] Compiling src/google/protobuf/any.pb.cc [for host]; 0s processwrapper-sandbox ... (2 actions running)
2023-01-24T21:24:32.5789497Z [310 / 2,150] Compiling src/google/protobuf/compiler/java/java_primitive_field.cc [for host]; 1s processwrapper-sandbox ... (2 actions running)
2023-01-24T21:24:58.1737035Z INFO: From Compiling src/google/protobuf/message_lite.cc [for host]:
2023-01-24T21:24:58.1780300Z In file included from /home/conda/feedstock_root/build_artifacts/ray-packages_1674594838659/_build_env/bin/../x86_64-conda-linux-gnu/sysroot/usr/include/string.h:638,
2023-01-24T21:24:58.1781514Z                  from external/com_google_protobuf/src/google/protobuf/stubs/port.h:39,
2023-01-24T21:24:58.1782102Z                  from external/com_google_protobuf/src/google/protobuf/stubs/common.h:48,
2023-01-24T21:24:58.1791404Z                  from external/com_google_protobuf/src/google/protobuf/message_lite.h:45,
2023-01-24T21:24:58.1792308Z                  from external/com_google_protobuf/src/google/protobuf/message_lite.cc:36:
2023-01-24T21:24:58.1793098Z In function 'void* memcpy(void*, const void*, size_t)',
2023-01-24T21:24:58.1794152Z     inlined from 'uint8_t* google::protobuf::io::EpsCopyOutputStream::WriteRaw(const void*, int, uint8_t*)' at external/com_google_protobuf/src/google/protobuf/io/coded_stream.h:706:16,
2023-01-24T21:24:58.1795623Z     inlined from 'virtual uint8_t* google::protobuf::internal::ImplicitWeakMessage::_InternalSerialize(uint8_t*, google::protobuf::io::EpsCopyOutputStream*) const' at external/com_google_protobuf/src/google/protobuf/implicit_weak_message.h:84:28,
2023-01-24T21:24:58.1797153Z     inlined from 'bool google::protobuf::MessageLite::SerializePartialToZeroCopyStream(google::protobuf::io::ZeroCopyOutputStream*) const' at external/com_google_protobuf/src/google/protobuf/message_lite.cc:412:30:
2023-01-24T21:24:58.1798928Z /home/conda/feedstock_root/build_artifacts/ray-packages_1674594838659/_build_env/bin/../x86_64-conda-linux-gnu/sysroot/usr/include/bits/string3.h:51:33: warning: 'void* __builtin___memcpy_chk(void*, const void*, long unsigned int, long unsigned int)' specified size between 18446744071562067968 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
2023-01-24T21:24:58.1800554Z    51 |   return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
2023-01-24T21:24:58.1801107Z       |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-01-24T21:25:20.9009284Z [416 / 2,281] Compiling src/compiler/csharp_generator.cc [for host]; 1s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:25:22.1662427Z INFO: From Action external/com_github_grpc_grpc/src/proto/grpc/reflection/v1alpha/reflection.grpc.pb.h:
2023-01-24T21:25:22.1669560Z bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
2023-01-24T21:25:51.6341538Z INFO: From Generating Descriptor Set proto_library @com_github_cncf_udpa//xds/type/v3:pkg:
2023-01-24T21:25:51.6347480Z xds/type/v3/typed_struct.proto:10:1: warning: Import validate/validate.proto is unused.
2023-01-24T21:25:52.6425750Z INFO: From Action external/com_github_grpc_grpc/src/proto/grpc/channelz/channelz.grpc.pb.h:
2023-01-24T21:25:52.6442802Z bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
2023-01-24T21:25:52.7080488Z INFO: From Action external/com_github_grpc_grpc/src/proto/grpc/testing/xds/v3/percent.grpc.pb.h:
2023-01-24T21:25:52.7082213Z bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
2023-01-24T21:25:52.7434268Z INFO: From Action external/com_github_grpc_grpc/src/proto/grpc/testing/xds/v3/base.grpc.pb.h:
2023-01-24T21:25:52.7444334Z bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
2023-01-24T21:25:52.7844053Z INFO: From Action external/com_github_grpc_grpc/src/proto/grpc/testing/xds/v3/config_dump.grpc.pb.h:
2023-01-24T21:25:52.7858832Z bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
2023-01-24T21:25:52.8146701Z INFO: From Action external/com_github_grpc_grpc/src/proto/grpc/testing/xds/v3/csds.grpc.pb.h:
2023-01-24T21:25:52.8195413Z bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
2023-01-24T21:26:17.3034890Z [719 / 2,496] Compiling src/idl_gen_rust.cpp [for host]; 0s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:27:24.9172630Z [1,843 / 3,534] Compiling python/ray/_raylet.cpp; 52s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:28:39.2553777Z [1,954 / 3,534] Compiling src/google/protobuf/wire_format.cc; 3s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:30:05.1828782Z [2,045 / 3,534] Compiling src/ray/common/bundle_spec.cc; 8s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:31:43.7234689Z [2,136 / 3,534] Compiling src/cpp/server/server_cc.cc; 4s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:33:36.9333443Z [2,285 / 3,534] Compiling src/ray/raylet/node_manager.cc; 13s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:35:49.1076702Z [2,396 / 3,534] Compiling src/ray/core_worker/core_worker_process.cc; 9s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:38:19.0661531Z [2,533 / 3,534] Compiling src/ray/raylet/scheduling/policy/bundle_scheduling_policy.cc; 12s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:41:11.3956382Z [2,723 / 3,534] Compiling src/ray/raylet/agent_manager.cc; 16s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:44:29.9164559Z [2,954 / 3,534] Compiling src/core/ext/filters/client_channel/lb_policy/grpclb/grpclb.cc; 1s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:48:19.1225621Z [3,208 / 3,534] Compiling src/ray/core_worker/transport/direct_task_transport.cc; 15s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:52:42.6329482Z [3,508 / 3,534] Compiling src/core/lib/iomgr/tcp_posix.cc; 1s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T21:52:54.5383055Z INFO: From Action external/com_github_grpc_grpc/src/proto/grpc/health/v1/health.grpc.pb.h:
2023-01-24T21:52:54.5384340Z bazel-out/k8-opt/bin/external/com_github_grpc_grpc/external/com_github_grpc_grpc: warning: directory does not exist.
2023-01-24T21:57:45.8430389Z [3,778 / 3,792] Compiling src/ray/gcs/gcs_server/gcs_actor_manager.cc; 5s processwrapper-sandbox ... (2 actions, 1 running)
2023-01-24T22:03:36.1203303Z [4,049 / 4,054] [Prepa] Linking cpp/libray_api.lo
2023-01-24T22:03:39.2502551Z INFO: Elapsed time: 2670.604s, Critical Path: 223.47s

mattip commented 1 year ago

Ahh, hang on, that is the warning that is failing the aarch64 builds in #92. So it was there all the time and the difference is a -Werror or so?
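One way to check whether the same warning appears in both the passing and the failing logs is to tally the GCC warning flags in each. The following is a minimal sketch, not part of the feedstock: the function name `count_warning_flags` is hypothetical, and the sample lines are abridged copies of lines from the log above.

```python
import re
from collections import Counter

# GCC appends the responsible flag in brackets, e.g. "[-Wstringop-overflow=]".
# The trailing "=" is dropped so the name matches what -Werror=... would take.
WARNING_FLAG = re.compile(r"warning: .*\[(-W[\w-]+)=?\]\s*$")


def count_warning_flags(log_text: str) -> Counter:
    """Count how often each bracketed GCC warning flag appears in a build log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_FLAG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Two abridged lines from the log above: one GCC warning, one protoc-plugin
# warning that carries no bracketed flag and is therefore ignored.
sample_log = "\n".join([
    "2023-01-24T21:20:37.1199102Z .../bits/string3.h:51:33: warning: "
    "'void* __builtin___memcpy_chk(...)' specified size ... exceeds maximum "
    "object size 9223372036854775807 [-Wstringop-overflow=]",
    "2023-01-24T21:25:22.1669560Z bazel-out/k8-opt/bin/external/"
    "com_github_grpc_grpc/external/com_github_grpc_grpc: warning: "
    "directory does not exist.",
])

print(count_warning_flags(sample_log))
```

Running this over both the x86_64 and the aarch64 logs would show whether the set of flags is identical, in which case the difference would have to come from how warnings are treated rather than from which warnings fire.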

mattip commented 1 year ago

Do we have any bazel experts around who could either remove the build altogether or figure out how to ignore that error?

ngam commented 1 year ago

Do we have any bazel experts around who could either remove the build altogether or figure out how to ignore that error?

You mean in conda-forge or upstream? We will try to adapt this build to make it work (with bazel). We have a specific toolchain that we likely have to use https://github.com/conda-forge/bazel-toolchain-feedstock (an example of using this toolchain successfully is jaxlib, and the tensorflow build relies on a similarly modified toolchain)

ngam commented 1 year ago

Do you have specific needs, @mattip? I am planning to attempt fixing this in the coming weeks, but I can also try to make some effort sooner.

ngam commented 1 year ago

Our bazel expert is @xhochy who may not be free these days (we miss you if you see this!)

mattip commented 1 year ago

It seems tensorflow has a whole scheme to allow using system libraries. Is this build deps the parallel in ray? How would that look for a local grpc?

ngam commented 1 year ago

We use this sort of thing in jaxlib: https://github.com/conda-forge/jaxlib-feedstock/blob/77c8ef863a48afae4654c4adc5962232f807cf8e/recipe/build.sh#L70

We also tend to edit bazelrc files like this: https://github.com/conda-forge/jaxlib-feedstock/blob/77c8ef863a48afae4654c4adc5962232f807cf8e/recipe/build.sh#L14-L26

Another good example is the tensorflow build: https://github.com/conda-forge/tensorflow-feedstock/blob/main/recipe/build.sh

ngam commented 1 year ago

The main issue for me is whether or not we will have to do a lot of deep patching to get this to work. I am not that familiar with the build setup of ray yet

mattip commented 1 year ago

We use this sort of thing in jaxlib

That passes TF_SYSTEM_LIBS down to tensorflow, which has a whole scheme to allow using system libraries. This mechanism does not exist so far in ray.

cread commented 1 year ago

Is anyone still working on this? Having such an old version pinned here is starting to cause some problems for us.

ngam commented 1 year ago

Is anyone still working on this? Having such an old version pinned here is starting to cause some problems for us.

Not that I'm aware of. Please feel free to have a go and tag people in this issue so that we can keep track and help if we can.

mattip commented 1 year ago

Ray 2.4.0 pins to <1.49 like upstream ray on darwin. Would changing to exactly the upstream pinning (<1.51.3 on non-darwin) help your use-case?

cread commented 1 year ago

Ray 2.4.0 pins to <1.49 like upstream ray on darwin. Would changing to exactly the upstream pinning (<1.51.3 on non-darwin) help your use-case?

Yes, this would help a lot actually.

h-vetinari commented 1 year ago

Good news: dealing with external deps in bazel might finallyyyyyyy be getting easier: https://github.com/conda-forge/tensorflow-feedstock/issues/332

mattip commented 1 year ago

It requires bazel 6, which does not seem to work. See ray-project/ray#31504

h-vetinari commented 1 year ago

Yeah, but compatibility with modern bazel is mostly just a question of time. The important update here IMO is the new capabilities that'll finally allow us to improve the (un)vendoring situation here.

anyscalesam commented 10 months ago

@mattip is there anything preventing an update of ray to 2.9.0? That should bring the grpcio version to 1.59.

EDIT: we should guard at <1.59 not pin it.
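The difference between pinning and guarding can be sketched as follows. This is a simplified illustration that assumes plain dotted numeric versions; it does not reproduce conda's actual match-spec semantics, and the helper names are hypothetical.

```python
def parse_version(text: str, width: int = 3) -> tuple:
    """Turn '1.59' into (1, 59, 0) so that short and long forms compare equal."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies_pin(installed: str, pinned: str) -> bool:
    """Exact pin: only the named release is accepted."""
    return parse_version(installed) == parse_version(pinned)


def satisfies_guard(installed: str, upper_bound: str) -> bool:
    """Upper-bound guard: any release strictly below the bound is accepted."""
    return parse_version(installed) < parse_version(upper_bound)


# A guard at <1.59 leaves the solver free to choose any older grpcio,
# while a pin rejects everything except the one named release.
for candidate in ("1.48.2", "1.58.3", "1.59.0"):
    print(candidate, satisfies_guard(candidate, "1.59"), satisfies_pin(candidate, "1.59"))
```

With a guard, environments that already carry an older grpcio stay solvable. A pin forces every consumer onto the same release.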

mattip commented 1 month ago

I found this article about using native libraries in bazel builds. It seems we could add some patches to replace the grpc and protobuf builds with the conda-provided ones?

mattip commented 1 month ago

I think we could try to use these from conda: libabseil (instead of @com_google_absl/*), 'grpc' (instead of @com_github_grpc_grpc/*), and 'protobuf' (instead of @com_google_protobuf), all from upstream BUILD.bazel
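
As a rough sketch of what that mapping could look like on the bazel command line, the snippet below builds `--override_repository=<name>=<path>` flags, one per vendored repository. The flag itself is a standard bazel option, but everything else here is an assumption: the shim directory layout is hypothetical, each shim would still need BUILD files exposing the targets ray's build expects (wrapping the conda-installed headers and libraries), and none of this has been tried against ray.

```python
from pathlib import PurePosixPath
from typing import List

# Bazel external repository name -> conda package proposed as its replacement
# (taken from the comment above; the conda package names are not verified here).
CONDA_REPLACEMENTS = {
    "com_google_absl": "libabseil",
    "com_github_grpc_grpc": "grpc",
    "com_google_protobuf": "protobuf",
}


def override_flags(shim_root: str) -> List[str]:
    """Build one --override_repository flag per vendored repository.

    shim_root is a hypothetical directory holding one subdirectory per
    repository, each containing BUILD files that point at conda's libraries.
    """
    root = PurePosixPath(shim_root)
    return [
        f"--override_repository={repo}={root / repo}"
        for repo in sorted(CONDA_REPLACEMENTS)
    ]


for flag in override_flags("/tmp/shims"):
    print(flag)
```

Whether overriding whole repositories is workable, or whether patching ray's own BUILD.bazel is simpler, would depend on how many targets from these repositories ray references.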