conda-forge / tensorflow-feedstock

A conda-smithy repository for tensorflow.
BSD 3-Clause "New" or "Revised" License

Rebuild for libprotobuf 4.25.1 #371

Closed regro-cf-autotick-bot closed 1 month ago

regro-cf-autotick-bot commented 5 months ago

This PR has been triggered in an effort to update libprotobuf4251.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.


If this PR was opened in error or needs to be updated, please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase "@conda-forge-admin, please rerun bot" in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/7730760900, please use this URL for debugging.

conda-forge-webservices[bot] commented 5 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

hmaarrfk commented 5 months ago

I've started to attempt to build linux (all variants). @xhochy are you able to try to build osx?

hmaarrfk commented 5 months ago

sorry i jumped on this too soon. this needs bazel which needs https://github.com/conda-forge/grpc_java_plugin-feedstock/pull/77

hmaarrfk commented 5 months ago

build_artifacts/linux_64_c_compiler_version12cuda_compiler_version12.0cxx_compiler_version12numpy1.22python3.10.____cpython-log.txt

# Configuration: 05589994264a0a98ef13eb319f22ab690b1c7378d358149ffda9fd6f64359d80
# Execution platform: @local_execution_config_platform//:platform
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1706810202515/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/google/protobuf/map_field.h:682:1: error: conflicting declaration 'google::protobuf::internal::MapFieldBase::VTable google::protobuf::internal::MapField<Derived, Key, T, key_wire_type, value_wire_type>::kVTable'
  682 |     MapField<Derived, Key, T, kKeyFieldType_, kValueFieldType_>::kVTable =
      | ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1706810202515/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/google/protobuf/map_field.h:671:45: note: previous declaration as 'const google::protobuf::internal::MapFieldBase::VTable google::protobuf::internal::MapField<Derived, Key, T, key_wire_type, value_wire_type>::kVTable'
  671 |   static const MapFieldBase::VTable kVTable;
      |                                             ^      
INFO: Elapsed time: 689.017s, Critical Path: 270.85s

linux_64_c_compiler_version12cuda_compiler_version12.0cxx_compiler_version12numpy1.22python3.10.____cpython-log.txt
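The two diagnostics in the log above are long enough that the actual mismatch is easy to miss. As a minimal sketch (not part of the feedstock tooling), the following Python pulls the quoted declarations out of the `error` and `note` lines and reports which token differs. The names `DIAG` and `parse_diagnostics` are hypothetical helpers, and the long `_h_env_placehold_...` prefix is shortened to `...` for readability; the quoted declarations themselves are copied unchanged.

```python
import re

# The two compiler diagnostics from the log, with the build prefix shortened to "...".
LOG = """\
.../include/google/protobuf/map_field.h:682:1: error: conflicting declaration 'google::protobuf::internal::MapFieldBase::VTable google::protobuf::internal::MapField<Derived, Key, T, key_wire_type, value_wire_type>::kVTable'
.../include/google/protobuf/map_field.h:671:45: note: previous declaration as 'const google::protobuf::internal::MapFieldBase::VTable google::protobuf::internal::MapField<Derived, Key, T, key_wire_type, value_wire_type>::kVTable'
"""

# GCC-style diagnostic: <file>:<line>:<col>: <kind>: <text> '<quoted declaration>'
DIAG = re.compile(
    r"(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<kind>error|note): .*?'(?P<decl>[^']+)'"
)


def parse_diagnostics(log):
    """Return one dict per diagnostic line found in the log text."""
    return [m.groupdict() for m in DIAG.finditer(log)]


diags = parse_diagnostics(LOG)
error = next(d for d in diags if d["kind"] == "error")
note = next(d for d in diags if d["kind"] == "note")

# Tokens present in the earlier declaration but absent from the conflicting one.
missing = [tok for tok in note["decl"].split() if tok not in error["decl"].split()]

print(f"line {note['line']} vs line {error['line']}: tokens missing at the error site: {missing}")
```

Going only by what the compiler reports, the declaration at line 671 carries a `const` that the compiler does not see on the one at line 682. The sketch says nothing about why that happens.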

hmaarrfk commented 5 months ago

The discussion in https://github.com/conda-forge/gazebo-feedstock/pull/201 and the fix suggested there, https://github.com/protocolbuffers/protobuf/commit/bb4718cd8720c6acb17b58b19f8345c94aaaba78, seem to be helping.

hmaarrfk commented 5 months ago

Even with libprotobuf "fixed", the build still fails with:

[16,995 / 28,129] 32 actions, 20 running
[16,995 / 28,129] 32 actions, 21 running
ERROR: /home/conda/feedstock_root/build_artifacts/debug_1706841931849/work/tensorflow/compiler/tf2xla/cc/BUILD:10:21: Executing genrule //tensorflow/compiler/tf2xla/cc:xla_ops_gen_genrule failed: (Exit 127): bash failed: error executing command (from target //tensorflow/compiler/tf2xla/cc:xla_ops_gen_genrule)
  (cd /home/conda/feedstock_root/build_artifacts/debug_1706841931849/_build_env/share/bazel/f16ca41df86c180677b057e9271b19b7/execroot/org_tensorflow && \
  exec env - \
    CUDNN_INSTALL_PATH=/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_h_env \
    GCC_HOST_COMPILER_PATH=/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_build_env/bin/x86_64-conda-linux-gnu-gcc \
    NCCL_INSTALL_PATH=/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_h_env \
    PATH=/home/conda/feedstock_root/build_artifacts/debug_1706841931849/work:/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_build_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_build_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_h_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_h_env/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin \
    PYTHON_BIN_PATH=/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_h_env/bin/python \
    PYTHON_LIB_PATH=/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_h_env/lib/python3.10/site-packages \
    TF2_BEHAVIOR=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=sm_60,sm_70,sm_75,sm_80,sm_86,sm_89,sm_90,compute_90 \
    TF_CUDA_PATHS=/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_build_env/targets/x86_64-linux,/home/conda/feedstock_root/build_artifacts/debug_1706841931849/_h_env/targets/x86_64-linux \
    TF_CUDA_VERSION=12.0 \
    TF_CUDNN_VERSION=8 \
    TF_NCCL_VERSION=2.19 \
    TF_SYSTEM_LIBS=astor_archive,astunparse_archive,boringssl,com_github_googlecloudplatform_google_cloud_cpp,com_github_grpc_grpc,com_google_absl,com_google_protobuf,curl,cython,dill_archive,flatbuffers,gast_archive,gif,icu,libjpeg_turbo,org_sqlite,png,pybind11,snappy,zlib \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/tf2xla/cc/ops/xla_ops_gen_cc bazel-out/k8-opt/bin/tensorflow/compiler/tf2xla/cc/ops/xla_ops.h bazel-out/k8-opt/bin/tensorflow/compiler/tf2xla/cc/ops/xla_ops.cc 0 ,')
# Configuration: 04af7348d9f57670bca9a21f9bc81efc2a21a13a3a3ffb065ec8dc4e49939bfe
# Execution platform: @local_execution_config_platform//:platform
bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/tf2xla/cc/ops/xla_ops_gen_cc: symbol lookup error: /home/conda/feedstock_root/build_artifacts/debug_1706841931849/_build_env/share/bazel/f16ca41df86c180677b057e9271b19b7/execroot/org_tensorflow/bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/tf2xla/cc/ops/../../../../../_solib_k8/_U_S_Stensorflow_Scompiler_Stf2xla_Scc_Cops_Sxla_Uops_Ugen_Ucc___Utensorflow/libtensorflow_framework.so.2: undefined symbol: _ZN7riegeli15RecordsMetadata5ClearEv
INFO: Elapsed time: 4420.909s, Critical Path: 474.97s
INFO: 18603 processes: 3839 internal, 14764 local.
FAILED: Build did NOT complete successfully

h-vetinari commented 5 months ago

Key part here being:

libtensorflow_framework.so.2: undefined symbol: _ZN7riegeli15RecordsMetadata5ClearEv
                                                    ^^^^^^^
                                                    look here

This is vendored by tensorflow and originally comes from https://github.com/google/riegeli. I see that tensorflow applies a patch to riegeli (and by building through bazel, we should be doing the same), but the patch doesn't look particularly related.
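For readers unfamiliar with mangled names, here is a minimal Python sketch of how the symbol above decodes. It handles only the simple subset of the Itanium C++ ABI mangling that this symbol uses: `_ZN`, then length-prefixed name components, then `E`, then `v` for an empty parameter list. The function name `demangle_nested_name` is a hypothetical helper; in practice a tool such as `c++filt` from binutils handles the general case.

```python
def demangle_nested_name(symbol):
    """Demangle the `_ZN<len><name>...Ev` subset of Itanium C++ ABI names.

    Only plain nested names with no parameters are supported; anything
    else raises ValueError rather than guessing.
    """
    prefix = "_ZN"
    if not symbol.startswith(prefix):
        raise ValueError("not a nested mangled name: " + symbol)

    pos = len(prefix)
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported component at offset %d" % start)
        length = int(symbol[start:pos])
        name = symbol[pos:pos + length]
        if len(name) != length:
            raise ValueError("truncated component at offset %d" % start)
        parts.append(name)
        pos += length

    # After the closing 'E', a lone 'v' encodes an empty parameter list.
    if symbol[pos:] != "Ev":
        raise ValueError("only parameterless functions are supported")
    return "::".join(parts) + "()"


print(demangle_nested_name("_ZN7riegeli15RecordsMetadata5ClearEv"))
```

The leading `7riegeli` component is the namespace, which is why the marker in the comment above points at that part of the symbol.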

hmaarrfk commented 5 months ago

Thanks!! For the next few hours I need to focus on other things. Glad others (like yourself!!!) are able to help before I completely delete the logs. Even having hidden tmux panes makes me stressed…

h-vetinari commented 5 months ago

Do you want me to hold off with merging libprotobuf v25.2 until we've fixed tensorflow (in case we need more patches to v25.1)?

Otherwise I'm planning to go ahead and trigger a bigger migration (abseil/grpc/protobuf).

hmaarrfk commented 5 months ago

Go for it.

h-vetinari commented 4 months ago

There's now a PR for a newer protobuf version: #372; if we're lucky the error has been resolved?