conda-forge / tensorflow-feedstock

A conda-smithy repository for tensorflow.
BSD 3-Clause "New" or "Revised" License

Rebuild for libabseil 20240116, libgrpc 1.61, libprotobuf 4.25.2 #372

Closed regro-cf-autotick-bot closed 4 months ago

regro-cf-autotick-bot commented 7 months ago

This PR has been triggered in an effort to update libabseil20240116_libgrpc161_libprotobuf4252.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself don't close this PR until your rebuild has been merged.


If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/8032512822, please use this URL for debugging.

conda-forge-webservices[bot] commented 7 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

hmaarrfk commented 7 months ago

I haven't debugged yet, but:


# Configuration: 6f0a7a5236083d91337c69cd04dac9364399136f095748b9de00939014916111                          
# Execution platform: @local_execution_config_platform//:platform                                          
external/local_xla/xla/runtime/constraints.cc:30:13: error: 'StrCat' has not been declared in 'absl'       
   30 | using absl::StrCat;
      |             ^~~~~~
external/local_xla/xla/runtime/constraints.cc: In function 'absl::lts_20240116::StatusOr<xla::runtime::ArgumentConstraint> xla::runtime::ParseArgumentConstraint(std::string_view)':
external/local_xla/xla/runtime/constraints.cc:36:31: error: 'StrCat' was not declared in this scope; did you mean 'strcat'?
   36 |   return InvalidArgumentError(StrCat("unknown operand constraint: ", str));                        
      |                               ^~~~~~
      |                               strcat
INFO: Elapsed time: 1684.012s, Critical Path: 221.41s                                                      
INFO: 10426 processes: 2128 internal, 8298 local.
FAILED: Build did NOT complete successfully

h-vetinari commented 7 months ago

That's a simple header-inclusion issue - I think abseil got rid of a few transitive imports. For using absl::StrCat, one needs to

#include "absl/strings/str_cat.h"

Unsurprisingly, this got noticed upstream as well, so we can just backport https://github.com/tensorflow/tensorflow/commit/a733cb11912d455b8eef3437e526064642444390

hmaarrfk commented 7 months ago

Again, i haven't troubleshot at all:

ux/lib/stubs -DNDEBUG '-D_FORTIFY_SOURCE=2' -O2 -isystem /home/conda/feedstock_root/build_artifacts/tensorflow-split_1708916194649/_h_env_placehold_plac
ehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh
old_plac/include -I/home/conda/feedstock_root/build_artifacts/tensorflow-split_1708916194649/_h_env_placehold_placehold_placehold_placehold_placehold_pl
acehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/targets/x86_64-linux/include 
-I/home/conda/feedstock_root/build_artifacts/tensorflow-split_1708916194649/_build_env/targets/x86_64-linux/include -L/home/conda/feedstock_root/build_a
rtifacts/tensorflow-split_1708916194649/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place
hold_placehold_placehold_placehold_placehold_placehold_placehold_plac/targets/x86_64-linux/lib -L/home/conda/feedstock_root/build_artifacts/tensorflow-s
plit_1708916194649/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh
old_placehold_placehold_placehold_placehold_plac/targets/x86_64-linux/lib/stubs -L/home/conda/feedstock_root/build_artifacts/tensorflow-split_1708916194
649/_build_env/targets/x86_64-linux/lib -L/home/conda/feedstock_root/build_artifacts/tensorflow-split_1708916194649/_build_env/targets/x86_64-linux/lib/
stubs -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result '-Werror=unused-result'
 -Wswitch '-Werror=switch' '-Wno-error=unused-but-set-variable' -DAUTOLOAD_DYNAMIC_KERNELS '-std=c++17' -x cuda '-DGOOGLE_CUDA=1' '--cuda-gpu-arch=sm_60
' '--cuda-gpu-arch=sm_70' '--cuda-gpu-arch=sm_75' '--cuda-gpu-arch=sm_80' '--cuda-gpu-arch=sm_86' '--cuda-gpu-arch=sm_89' '--cuda-gpu-arch=sm_90' '--cud
a-include-ptx=sm_90' '--cuda-gpu-arch=sm_90' '-Xcuda-fatbinary=--compress-all' '-nvcc_options=expt-relaxed-constexpr' -DEIGEN_AVOID_STL_ARRAY -Iexternal
/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions '-DGOOGLE_CUDA=1' '-DTENSORFLOW_USE_NVCC=1' '-DTENSORFLOW_USE_XLA=1' -DINTEL_MKL -DEN
ABLE_ONEDNN_V3 -DAMD_ZENDNN -msse3 -pthread '-nvcc_options=relaxed-constexpr' '-nvcc_options=ftz=true' -c tensorflow/core/kernels/image/image_ops_gpu.cu
.cc -o bazel-out/k8-opt/bin/tensorflow/core/kernels/image/_objs/image_ops_gpu/image_ops_gpu.cu.pic.o)
# Configuration: 847f9e57a1c77ac4a7da68b1c95e1fd0256c7d92848d94facaf2dc7f3de903ec
# Execution platform: @local_execution_config_platform//:platform
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1708916194649/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p
lacehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/str_cat.h: In function 'std
::enable_if_t<(sizeof... (T) > 1), typename std::common_type<typename std::conditional<true, void, absl::lts_20240116::strings_internal::EnableIfFastCas
e<T, typename std::enable_if<((std::is_arithmetic<T>::value && (! std::is_same<T, char>::value)) && (! std::is_same<T, bool>::value)), void>::type> >::t
ype ...>::type> absl::lts_20240116::StrAppend(absl::Nonnull<T*>, T ...)':
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1708916194649/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p
lacehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/str_cat.h:606:316: error: e
xpected ';' before '...' token
  606 |   for (const SomeTrivialEmptyType& dummy2 :
      |                               

                   ^  
      |                               

                   ;
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1708916194649/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p
lacehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/str_cat.h:606:43: error: pa
rameter packs not expanded with '...': 
  606 |   for (const SomeTrivialEmptyType& dummy2 :
      |                                           ^                         

/home/conda/feedstock_root/build_artifacts/tensorflow-split_1708916194649/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p
lacehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/str_cat.h:606:43: note:    
     'args'
INFO: Elapsed time: 1170.648s, Critical Path: 430.78s
INFO: 11091 processes: 5068 internal, 6023 local.
FAILED: Build did NOT complete successfully

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

h-vetinari commented 7 months ago

So I redid the backport to double-check, and I basically have the same as you (only that I wouldn't have kept the include of stable_delegate.h in tensorflow/lite/delegates/utils/experimental/stable_delegate/delegate_loader.cc).

Given that the error is happening in the abseil headers themselves, and the definition there calls it "ugly code" and "poor man's C++17 fold expression for C++14", I'm kinda thinking compiler error?

Looking more closely at the history of that code, I found that it was introduced recently, and in fact, someone else had already noted that breakage. There's now an issue to track this: https://github.com/abseil/abseil-cpp/issues/1629

PS. That stacktrace is very borked w.r.t. line breaks (and chopping off the note: section at the end), which made it harder to write a good bug report.

hmaarrfk commented 7 months ago

i don't know enough about C++17 fold expressions, but maybe I can try chatgpt's code:

// Assuming 'args' is a parameter pack and other variables are previously defined.
([&](auto&& arg) {
    (void)(n = lengths[i]),
    (void)(n < 0 ? (void)(*pos++ = '-'), (n = ~n) : 0),
    (void)absl::numbers_internal::FastIntToBufferBackward(
        absl::numbers_internal::UnsignedAbsoluteValue(std::forward<decltype(arg)>(arg)),
        pos += n, static_cast<uint32_t>(n)),
    (void)++i;
} (args), ...); // Pack expansion within a lambda expression, applied to each 'args'
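
For readers unfamiliar with the two spellings being discussed, here is a minimal sketch using only the standard library. It contrasts the C++14 initializer-list trick (the "poor man's fold") with a C++17 comma fold over a lambda. `JoinCxx14`, `JoinCxx17` and `CheckJoinSpellingsAgree` are hypothetical names invented for this illustration; this is not the abseil code, and it says nothing about whether either form avoids the nvcc failure above.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// C++14-style "poor man's fold": expand the pack inside a braced initializer
// list; the elements are evaluated left to right for their side effects.
template <typename... Ints>
std::string JoinCxx14(Ints... values) {
  static_assert(sizeof...(Ints) > 0, "an empty braced list cannot be deduced");
  std::string out;
  for (int dummy : {((void)(out += std::to_string(values) + ";"), 0)...}) {
    (void)dummy;  // only the side effects matter
  }
  return out;
}

// C++17 spelling: a fold over the comma operator. The lambda is invoked once
// per pack element, in order.
template <typename... Ints>
std::string JoinCxx17(Ints... values) {
  std::string out;
  ([&](auto v) { out += std::to_string(v) + ";"; }(values), ...);
  return out;
}

// Example usage: both spellings produce the same string.
inline void CheckJoinSpellingsAgree() {
  assert(JoinCxx14(1, -2, 3) == "1;-2;3;");
  assert(JoinCxx17(1, -2, 3) == "1;-2;3;");
}
```

Note that in the fold form each statement inside the lambda body ends with a semicolon; the comma only separates the per-element lambda calls.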

hmaarrfk commented 7 months ago

Hmm, i can't recreate the failure in interactive mode.... I get the following failure before I get to it:

ERROR: /home/conda/feedstock_root/build_artifacts/debug_1708918309173/work/tensorflow/compiler/tf2xla/cc/BUILD:10:21: Executing genrule //tensorflow/compiler/tf2xla/cc:xla_ops_gen_genrule failed: (Exit 127): bash failed: error executing command (from target //tensorflow/compiler/tf2xla/cc:xla_ops_gen_genrule) 
  (cd /home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/share/bazel/c7c77d10841edda30d34d2f7e223d64b/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/conda/feedstock_root/build_artifacts/debug_1708918309173/work:/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_h_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_h_env/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin \
    PYTHON_BIN_PATH=/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_h_env/bin/python \
    PYTHON_LIB_PATH=/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_h_env/lib/python3.10/site-packages \
    TF2_BEHAVIOR=1 \
    TF_SYSTEM_LIBS=astor_archive,astunparse_archive,boringssl,com_github_googlecloudplatform_google_cloud_cpp,com_github_grpc_grpc,com_google_absl,com_google_protobuf,curl,cython,dill_archive,flatbuffers,gast_archive,gif,icu,libjpeg_turbo,org_sqlite,png,pybind11,snappy,zlib \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/tf2xla/cc/ops/xla_ops_gen_cc bazel-out/k8-opt/bin/tensorflow/compiler/tf2xla/cc/ops/xla_ops.h bazel-out/k8-opt/bin/tensorflow/compiler/tf2xla/cc/ops/xla_ops.cc 0 ,')
# Configuration: d09bb988379bc3768b3dd1f7d02e8925deb2eb8e10a590ffd28330756d700f55
# Execution platform: @local_execution_config_platform//:platform
bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/tf2xla/cc/ops/xla_ops_gen_cc: symbol lookup error: /home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/share/bazel/c7c77d10841edda30d34d2f7e223d64b/execroot/org_tensorflow/bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/tf2xla/cc/ops/../../../../../_solib_k8/_U_S_Stensorflow_Scompiler_Stf2xla_Scc_Cops_Sxla_Uops_Ugen_Ucc___Utensorflow/libtensorflow_framework.so.2: undefined symbol: _ZN4absl12lts_2024011612log_internal8VLogSite14SlowIsEnabled2Ei
ERROR: /home/conda/feedstock_root/build_artifacts/debug_1708918309173/work/tensorflow/compiler/tf2xla/cc/BUILD:32:21: Executing genrule //tensorflow/compiler/tf2xla/cc:xla_jit_op_gen_genrule failed: (Exit 127): bash failed: error executing command (from target //tensorflow/compiler/tf2xla/cc:xla_jit_op_gen_genrule) 
  (cd /home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/share/bazel/c7c77d10841edda30d34d2f7e223d64b/execroot/org_tensorflow && \
  exec env - \
    PATH=/home/conda/feedstock_root/build_artifacts/debug_1708918309173/work:/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_h_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_h_env/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin \
    PYTHON_BIN_PATH=/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_h_env/bin/python \
    PYTHON_LIB_PATH=/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_h_env/lib/python3.10/site-packages \
    TF2_BEHAVIOR=1 \
    TF_SYSTEM_LIBS=astor_archive,astunparse_archive,boringssl,com_github_googlecloudplatform_google_cloud_cpp,com_github_grpc_grpc,com_google_absl,com_google_protobuf,curl,cython,dill_archive,flatbuffers,gast_archive,gif,icu,libjpeg_turbo,org_sqlite,png,pybind11,snappy,zlib \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/tf2xla/cc/ops/xla_jit_ops_gen_cc bazel-out/k8-opt/bin/tensorflow/compiler/tf2xla/cc/ops/xla_jit_ops.h bazel-out/k8-opt/bin/tensorflow/compiler/tf2xla/cc/ops/xla_jit_ops.cc 1 ,')
# Configuration: d09bb988379bc3768b3dd1f7d02e8925deb2eb8e10a590ffd28330756d700f55
# Execution platform: @local_execution_config_platform//:platform
bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/tf2xla/cc/ops/xla_jit_ops_gen_cc: symbol lookup error: /home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/share/bazel/c7c77d10841edda30d34d2f7e223d64b/execroot/org_tensorflow/bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/tf2xla/cc/ops/../../../../../_solib_k8/_U_S_Stensorflow_Scompiler_Stf2xla_Scc_Cops_Sxla_Ujit_Uops_Ugen_Ucc___Utensorflow/libtensorflow_framework.so.2: undefined symbol: _ZN4absl12lts_2024011612log_internal8VLogSite14SlowIsEnabled2Ei
INFO: Elapsed time: 8.639s, Critical Path: 0.71s

h-vetinari commented 7 months ago

> undefined symbol: _ZN4absl12lts_2024011612log_internal8VLogSite14SlowIsEnabled2Ei

That lib needs to do -labsl_log_internal.

hmaarrfk commented 7 months ago

-labsl_vlog_config_internal is probably the correct one, the one you pointed to doesn't exist.
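
When guessing which library a missing symbol belongs to, it can help to demangle the name first. The sketch below assumes a GCC or Clang toolchain, where `abi::__cxa_demangle` is available through `<cxxabi.h>` (it is not standard C++). `Demangle` and `CheckDemangleExample` are hypothetical helper names for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang-specific, assumed to be available
#include <memory>
#include <string>

// Demangle an Itanium-ABI symbol name; returns the input unchanged on failure.
inline std::string Demangle(const std::string& mangled) {
  int status = 0;
  // __cxa_demangle returns a malloc'ed buffer that the caller must free.
  std::unique_ptr<char, void (*)(void*)> text(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
      std::free);
  return (status == 0 && text) ? std::string(text.get()) : mangled;
}

// Example: the symbol from the lookup error in this thread.
inline void CheckDemangleExample() {
  assert(Demangle("_ZN4absl12lts_2024011612log_internal8VLogSite14SlowIsEnabled2Ei") ==
         "absl::lts_20240116::log_internal::VLogSite::SlowIsEnabled2(int)");
}
```

The demangled name shows the class (`VLogSite`) and namespace (`log_internal`), which is only a hint; as the exchange above shows, the library that actually defines the symbol still has to be checked against what is installed.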

hmaarrfk commented 7 months ago

A few things (third party) seem to link to absl_flags which seems to have been split up into a few smaller libraries.

/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -labsl_flags: No such file or directory

h-vetinari commented 7 months ago

> A few things (third party) seem to link to absl_flags which seems to have been split up into a few smaller libraries.

It got renamed to absl_log_flags for some reason

h-vetinari commented 7 months ago

@hmaarrfk, I'm having trouble reproducing the failure you got w.r.t. error: parameter packs not expanded with '...', which is important for moving the issue forward on abseil side. Could you tell me which CUDA version you're using?

With the following reduced code (from upstream issue)

test.cpp

```cpp
// headers required by the code below
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>

template <typename T, typename... U>
auto foo(T v, U...) {
  return v;
}

template <typename T>
auto bar(T v) {
  return v;
}

template <typename String, typename... T>
void StrAppend(String* str, T... args) {
  size_t old_size = 0;
  size_t i = 0;
  ptrdiff_t n;
  typename String::pointer pos = &(*str)[old_size];
  using SomeTrivialEmptyType = std::false_type;
  ptrdiff_t lengths[sizeof...(T)] = {};
  const SomeTrivialEmptyType dummy1;
  for (const SomeTrivialEmptyType& dummy2 :
       {((void)(n = lengths[i]),
         (void)(n < 0 ? (void)(*pos++ = '-'), (n = ~n) : 0),
         (void)foo(bar(std::move(args)), pos += n, static_cast<uint32_t>(n)),
         (void)++i, dummy1)...}) {
    (void)dummy2;
  }
}

int main() {
  std::string a;
  StrAppend(&a, 1);
}
```

I tried both CUDA 11.2

docker run --rm -it quay.io/condaforge/linux-anvil-cuda:11.2
mamba install gxx_linux-64 gxx -y
nvcc --std=c++17 ./test.cpp

as well as CUDA 12

docker run --rm -it quay.io/condaforge/linux-anvil-cos7-x86_64
mamba install cuda-nvcc_linux-64=12.0 gxx_linux-64 gxx -y
nvcc --std=c++17 ./test.cpp

and both compiled and ran without issue.

hmaarrfk commented 7 months ago

> I'm having trouble reproducing the failure

I'm really trying to recreate it.... maybe it was an error in my server......

h-vetinari commented 7 months ago

I don't doubt that it's real, as at least one other person ran into the same issue. It seems there's some sort of more complicated interaction happening, but no idea. Worst case you can always restart a full rebuild, or we try a single build on the cirun server

hmaarrfk commented 7 months ago

FYI just restarting my builds now with linking absl_log_flags.

14k / 22k files compiled.

hmaarrfk commented 7 months ago

Sigh it never gets easier

# Execution platform: @local_execution_config_platform//:platform
/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: bazel-out/k8-opt/bin/external/local_xla/xla/service/libslow_operation_alarm.pic.a(slow_operation_alarm.pic.o): in function `xla::SlowOperationAlarm::AlarmLoop()':
slow_operation_alarm.cc:(.text._ZN3xla18SlowOperationAlarm9AlarmLoopEv+0x21c): undefined reference to `absl::lts_20240116::synchronization_internal::KernelTimeout::KernelTimeout(absl::lts_20240116::Time)'

For the record, I was probably trying to build with CUDA 12.0, but I can't remember.

h-vetinari commented 7 months ago

> undefined reference to 'absl::lts_20240116::synchronization_internal::KernelTimeout::KernelTimeout(absl::lts_20240116::Time)'

The good thing with abseil is that it's relatively easy to read off the library in which this should be available: libabsl_synchronization_internal
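
The "read off the library" rule of thumb can be sketched as a small string helper: take the namespace component that follows the versioned inline namespace and prefix it with `absl_`. `GuessAbslLibrary` and `CheckGuessExample` are hypothetical names, and the result is only a first guess. Earlier in this thread the same rule suggested absl_log_internal, which turned out not to exist, so the candidate always needs to be checked against the installed libraries.

```cpp
#include <cassert>
#include <string>

// Heuristic only: derive a candidate library name from a demangled abseil
// symbol. Returns an empty string when the name does not have the shape
// "absl::lts_<date>::<namespace>::...".
inline std::string GuessAbslLibrary(const std::string& demangled) {
  const std::string prefix = "absl::lts_";
  const std::string::size_type start = demangled.find(prefix);
  if (start == std::string::npos) return "";
  // Skip past the version component, i.e. "absl::lts_<date>::".
  const std::string::size_type version_end =
      demangled.find("::", start + prefix.size());
  if (version_end == std::string::npos) return "";
  const std::string::size_type name_begin = version_end + 2;
  const std::string::size_type name_end = demangled.find("::", name_begin);
  if (name_end == std::string::npos) return "";
  const std::string component =
      demangled.substr(name_begin, name_end - name_begin);
  // A '(' means we ran into a parameter list, not a namespace.
  if (component.find('(') != std::string::npos) return "";
  return "absl_" + component;
}

// Example: the undefined reference quoted above.
inline void CheckGuessExample() {
  assert(GuessAbslLibrary(
             "absl::lts_20240116::synchronization_internal::KernelTimeout::"
             "KernelTimeout(absl::lts_20240116::Time)") ==
         "absl_synchronization_internal");
}
```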

hmaarrfk commented 7 months ago

[image]

I feel like it is already linking to the correct library....

[image]

hmaarrfk commented 7 months ago

error.log

updated file with actual content

hmaarrfk commented 7 months ago

I'm having trouble applying patches and getting this to run cleanly.

I'm going to revert this and work on a branch

0021-Link-to-absl_log_flags-instead-of-absl_flags.patch.txt
0022-Fix-protobuf-errors-when-using-system-protobuf.patch.txt
0023-Fix-missing-includes-needed-for-Abseil-lts_2024_01_1.patch.txt

hmaarrfk commented 7 months ago

> @hmaarrfk, I'm having trouble reproducing the failure you got w.r.t. error: parameter packs not expanded with '...', which is important for moving the issue forward on abseil side. Could you tell me which CUDA version you're using?

linux_64_c_compiler_version12cuda_compiler_version12.0cxx_compiler_version12numpy1.23python3.11.____cpython.log

CPU_COUNT=30 python build-locally.py linux_64_c_compiler_version12cuda_compiler_version12.0cxx_compiler_version12numpy1.23python3.11.____cpython | tee linux_64_c_compiler_version12cuda_compiler_version12.0cxx_compiler_version12numpy1.23python3.11.____cpython.log

h-vetinari commented 7 months ago

So that nvcc bug has now been worked around in abseil, and I've backported the workaround to our build.

hmaarrfk commented 7 months ago

k restarting the build

xhochy commented 7 months ago

Using 0021-Link-to-absl_log_flags-instead-of-absl_flags.patch, I could get this passing on osx-arm64.

hmaarrfk commented 7 months ago

My local build tests were failing last night. Couldn't figure out why.

hmaarrfk commented 7 months ago

@xhochy I'm still getting an error similar to:

ehold_placehold_placehold_plac/lib -lrt)                                                                                                                                                                              
# Configuration: fa4ce526dd5f122a8767cb1151666e6310dddb4c2cfbe826792aff51852e739a                                                                                                                                     
# Execution platform: @local_execution_config_platform//:platform                                                                                                                                                     
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1709519703250/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: bazel-out/k8-opt/bin/external/local_xla/x
la/service/libslow_operation_alarm.pic.a(slow_operation_alarm.pic.o): in function `xla::SlowOperationAlarm::AlarmLoop()':                                                                                             
slow_operation_alarm.cc:(.text._ZN3xla18SlowOperationAlarm9AlarmLoopEv+0x21c): undefined reference to `absl::lts_20240116::synchronization_internal::KernelTimeout::KernelTimeout(absl::lts_20240116::Time)'          
collect2: error: ld returned 1 exit status                                                                                                                                                                            
INFO: Elapsed time: 3859.256s, Critical Path: 640.95s 

did you do anything to work around that?