Closed hmaarrfk closed 8 months ago
Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.
So this is now failing for cuda 12.0 with the following traceback:
Compiling mlir/lib/IR/BuiltinAttributes.cpp; 16s local
Compiling mlir/lib/IR/AsmPrinter.cpp; 14s local
ERROR: /home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/share/bazel/24c4c7ed151f9291444c714bbf617dc0/external/local_xla/xla/service/gpu/runtime/BUILD:303:13: Compiling xla/service/gpu/runtime/topk_kernel_float.cu.cc failed: (Exit 127): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target @local_xla//xla/service/gpu/runtime:topk_kernel_cuda)
(cd /home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/share/bazel/24c4c7ed151f9291444c714bbf617dc0/execroot/org_tensorflow && \
exec env - \
CUDNN_INSTALL_PATH=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env \
GCC_HOST_COMPILER_PATH=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/bin/x86_64-conda-linux-gnu-gcc \
NCCL_INSTALL_PATH=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env \
PATH=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/work:/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/bin:/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin \
PWD=/proc/self/cwd \
PYTHON_BIN_PATH=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/bin/python \
PYTHON_LIB_PATH=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/lib/python3.11/site-packages \
TF2_BEHAVIOR=1 \
TF_CUDA_COMPUTE_CAPABILITIES=sm_60,sm_70,sm_75,sm_80,sm_86,sm_89,sm_90,compute_90 \
TF_CUDA_PATHS=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux,/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux \
TF_CUDA_VERSION=12.0 \
TF_CUDNN_VERSION=8 \
TF_NCCL_VERSION=2.19 \
TF_SYSTEM_LIBS=astor_archive,astunparse_archive,boringssl,com_github_googlecloudplatform_google_cloud_cpp,com_github_grpc_grpc,com_google_absl,com_google_protobuf,curl,cython,dill_archive,flatbuffers,gast_archive,gif,icu,libjpeg_turbo,org_sqlite,png,pybind11,snappy,zlib \
custom_toolchain/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/runtime/_objs/topk_kernel_cuda/topk_kernel_float.cu.pic.d '-frandom-seed=bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/runtime/_objs/topk_kernel_cuda/topk_kernel_float.cu.pic.o' -fPIC -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' '-DBAZEL_CURRENT_REPOSITORY="local_xla"' -iquote external/local_xla -iquote bazel-out/k8-opt/bin/external/local_xla -iquote . -iquote bazel-out/k8-opt/bin -iquote external/eigen_archive -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/local_config_cuda -iquote bazel-out/k8-opt/bin/external/local_config_cuda -Ibazel-out/k8-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -isystem third_party/eigen3/mkl_include -isystem bazel-out/k8-opt/bin/third_party/eigen3/mkl_include -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive -isystem external/local_config_cuda/cuda -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda/cuda/include -isystem /home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/include '-march=nocona' '-mtune=haswell' -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/include '-fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/work=/usr/local/src/conda/tensorflow-split-2.15.0' '-fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env=/usr/local/src/conda-prefix' -I/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include -I/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/include 
-L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/lib -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/lib/stubs -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/lib -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/lib/stubs -DNDEBUG '-D_FORTIFY_SOURCE=2' -O2 -isystem /home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/include -I/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include -I/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/include -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/lib -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/lib/stubs -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/lib -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/lib/stubs -fvisibility-inlines-hidden '-fmessage-length=0' '-march=nocona' '-mtune=haswell' -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/include '-fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/work=/usr/local/src/conda/tensorflow-split-2.15.0' '-fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env=/usr/local/src/conda-prefix' -I/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include -I/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/include -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/lib 
-L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/lib/stubs -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/lib -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/lib/stubs -DNDEBUG '-D_FORTIFY_SOURCE=2' -O2 -isystem /home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/include -I/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include -I/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/include -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/lib -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/lib/stubs -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/lib -L/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/targets/x86_64-linux/lib/stubs -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result '-Werror=unused-result' -Wswitch '-Werror=switch' '-Wno-error=unused-but-set-variable' -DAUTOLOAD_DYNAMIC_KERNELS '-std=c++17' -x cuda '-DGOOGLE_CUDA=1' '--cuda-gpu-arch=sm_60' '--cuda-gpu-arch=sm_70' '--cuda-gpu-arch=sm_75' '--cuda-gpu-arch=sm_80' '--cuda-gpu-arch=sm_86' '--cuda-gpu-arch=sm_89' '--cuda-gpu-arch=sm_90' '--cuda-include-ptx=sm_90' '--cuda-gpu-arch=sm_90' '-Xcuda-fatbinary=--compress-all' '-nvcc_options=expt-relaxed-constexpr' -c external/local_xla/xla/service/gpu/runtime/topk_kernel_float.cu.cc -o bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/runtime/_objs/topk_kernel_cuda/topk_kernel_float.cu.pic.o)
# Configuration: 774eb4db7b06dde1dbdd80b8c942fcf09c18136defcdb07a294cb4a50b4c1302
# Execution platform: @local_execution_config_platform//:platform
/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/share/bazel/24c4c7ed151f9291444c714bbf617dc0/execroot/org_tensorflow/custom_toolchain/crosstool_wrapper_driver_is_not_gcc:48: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
import pipes
external/local_xla/xla/service/gpu/runtime/topk_kernel.cu.h(38): warning #20012-D: __device__ annotation is ignored on a function("KVT") that is explicitly defaulted on its first declaration
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
external/local_xla/xla/service/gpu/runtime/topk_kernel.cu.h(39): warning #20012-D: __device__ annotation is ignored on a function("operator=") that is explicitly defaulted on its first declaration
external/local_xla/xla/service/gpu/runtime/topk_kernel.cu.h(40): warning #20012-D: __device__ annotation is ignored on a function("operator=") that is explicitly defaulted on its first declaration
external/local_xla/xla/service/gpu/runtime/topk_kernel.cu.h(41): warning #20012-D: __device__ annotation is ignored on a function("KVT") that is explicitly defaulted on its first declaration
external/local_xla/xla/service/gpu/runtime/topk_kernel.cu.h(42): warning #20012-D: __device__ annotation is ignored on a function("KVT") that is explicitly defaulted on its first declaration
sh: cicc: command not found
INFO: Elapsed time: 1372.328s, Critical Path: 241.29s
INFO: 14555 processes: 6780 internal, 7775 local.
FAILED: Build did NOT complete successfully
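Exit 127 together with "sh: cicc: command not found" suggests that nvcc could not locate its cicc helper. A minimal sketch for checking where cicc lives, assuming the usual CUDA layout in which cicc sits under nvvm/bin rather than bin (the conda packages may differ); find_cuda_tool is a hypothetical helper, not part of the feedstock:

```shell
# Hypothetical helper: print the first location of a CUDA tool under the
# given prefixes, looking in bin/ and nvvm/bin/ (assumed layout).
find_cuda_tool() {
    tool="$1"
    shift
    for prefix in "$@"; do
        for sub in bin nvvm/bin; do
            if [ -x "${prefix}/${sub}/${tool}" ]; then
                printf '%s\n' "${prefix}/${sub}/${tool}"
                return 0
            fi
        done
    done
    return 1
}

# Only report when cicc is missing from PATH, as in the log above.
if ! command -v cicc >/dev/null 2>&1; then
    echo "cicc is not on PATH"
    find_cuda_tool cicc "${BUILD_PREFIX:-/nonexistent}" "${PREFIX:-/nonexistent}" \
        || echo "cicc not found under BUILD_PREFIX or PREFIX"
fi
```

If the helper prints a path under nvvm/bin, that directory is probably missing from the PATH that the wrapper passes to nvcc.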
I got past the other problem, but now:
INFO: Found applicable config definition build:dynamic_kernels in file /home/conda/feedstock_root/build_artifacts/debug_1703973707619/work/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
WARNING: The following configs were expanded more than once: [noaws]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Build options --define and --extra_toolchains have changed, discarding analysis cache.
WARNING: Download from https://mirror.bazel.build/github.com/bazelbuild/rules_cc/archive/081771d4a0e9d7d3aa0eed2ef389fa4700dfb23e.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
INFO: Analyzed 3 targets (4 packages loaded, 42616 targets configured).
INFO: Found 3 targets...
ERROR: /home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/share/bazel/24c4c7ed151f9291444c714bbf617dc0/external/local_xla/xla/service/gpu/runtime/BUILD:45:13: Compiling xla/service/gpu/runtime/sleep_kernel.cu.cc failed: undeclared inclusion(s) in rule '@local_xla//xla/service/gpu/runtime:sleep_kernel_cuda':
this rule is missing dependency declarations for the following files included by 'xla/service/gpu/runtime/sleep_kernel.cu.cc':
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/cuda_runtime.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/builtin_types.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/device_types.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/driver_types.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/vector_types.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/surface_types.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/texture_types.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/library_types.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/channel_descriptor.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/cuda_runtime_api.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/cuda_device_runtime_api.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/driver_functions.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/vector_functions.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/vector_functions.hpp'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/device_atomic_functions.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/device_atomic_functions.hpp'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_20_atomic_functions.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_20_atomic_functions.hpp'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_32_atomic_functions.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_32_atomic_functions.hpp'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_35_atomic_functions.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_60_atomic_functions.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_60_atomic_functions.hpp'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_20_intrinsics.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_20_intrinsics.hpp'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_30_intrinsics.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_30_intrinsics.hpp'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_32_intrinsics.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_32_intrinsics.hpp'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_35_intrinsics.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_61_intrinsics.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/sm_61_intrinsics.hpp'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/texture_indirect_functions.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/surface_indirect_functions.h'
'/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_h_env/targets/x86_64-linux/include/device_launch_parameters.h'
/home/conda/feedstock_root/build_artifacts/debug_1703973707619/_build_env/share/bazel/24c4c7ed151f9291444c714bbf617dc0/execroot/org_tensorflow/custom_toolchain/crosstool_wrapper_driver_is_not_gcc:48: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
import pipes
INFO: Elapsed time: 635.592s, Critical Path: 212.82s
INFO: 2906 processes: 52 internal, 2854 local.
FAILED: Build did NOT complete successfully
I got it to work by removing this line: https://github.com/conda-forge/tensorflow-feedstock/blob/b7f9d4c9a221fb936d58c5547435de15a39ca13d/recipe/build.sh#L110-L111
The error message shows the compiler uses the headers in $PREFIX. However, TF_CUDA_PATHS is set to ${BUILD_PREFIX}, so it is due to the copy of headers.
https://github.com/conda-forge/tensorflow-feedstock/blob/b7f9d4c9a221fb936d58c5547435de15a39ca13d/recipe/build.sh#L109
Thank you for your work!!!!
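To see which TF_CUDA_PATHS entries actually hold a given header, something like the following could help. It is only a sketch: cuda_paths_with_header is a hypothetical helper, and it assumes each entry keeps its headers under include/, as the paths in the error above suggest.

```shell
# Hypothetical helper: print every comma-separated entry of a
# TF_CUDA_PATHS-style list that contains <entry>/include/<header>.
cuda_paths_with_header() {
    header="$1"
    paths="$2"
    old_ifs="$IFS"
    IFS=','
    for entry in $paths; do
        if [ -f "${entry}/include/${header}" ]; then
            printf '%s\n' "$entry"
        fi
    done
    IFS="$old_ifs"
}

# Prints nothing when TF_CUDA_PATHS is unset or no entry has the header.
cuda_paths_with_header cuda_runtime.h "${TF_CUDA_PATHS:-}"
```

If the header turns up in both the BUILD_PREFIX and the PREFIX entry, Bazel may be declaring one copy while the compiler picks up the other, which would match the "undeclared inclusion(s)" error.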
from:
export TF_CUDA_PATHS="${BUILD_PREFIX}/targets/x86_64-linux,${PREFIX}/targets/x86_64-linux"
Did you have to remove the BUILD_PREFIX entry?
I might be missing something, but I still got the same error after removing the rsync command you mentioned.
If you could show your diff or make a PR, that would be great.
For reference, I am building python 3.11 + cuda 12.0
I just removed rsync and did nothing else. I also built Py 3.11+CUDA 12. The Bazel build completed:
INFO: Elapsed time: 22696.720s, Critical Path: 576.66s
INFO: 16404 processes: 993 internal, 15411 local.
INFO: Build completed successfully, 16404 total actions
I will add a new PR and show the complete log.
I also built Py 3.11+CUDA 12
Sorry, I just realized I made a mistake here after I rechecked the build log. I built a CPU package in the last retry.
Now it complains that cusparse.h (in $PREFIX) is not found. TensorFlow might assume cusparse to be installed in the same directory as the nvcc.
I will try to move some CUDA packages from host to build and see if it works.
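One way to test that guess before moving packages around is to check whether the header sits under the same prefix as nvcc. A minimal sketch, assuming nvcc lives in <root>/bin and headers in <root>/include or <root>/targets/x86_64-linux/include; header_beside_nvcc is a hypothetical helper:

```shell
# Hypothetical helper: succeed (and print the path) if <header> is found
# under the prefix that contains the given nvcc binary.
header_beside_nvcc() {
    nvcc_path="$1"
    header="$2"
    # <root>/bin/nvcc -> <root>
    root="$(dirname "$(dirname "$nvcc_path")")"
    for inc in include targets/x86_64-linux/include; do
        if [ -f "${root}/${inc}/${header}" ]; then
            printf '%s\n' "${root}/${inc}/${header}"
            return 0
        fi
    done
    return 1
}

# Only runs the check when nvcc is on PATH.
nvcc_bin="$(command -v nvcc || true)"
if [ -n "$nvcc_bin" ]; then
    header_beside_nvcc "$nvcc_bin" cusparse.h \
        || echo "cusparse.h is not under the nvcc prefix"
fi
```

If the check fails with nvcc in BUILD_PREFIX while cusparse.h is only in PREFIX, that would be consistent with the hypothesis above, though it does not prove that TensorFlow's configure logic is the cause.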
For cuda 11.8 I'm getting abseil errors of the form:
home/conda/feedstock_root/build_artifacts/debug_1704559020594/_h_env/include/absl/strings/internal/str_format/bind.h: In constructor 'absl::lts_20230802::str_format_internal::FormatSpecTemplate<Args>::FormatSpecTemplate(const absl::lts_20230802::str_format_internal::ExtendedParsedFormat<absl::lts_20230802::FormatConversionCharSet(C)...>&)':
/home/conda/feedstock_root/build_artifacts/debug_1704559020594/_h_env/include/absl/strings/internal/str_format/bind.h:172:1: error: parse error in template argument list
172 | CheckArity<sizeof...(C), sizeof...(Args)>();
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/conda/feedstock_root/build_artifacts/debug_1704559020594/_h_env/include/absl/strings/internal/str_format/bind.h:172:63: error: expected ';' before ')' token
172 | CheckArity<sizeof...(C), sizeof...(Args)>();
| ^
/home/conda/feedstock_root/build_artifacts/debug_1704559020594/_h_env/include/absl/strings/internal/str_format/bind.h:173:147: error: template argument 1 is invalid
173 | CheckMatches<C...>(absl::make_index_sequence<sizeof...(C)>{});
| ^
/home/conda/feedstock_root/build_artifacts/debug_1704559020594/_h_env/include/absl/strings/internal/str_format/bind.h:173:151: error: expected primary-expression before '{' token
173 | CheckMatches<C...>(absl::make_index_sequence<sizeof...(C)>{});
| ^
/home/conda/feedstock_root/build_artifacts/debug_1704559020594/_h_env/include/absl/strings/internal/str_format/bind.h:173:151: error: expected ';' before '{' token
/home/conda/feedstock_root/build_artifacts/debug_1704559020594/_h_env/include/absl/strings/internal/str_format/bind.h:173:153: error: expected primary-expression before ')' token
173 | CheckMatches<C...>(absl::make_index_sequence<sizeof...(C)>{});
I'm somewhat hacking around them.
Maybe we need to change which GCC version is used with the CUDA 12 build. If so, something like this could work
For CUDA 12, after I applied 82e49ff2108882b2114d160fcf4b352ec5325b89, the old errors disappeared.
Bazel hasn't finished building yet ([14,551 / 18,108]), and I will upload the complete build.log after it finishes.
Unfortunately, I don't fully know where the libraries should go. It feels wrong considering cross-compilation.
Yeah libraries should go in host
Edit: Likely something more is needed on the configuration side or there could be issues in TensorFlow's build logic
In theory yes, but it would be quite hard to make changes to TensorFlow if it has something wrong
For the abseil error: I see someone had the same issue in https://github.com/tensorflow/tensorflow/issues/62081 with abseil lts_20230125
I feel like @xochy was dealing with them with patches. But I am just not as good as they are...
With https://github.com/conda-forge/tensorflow-feedstock/pull/367 we get a build for CUDA 12, but even with the sed hacks, 11.8 fails for me with abseil errors.
Closed in favor of #367
xref: https://github.com/conda-forge/tensorflow-feedstock/issues/365
Checklist
Reset the build number to 0 (if the version changed)
Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)