Open clxie opened 2 years ago
This same issue was fixed in Google's tensorflow source tree, and I've created a pull request that fixes it in this repository. In the meantime, you can simply copy the FixedPoint library from the 2.x code into your local copy of the nvidia-tensorflow repository; that fixes the compile error and the build succeeds. I used this to build against the 470 driver, CUDA 11.4 and cuDNN 8.2 with gcc 9.3 on Ubuntu 20.04. Installed with pip, it runs TF 1.x scripts on my 30x0 card. I had been trying for over two weeks to get any 1.x working on a 30x0. Thank you SO SO SO SO MUCH NVIDIA for this version of TF !! Bless you !!
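The copy step could be scripted roughly as follows. This is only a sketch: the function name `copy_fixedpoint` and the two checkout locations passed to it are placeholders, and it assumes the 2.x tree keeps FixedPoint at the same relative path that appears in the error message (`third_party/eigen3/unsupported/Eigen/CXX11`). It is not taken from the pull request.

```shell
# Sketch: overwrite the FixedPoint sources in an nvidia-tensorflow checkout
# with the ones from a TensorFlow 2.x checkout.
# Assumption: both trees use the relative path below (taken from the error log).
FIXEDPOINT_REL="third_party/eigen3/unsupported/Eigen/CXX11"

copy_fixedpoint() {
  # $1 = TensorFlow 2.x checkout, $2 = nvidia-tensorflow checkout (hypothetical paths)
  fp_from="$1/$FIXEDPOINT_REL"
  fp_to="$2/$FIXEDPOINT_REL"

  if [ ! -d "$fp_from/src/FixedPoint" ]; then
    echo "FixedPoint sources not found under $fp_from" >&2
    return 1
  fi

  # Copy the packet math headers (PacketMathAVX512.h and friends).
  mkdir -p "$fp_to/src/FixedPoint"
  cp -r "$fp_from/src/FixedPoint/." "$fp_to/src/FixedPoint/"

  # Copy the umbrella header that includes them, if the 2.x tree has one.
  if [ -f "$fp_from/FixedPoint" ]; then
    cp "$fp_from/FixedPoint" "$fp_to/FixedPoint"
  fi
}

# Example call (paths are placeholders):
#   copy_fixedpoint "$HOME/src/tensorflow-2.x" "$HOME/src/nvidia-tensorflow"
```

Because the copy overwrites tracked files, `git status` in the nvidia-tensorflow checkout shows exactly what changed, and `git checkout -- third_party/eigen3` reverts it.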
Hi KenBot, I followed your suggestion and the build errors are gone. Thank you so, so much!
Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template
System information
GPU model and memory: A100 / 40G
export TF_NEED_CUDA=1
export TF_NEED_TENSORRT=1
export TF_TENSORRT_VERSION=8
export TF_CUDA_PATHS=/usr,/usr/local/cuda
export TF_CUDA_VERSION=11.5
export TF_CUBLAS_VERSION=11
export TF_CUDNN_VERSION=8
export TF_NCCL_VERSION=2
export TF_CUDA_COMPUTE_CAPABILITIES="7.0,8.0"
export TF_ENABLE_XLA=1
export TF_NEED_HDFS=0
export CC_OPT_FLAGS="-march=native -mtune=native"
Describe the problem
ERROR: ./tensorflow/core/kernels/BUILD:788:1: C++ compilation of rule '//tensorflow/core/kernels:eigen_contraction_kernel_with_mkl' failed (Exit 1)
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint:35,
                 from ./tensorflow/core/kernels/eigen_contraction_kernel.h:39,
                 from tensorflow/core/kernels/eigen_contraction_kernel.cc:16:
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h: In function ‘typename Eigen::internal::unpacket_traits::type Eigen::internal::predux_min(const Packet&) [with Packet = Eigen::internal::Packet16q32i; typename Eigen::internal::unpacket_traits::type = Eigen::QInt32]’:
./third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX512.h:432:16: error: could not convert ‘Eigen::internal::pfirst<vector(2) long long int>(_mm_min_epi32(res.Eigen::internal::eigen_packet_wrapper<vector(2) long long int, 0>::operator vector(2) long long int&(), _mm_shuffle_epi32(res.Eigen::internal::eigen_packet_wrapper<vector(2) long long int, 0>::operator __vector(2) long long int&(), ((((0 << 6) | (0 << 4)) | (0 << 2)) | 1))))’ from ‘Eigen::internal::unpacket_traits<vector(2) long long int>::type’ {aka ‘vector(2) long long int’} to ‘Eigen::QInt32’
   return pfirst(
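The failing file is the AVX-512 variant of the FixedPoint packet math, so the error most likely only shows up when the compiler targets AVX-512, which `-march=native` in CC_OPT_FLAGS does on a CPU that supports it. A quick way to check the build host is sketched below. The helper name `has_avx512f` is made up for illustration, and it relies on the Linux `/proc/cpuinfo` flags line.

```shell
# Sketch: report whether the build host advertises AVX-512 Foundation.
# The optional argument is a cpuinfo-style file; it defaults to /proc/cpuinfo
# (Linux only). Returns 0 when the avx512f flag is present.
has_avx512f() {
  grep -qw avx512f "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx512f; then
  echo "CPU reports avx512f: -march=native will enable the AVX-512 code path"
else
  echo "avx512f not reported on this host"
fi
```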
Provide the exact sequence of commands / steps that you executed before running into the problem
(base) $ yes ""| ./configure
WARNING: Output base './.cache/bazel/_bazel_chunxie/b3a1696304cedbe15049d6664790de6a' is on NFS. This may lead to surprising failures and undetermined behavior.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.25.3 installed.
Please specify the location of python. [Default is /usr/bin/python]:

Found possible Python library paths:
  /usr/lib64/python3.6/site-packages
  /usr/local/lib64/python3.6/site-packages
  /usr/local/lib/python3.6/site-packages
  /usr/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/usr/lib64/python3.6/site-packages]
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: No ROCm support will be enabled for TensorFlow.

Found CUDA 11.5 in:
  /usr/local/cuda/lib64
  /usr/local/cuda/include
Found cuDNN 8 in:
  /usr/lib64
  /usr/include
Found TensorRT 8 in:
  /usr/lib64
  /usr/include
Found NCCL 2 in:
  /usr/lib64
  /usr/include
Do you want to use clang as CUDA compiler? [y/N]: nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]: No MPI support will be enabled for TensorFlow.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
  --config=mkl             # Build with MKL support.
  --config=monolithic      # Config for mostly static monolithic build.
  --config=gdr             # Build with GDR support.
  --config=verbs           # Build with libverbs support.
  --config=ngraph          # Build with Intel nGraph support.
  --config=numa            # Build with NUMA support.
  --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
  --config=v2              # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
  --config=noaws           # Disable AWS S3 filesystem support.
  --config=nogcp           # Disable GCP support.
  --config=nohdfs          # Disable HDFS support.
  --config=noignite        # Disable Apache Ignite support.
  --config=nokafka         # Disable Apache Kafka support.
  --config=nonccl          # Disable NVIDIA NCCL support.
Configuration finished