System information
TensorFlow installed from: NA (build problem, not installation problem)
TensorFlow version: 1.15.5
Python version: 3.8.5
Installed using virtualenv? pip? conda?: NA
Bazel version (if compiling from source): 0.24.1
GCC/Compiler version (if compiling from source): 9.3.0
CUDA/cuDNN version: ?? (whatever is in the docker image)
GPU model and memory: 1080Ti
Describe the problem
From within a Docker container running the nvcr.io/nvidia/tensorflow:21.05-tf1-py3 image, tfcompile fails to build:
root@48f2340d016b:/opt/tensorflow/tensorflow-source# bazel build --config=opt --config=cuda //tensorflow/compiler/aot:tfcompile
...
INFO: Analysed target //tensorflow/compiler/aot:tfcompile (124 packages loaded, 11185 targets configured).
INFO: Found 1 target...
ERROR: /opt/tensorflow/tensorflow-source/tensorflow/compiler/aot/BUILD:190:1: C++ compilation of rule '//tensorflow/compiler/aot:embedded_protocol_buffers' failed (Exit 1)
tensorflow/compiler/aot/embedded_protocol_buffers.cc: In function ‘xla::StatusOr<std::__cxx11::basic_string<char> > tensorflow::tfcompile::CodegenModule(llvm::TargetMachine*, std::unique_ptr<llvm::Module>)’:
tensorflow/compiler/aot/embedded_protocol_buffers.cc:85:32: error: ‘CGFT_ObjectFile’ is not a member of ‘llvm::TargetMachine’
85 | llvm::TargetMachine::CGFT_ObjectFile)) {
| ^~~~~~~~~~~~~~~
Target //tensorflow/compiler/aot:tfcompile failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 14.014s, Critical Path: 9.96s
INFO: 171 processes: 171 local.
FAILED: Build did NOT complete successfully
I have modified .tf_configure.bazelrc, but I believe those changes are unrelated to the failure:
build --action_env PYTHON_BIN_PATH="/usr/bin/python3.8"
build --action_env PYTHON_LIB_PATH="/usr/local/lib/python3.8/dist-packages"
build --python_path="/usr/bin/python3.8"
build:xla --define with_xla_support=true
build --config=xla
build --action_env TF_USE_CCACHE="0"
build --copt=-march=haswell
build:opt --define with_default_optimizations=true
build:v2 --define=tf_api_version=2
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_tag_filters=-benchmark-test,-no_oss,-oss_serial
test --build_tag_filters=-benchmark-test,-no_oss
test --test_tag_filters=-gpu
test --build_tag_filters=-gpu
build --action_env TF_CONFIGURE_IOS="0"
I believe nvidia-tensorflow carries some LLVM-related changes relative to upstream, but perhaps the focus was on getting Python TensorFlow working on NVIDIA hardware, without attention to less commonly used parts like tfcompile. This failure looks like C++ code that does not match the LLVM version it is being built against; upstream 1.15.5 builds with no problems.
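For what it's worth, LLVM 10 moved CodeGenFileType out of llvm::TargetMachine and into the llvm namespace (llvm/Support/CodeGen.h), which is exactly the symbol the compiler cannot find here. A minimal sketch of what a local workaround in embedded_protocol_buffers.cc might look like, assuming the container bundles LLVM 10 or newer and that the surrounding code matches upstream (variable names taken from the error message; error handling shown only for illustration):

// Hypothetical patch sketch for tensorflow/compiler/aot/embedded_protocol_buffers.cc.
// LLVM 10+ defines CodeGenFileType at namespace scope in llvm/Support/CodeGen.h,
// so the TargetMachine-nested enumerator no longer exists.

// Before (compiles only against LLVM < 10):
//   if (target_machine->addPassesToEmitFile(
//           codegen_passes, ostream, nullptr,
//           llvm::TargetMachine::CGFT_ObjectFile)) { ... }

// After (LLVM 10+), use the namespace-level enumerator:
if (target_machine->addPassesToEmitFile(codegen_passes, ostream, nullptr,
                                        llvm::CGFT_ObjectFile)) {
  // Keep whatever error handling the original file uses here.
  return xla::InternalError("Failed to add passes to emit object file");
}

I have not verified which LLVM revision this container actually pins, so this is only my guess at why the build breaks and how upstream-style code handles it.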