ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

Building tensorflow-rocm on centos 7 with rocm 4.1 fails #1348

Closed jjkeijser closed 3 years ago

jjkeijser commented 3 years ago

System information

I am trying to build tensorflow-rocm on CentOS 7 with ROCm 4.1. The code builds and runs with ROCm 4.0.1, but with 4.1 I get the following error on multiple hosts:

ERROR: /tmp/janjust/tensorflow-rocm/tensorflow/core/kernels/mlir_generated/BUILD:746:23: compile tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64_kernel_generator_kernel.o failed (Exit 1): tf_to_kernel failed: error executing command
  (cd /tmp/janjust/bazel/_bazel_janjust/a17baf96ffee6431b0a557b510a7c432/execroot/org_tensorflow && \
  exec env - \
  bazel-out/host/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel '--unroll_factors=4' '--tile_sizes=1024' '--arch=gfx906,gfx906' '--input=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64.mlir' '--output=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64_kernel_generator_kernel.o' '--enable_ftz=False' '--cpu_codegen=False')
Execution platform: @local_execution_config_platform//:platform
2021-05-07 17:21:44.587700: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:210] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2021-05-07 17:21:44.744495: W tensorflow/compiler/mlir/tools/kernel_gen/kernel_creator.cc:348] There should be exactly one GPU Module, but got 7. Currently we leak memory if there is more than one module, see https://bugs.llvm.org/show_bug.cgi?id=48385
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
2021-05-07 17:21:46.879506: E tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel.cc:183] Internal: Generating device code failed.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /tmp/janjust/tensorflow-rocm/tensorflow/lite/toco/python/BUILD:89:10 compile tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64_kernel_generator_kernel.o failed (Exit 1): tf_to_kernel failed: error executing command
  (cd /tmp/janjust/bazel/_bazel_janjust/a17baf96ffee6431b0a557b510a7c432/execroot/org_tensorflow && \
  exec env - \
  bazel-out/host/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel '--unroll_factors=4' '--tile_sizes=1024' '--arch=gfx906,gfx906' '--input=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64.mlir' '--output=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64_kernel_generator_kernel.o' '--enable_ftz=False' '--cpu_codegen=False')
Execution platform: @local_execution_config_platform//:platform
INFO: Elapsed time: 1889.625s, Critical Path: 255.91s
INFO: 23155 processes: 1559 internal, 21596 local.
FAILED: Build did NOT complete successfully

How can I work around this issue?

sunway513 commented 3 years ago

Hi @jjkeijser, TensorFlow doesn't have native support for CentOS distros. Please note that you can deploy the public tensorflow-rocm Docker images on CentOS hosts: https://hub.docker.com/r/rocm/tensorflow
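
For anyone reading the log in the original report: the failing `tf_to_kernel` command was invoked with `--arch=gfx906,gfx906`, i.e. the same target twice, which is consistent with the `clang-offload-bundler` message "Duplicate targets are not allowed". The thread does not confirm where the duplicated list comes from, so the following is only a minimal sketch of the deduplication idea. The helper name `dedupe_targets` is hypothetical and is not part of TensorFlow or ROCm.

```python
def dedupe_targets(arch_list: str) -> str:
    """Drop repeated entries from a comma-separated GPU target list.

    Hypothetical helper for illustration only; it keeps the first
    occurrence of each target and preserves the original order.
    """
    seen = set()
    unique = []
    for target in arch_list.split(","):
        target = target.strip()
        # Skip empty fragments (e.g. from a trailing comma) and repeats.
        if target and target not in seen:
            seen.add(target)
            unique.append(target)
    return ",".join(unique)


# The value passed to --arch in the failing command from the log.
original = "gfx906,gfx906"
print(dedupe_targets(original))
```

If the build takes its target list from a user-settable variable (TensorFlow's ROCm configuration reads `TF_ROCM_AMDGPU_TARGETS`, though whether setting it avoids this particular failure is an assumption not verified in this thread), passing a list with each target named once may be worth trying before falling back to the Docker images.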