Closed jjkeijser closed 3 years ago
System information
I am trying to build tensorflow-rocm on CentOS 7 with ROCm 4.1. The code builds and runs with ROCm 4.0.1, but with 4.1 the build fails on multiple hosts with the following error:
ERROR: /tmp/janjust/tensorflow-rocm/tensorflow/core/kernels/mlir_generated/BUILD:746:23: compile tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64_kernel_generator_kernel.o failed (Exit 1): tf_to_kernel failed: error executing command
  (cd /tmp/janjust/bazel/_bazel_janjust/a17baf96ffee6431b0a557b510a7c432/execroot/org_tensorflow && \
  exec env - \
  bazel-out/host/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel '--unroll_factors=4' '--tile_sizes=1024' '--arch=gfx906,gfx906' '--input=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64.mlir' '--output=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64_kernel_generator_kernel.o' '--enable_ftz=False' '--cpu_codegen=False')
Execution platform: @local_execution_config_platform//:platform
2021-05-07 17:21:44.587700: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:210] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2021-05-07 17:21:44.744495: W tensorflow/compiler/mlir/tools/kernel_gen/kernel_creator.cc:348] There should be exactly one GPU Module, but got 7. Currently we leak memory if there is more than one module, see https://bugs.llvm.org/show_bug.cgi?id=48385
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
error: clang_offload_bundler exited with non-zero error code 256, output: /opt/rocm-4.1.0/llvm/bin/clang-offload-bundler: error: Duplicate targets are not allowed
2021-05-07 17:21:46.879506: E tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel.cc:183] Internal: Generating device code failed.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
ERROR: /tmp/janjust/tensorflow-rocm/tensorflow/lite/toco/python/BUILD:89:10 compile tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64_kernel_generator_kernel.o failed (Exit 1): tf_to_kernel failed: error executing command
  (cd /tmp/janjust/bazel/_bazel_janjust/a17baf96ffee6431b0a557b510a7c432/execroot/org_tensorflow && \
  exec env - \
  bazel-out/host/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel '--unroll_factors=4' '--tile_sizes=1024' '--arch=gfx906,gfx906' '--input=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64.mlir' '--output=bazel-out/k8-opt/bin/tensorflow/core/kernels/mlir_generated/sub_gpu_f64_f64_kernel_generator_kernel.o' '--enable_ftz=False' '--cpu_codegen=False')
Execution platform: @local_execution_config_platform//:platform
INFO: Elapsed time: 1889.625s, Critical Path: 255.91s
INFO: 23155 processes: 1559 internal, 21596 local.
FAILED: Build did NOT complete successfully
How can I work around this issue?
Hi @jjkeijser, TensorFlow doesn't have native support for CentOS distros. Please note that you can deploy the public tensorflow-rocm Docker images on CentOS hosts: https://hub.docker.com/r/rocm/tensorflow
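One detail in the log above may be worth a closer look: `tf_to_kernel` is invoked with `--arch=gfx906,gfx906`, and `clang-offload-bundler` then reports "Duplicate targets are not allowed". The thread does not confirm that the repeated target is the cause, and it does not show where the list is assembled. As a minimal sketch of the idea only, assuming the target list is available as a comma-separated string, repeated entries could be dropped like this (`dedupe_arch_list` is a hypothetical helper, not part of TensorFlow or ROCm):

```python
def dedupe_arch_list(arch_flag: str) -> str:
    """Drop repeated GPU targets from a comma-separated list.

    Hypothetical helper for illustration; the first occurrence of each
    target is kept, so the original ordering is preserved.
    """
    seen = set()
    unique = []
    for target in arch_flag.split(","):
        target = target.strip()
        # Skip empty fragments and anything already emitted.
        if target and target not in seen:
            seen.add(target)
            unique.append(target)
    return ",".join(unique)


# The value taken from the failing command line in the log.
print(dedupe_arch_list("gfx906,gfx906"))  # prints "gfx906"
```

Whether a deduplicated list is enough to make the ROCm 4.1 build pass is an assumption that would need to be checked against the build configuration actually in use.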