ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

Compiling fails due to other GPU #2504

Open Gotbread opened 5 months ago

Gotbread commented 5 months ago

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.15

Custom code

No

OS platform and distribution

Linux Ubuntu 22.04 LTS

Mobile device

No response

Python version

3.10

Bazel version

6.1.0

GCC/compiler version

11.4

CUDA/cuDNN version

No response

GPU model and memory

gfx1100 & gfx1036

Current behavior?

I am trying to compile from source, but the build fails because it also tries to compile for gfx1036. I don't want that; I only want the gfx1100 version, but I am unable to disable it.

I tried disabling the iGPU in the BIOS, but it still shows up in rocminfo and apparently also in the compilation process.
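To check which gfx targets the ROCm tools still report, the output of `rocminfo` can be filtered for architecture names. The following is a minimal sketch, assuming `rocminfo` is on `PATH` and prints architecture names in the usual `gfxNNNN` form; the helper name `gfx_targets` is made up for this illustration.

```python
import re
import shutil
import subprocess


def gfx_targets(rocminfo_output: str) -> list[str]:
    """Return the unique gfx architecture names found in rocminfo-style text."""
    # Matches names such as gfx1100, gfx1036 or gfx90a, including those
    # embedded in ISA strings like amdgcn-amd-amdhsa--gfx1100.
    return sorted(set(re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_output)))


if __name__ == "__main__":
    if shutil.which("rocminfo") is None:
        print("rocminfo not found on PATH")
    else:
        # Assumption: rocminfo writes its agent listing to stdout.
        result = subprocess.run(
            ["rocminfo"], capture_output=True, text=True, check=False
        )
        print(gfx_targets(result.stdout))
```

If gfx1036 still appears in the printed list after the iGPU has been disabled in the BIOS, the ROCm stack can still see the device.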

Is there a way to force the build not to compile for other gfx versions? I just want the gfx1100 version.

I also can't use the prebuilt binary, since it contains the "gfx1030gfx1100" bug string, which causes TF to ignore my GPU.
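One way a string like "gfx1030gfx1100" can arise is a missing separator in a list of supported targets, so that two adjacent string literals get merged into one entry. Whether this is the actual cause in the prebuilt binary is an assumption and has not been confirmed here. The list contents and the `is_supported` helper below are hypothetical and only illustrate the effect.

```python
# Hypothetical list of supported targets, for illustration only.
# Note the missing comma between the last two entries: Python (like C++)
# concatenates adjacent string literals into a single string.
supported = ["gfx900", "gfx90a", "gfx1030" "gfx1100"]


def is_supported(arch: str, targets: list[str]) -> bool:
    """Exact-match lookup of an architecture name in a target list."""
    return arch in targets


print(supported)
print(is_supported("gfx1100", supported))
```

With the merged entry, an exact-match lookup for gfx1100 fails even though the name appears inside the list, which would be consistent with the GPU being ignored.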

I need a way to disable the iGPU so that ROCm does not see it anymore. This issue is similar to #2292, but I can't find a way to skip this GPU.

Standalone code to reproduce the issue

Compile the latest version from source on a system that has a gfx1036 present.

Relevant log output

INFO: Found applicable config definition build:dynamic_kernels in file /home/user/custom_tf/tensorflow-upstream/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
WARNING: The following configs were expanded more than once: [rocm, rocm_base, no_tfrt, release_cpu_linux_base]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Analyzed target //tensorflow/tools/pip_package:wheel (710 packages loaded, 50788 targets configured).
INFO: Found 1 target...
ERROR: /home/user/.cache/bazel/_bazel_user/8ff3c252cf6943b0e4c6e47a965a8647/external/local_xla/xla/service/gpu/BUILD:1412:23: Compiling xla/service/gpu/cub_sort_kernel.cu.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target @local_xla//xla/service/gpu:cub_sort_kernel_f64) 
  (cd /home/user/.cache/bazel/_bazel_user/8ff3c252cf6943b0e4c6e47a965a8647/execroot/org_tensorflow && \
  exec env - \
    CLANG_COMPILER_PATH=/usr/lib/llvm-17/bin/clang \
    PATH=/home/user/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    PYTHON_LIB_PATH=/usr/lib/python3/dist-packages \
    ROCM_PATH=/opt/rocm-6.0.2 \
    TF2_BEHAVIOR=1 \
    TF_ROCM_CLANG=1 \
  external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++14' -MD -MF bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/_objs/cub_sort_kernel_f64/cub_sort_kernel.cu.pic.d '-frandom-seed=bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/_objs/cub_sort_kernel_f64/cub_sort_kernel.cu.pic.o' -fPIC '-DEIGEN_MAX_ALIGN_BYTES=64' -DEIGEN_ALLOW_UNALIGNED_SCALARS '-DEIGEN_USE_AVX512_GEMM_KERNELS=0' '-DTENSORFLOW_USE_ROCM=1' -DCUB_TYPE_F64 '-DBAZEL_CURRENT_REPOSITORY="local_xla"' -iquote external/local_xla -iquote bazel-out/k8-opt/bin/external/local_xla -iquote external/eigen_archive -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/local_config_cuda -iquote bazel-out/k8-opt/bin/external/local_config_cuda -iquote external/local_tsl -iquote bazel-out/k8-opt/bin/external/local_tsl -iquote external/local_config_rocm -iquote bazel-out/k8-opt/bin/external/local_config_rocm -Ibazel-out/k8-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive -isystem external/eigen_archive/mkl_include -isystem bazel-out/k8-opt/bin/external/eigen_archive/mkl_include -isystem external/local_config_cuda/cuda -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda/cuda/include -isystem external/local_config_rocm/rocm -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include/hipcub -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/hipcub -isystem external/local_config_rocm/rocm/rocm/include/rocprim -isystem 
bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/rocprim -isystem external/local_config_rocm/rocm/rocm/include -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/roctracer -Wno-all -Wno-extra -Wno-deprecated -Wno-deprecated-declarations -Wno-ignored-attributes -Wno-array-bounds -Wunused-result '-Werror=unused-result' -Wswitch '-Werror=switch' '-Wno-error=unused-but-set-variable' -DAUTOLOAD_DYNAMIC_KERNELS -Wno-gnu-offsetof-extensions -Wno-unused-result -Wno-sign-compare -Wno-gnu-offsetof-extensions -Wno-unused-result '-std=c++17' -x rocm '--amdgpu-target=gfx1100' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-DTENSORFLOW_USE_ROCM=1' -D__HIP_PLATFORM_AMD__ -DEIGEN_USE_HIP -no-canonical-prefixes -fno-canonical-system-headers -c external/local_xla/xla/service/gpu/cub_sort_kernel.cu.cc -o bazel-out/k8-opt/bin/external/local_xla/xla/service/gpu/_objs/cub_sort_kernel_f64/cub_sort_kernel.cu.pic.o)
# Configuration: e4ece56677a12dcf02a4cc8466fa0e1a29e7ca5c7dc9c8d9b2f8ab0324debfef
# Execution platform: @local_execution_config_platform//:platform
clang: warning: argument unused during compilation: '-fgpu-flush-denormals-to-zero' [-Wunused-command-line-argument]
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr43 = V_MOV_B32_dpp undef $vgpr43(tied-def 0), $vgpr4, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr4 = V_MOV_B32_dpp undef $vgpr4(tied-def 0), killed $vgpr3, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr3 = V_MOV_B32_dpp undef $vgpr3(tied-def 0), $vgpr2, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr48 = V_MOV_B32_dpp undef $vgpr48(tied-def 0), $vgpr45, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr45 = V_MOV_B32_dpp undef $vgpr45(tied-def 0), $vgpr44, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr48 = V_MOV_B32_dpp undef $vgpr48(tied-def 0), $vgpr45, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr45 = V_MOV_B32_dpp undef $vgpr45(tied-def 0), $vgpr44, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr43 = V_MOV_B32_dpp undef $vgpr43(tied-def 0), $vgpr4, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr48 = V_MOV_B32_dpp undef $vgpr48(tied-def 0), $vgpr45, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr45 = V_MOV_B32_dpp undef $vgpr45(tied-def 0), $vgpr44, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr48 = V_MOV_B32_dpp undef $vgpr48(tied-def 0), $vgpr45, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr45 = V_MOV_B32_dpp undef $vgpr45(tied-def 0), $vgpr44, 322, 15, 15, 0, implicit $exec
12 errors generated when compiling for gfx1036.
Target //tensorflow/tools/pip_package:wheel failed to build
INFO: Elapsed time: 177.097s, Critical Path: 81.30s
INFO: 6265 processes: 1446 internal, 4819 local.
FAILED: Build did NOT complete successfully