XLA_TARGET=rocm compilation failed with "crosstool_wrapper_driver_is_not_gcc failed"

Awlexus commented 6 months ago

Hi, I've been trying to get GPU support running, but I keep running into this error. I first followed this issue to get it running and added the dependencies like this:

# mix.exs
      {:nx, github: "elixir-nx/nx", sparse: "nx", override: true},
      {:exla, github: "elixir-nx/nx", sparse: "exla", override: true}

I made sure to install the dependencies mentioned in this comment (adjusted for Arch Linux):

$ sudo pacman -S miopen-hip hipfft rocrand \
    hipsparse hipsolver rccl hip-runtime-amd \
    rocfft roctracer hipblas rocm-device-libs rocsolver rocblas
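
To double-check later that none of these packages is actually missing, a small shell sketch can ask pacman which names from a list are not installed. It assumes an Arch-based system with pacman on the PATH; the helper name missing_pkgs is hypothetical.

```shell
#!/bin/sh
# Print every package name (one per line) that pacman does not report as installed.
# Assumes an Arch-based system; if pacman is not on the PATH, every name is reported.
# missing_pkgs is a hypothetical helper, not part of any tool mentioned in this thread.
missing_pkgs() {
  for pkg in "$@"; do
    if ! pacman -Qq "$pkg" >/dev/null 2>&1; then
      printf '%s\n' "$pkg"
    fi
  done
}

# Usage: check the ROCm packages installed above.
# missing_pkgs miopen-hip hipfft rocrand hipsparse hipsolver rccl hip-runtime-amd \
#   rocfft roctracer hipblas rocm-device-libs rocsolver rocblas
```

An empty output means pacman reports every listed package as installed.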

Then I tried to compile it with:

$ XLA_BUILD=true XLA_TARGET=rocm mix compile

Compilation logs



==> xla
Compiling 2 files (.ex)
Generated xla app
mkdir -p /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb && \
    cd /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb && \
    git init && \
    git remote add origin https://github.com/openxla/xla.git && \
    git fetch --depth 1 origin 771e38178340cbaaef8ff20f44da5407c15092cb && \
    git checkout FETCH_HEAD && \
    rm /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelversion
Initialized empty Git repository in /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.git/
From https://github.com/openxla/xla
 * branch            771e38178340cbaaef8ff20f44da5407c15092cb -> FETCH_HEAD
Note: switching to 'FETCH_HEAD'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 771e381 [XLA:GPU] Check tensor_float_32_execution_enabled() in Triton codegen too
rm -f /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/xla/extension && \
    ln -s "/hdd/programming/elixir/fusemega/deps/xla/extension" /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/xla/extension && \
    cd /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb && \
    bazel build --define "framework_shared_object=false" -c opt   --config=rocm --action_env=HIP_PLATFORM=hcc --action_env=TF_ROCM_AMDGPU_TARGETS="gfx900,gfx906,gfx908,gfx90a,gfx1030" //xla/extension:xla_extension && \
    mkdir -p /home/awlex/.cache/xla/0.6.0/cache/build/ && \
    cp -f /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/bazel-bin/xla/extension/xla_extension.tar.gz /home/awlex/.cache/xla/0.6.0/cache/build/xla_extension-x86_64-linux-gnu-rocm.tar.gz
Starting local Bazel server and connecting to it...
INFO: Reading 'startup' options from /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Found applicable config definition build:short_logs in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:rocm in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --repo_env TF_NEED_ROCM=1 --config=no_tfrt
INFO: Found applicable config definition build:no_tfrt in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/ir,tensorflow/compiler/mlir/tfrt/ir/mlrt,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ifrt,tensorflow/compiler/mlir/tfrt/tests/mlrt,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/compiler/mlir/tfrt/transforms/mlrt,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/runtime_fallback/test,tensorflow/core/runtime_fallback/test/gpu,tensorflow/core/runtime_fallback/test/saved_model,tensorflow/core/runtime_fallback/test/testdata,tensorflow/core/tfrt/stubs,tensorflow/core/tfrt/tfrt_session,tensorflow/core/tfrt/mlrt,tensorflow/core/tfrt/mlrt/attribute,tensorflow/core/tfrt/mlrt/kernel,tensorflow/core/tfrt/mlrt/bytecode,tensorflow/core/tfrt/mlrt/interpreter,tensorflow/compiler/mlir/tfrt/translate/mlrt,tensorflow/compiler/mlir/tfrt/translate/mlrt/testdata,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tenso
rflow/core/tfrt/tpu,tensorflow/core/tfrt/utils,tensorflow/core/tfrt/utils/debug,tensorflow/core/tfrt/saved_model/python,tensorflow/core/tfrt/graph_executor/python,tensorflow/core/tfrt/saved_model/utils
INFO: Found applicable config definition build:linux in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading:
Loading: 0 packages loaded
DEBUG: /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'llvm-raw' because it already exists.
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
    currently loading: xla/extension
Analyzing: target //xla/extension:xla_extension (1 packages loaded, 0 targets configured)
Analyzing: target //xla/extension:xla_extension (36 packages loaded, 14 targets configured)
Analyzing: target //xla/extension:xla_extension (36 packages loaded, 14 targets configured)
Analyzing: target //xla/extension:xla_extension (179 packages loaded, 13765 targets configured)
INFO: Analyzed target //xla/extension:xla_extension (182 packages loaded, 16076 targets configured).
INFO: Found 1 target...
[0 / 196] [Prepa] Writing script xla/extension/xla_extension_headers.genrule_script.sh
[44 / 4,961] Compiling src/google/protobuf/compiler/cpp/file.cc; 3s local ... (12 actions, 11 running)
[71 / 4,961] Compiling src/google/protobuf/compiler/cpp/field.cc; 1s local ... (12 actions, 11 running)
[97 / 4,961] Compiling src/google/protobuf/compiler/command_line_interface.cc; 4s local ... (12 actions, 11 running)
[133 / 4,961] Compiling src/google/protobuf/util/internal/protostream_objectsource.cc; 2s local ... (12 actions, 11 running)
[173 / 4,961] Compiling src/google/protobuf/descriptor.cc; 8s local ... (12 actions, 11 running)
[256 / 5,172] Compiling llvm/lib/TableGen/TGParser.cpp [for host]; 3s local ... (12 actions, 11 running)
[339 / 5,172] Compiling llvm/lib/Support/KnownBits.cpp [for host]; 2s local ... (12 actions, 11 running)
[448 / 5,400] Compiling llvm/lib/Support/VirtualFileSystem.cpp; 4s local ... (12 actions, 11 running)
[542 / 5,400] Compiling llvm/lib/Support/Caching.cpp; 1s local ... (12 actions, 11 running)
[897 / 6,732] Compiling mlir/tools/mlir-tblgen/OpDocGen.cpp; 4s local ... (12 actions, 11 running)
[958 / 6,732] Compiling llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp [for host]; 12s local ... (12 actions, 11 running)
[1,130 / 6,732] Compiling mlir/tools/mlir-tblgen/OpDefinitionsGen.cpp [for host]; 10s local ... (12 actions, 11 running)
[1,362 / 6,732] Generating code from table: lib/Target/AMDGPU/AMDGPU.td @llvm-project//llvm:AMDGPUCommonTableGen__gen_register_bank_genrule; 22s local ... (12 actions, 11 running)
[1,813 / 6,732] Generating code from table: lib/Target/AMDGPU/AMDGPU.td @llvm-project//llvm:AMDGPUCommonTableGen__gen_asm_matcher_genrule; 16s local ... (12 actions, 11 running)
[2,160 / 6,907] Compiling xla/hlo/utils/hlo_sharding_util.cc; 5s local ... (12 actions running)
[2,285 / 6,907] Compiling xla/service/hlo_rematerialization.cc; 12s local ... (12 actions running)
[2,446 / 6,907] Compiling llvm/lib/IR/AutoUpgrade.cpp; 7s local ... (12 actions running)
[2,624 / 6,907] Compiling xla/service/gpu/cub_sort_kernel.cu.cc; 14s local ... (12 actions, 11 running)
[2,758 / 6,907] Compiling xla/service/gpu/cub_sort_kernel.cu.cc; 12s local ... (12 actions, 11 running)
[2,942 / 6,907] Compiling src/cpu/x64/gemm/f32/jit_avx2_f32_copy_an_kern_autogen.cpp; 8s local ... (12 actions, 11 running)
[3,150 / 6,907] Compiling src/cpu/x64/jit_uni_resampling_kernel.cpp; 13s local ... (12 actions, 11 running)
[3,366 / 6,907] Compiling src/cpu/x64/jit_brgemm_conv.cpp; 45s local ... (12 actions, 11 running)
[3,642 / 6,908] Compiling llvm/lib/Passes/PassBuilder.cpp; 45s local ... (12 actions, 11 running)
[3,929 / 6,908] Compiling mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp; 56s local ... (12 actions, 11 running)
[4,340 / 6,908] Compiling stablehlo/dialect/StablehloOps.cpp; 41s local ... (12 actions, 11 running)
[4,628 / 6,908] Compiling mlir/lib/Dialect/SPIRV/IR/SPIRVOpDefinition.cpp; 33s local ... (12 actions, 11 running)
[4,938 / 6,908] Compiling llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp; 13s local ... (12 actions, 11 running)
[5,317 / 6,908] Compiling llvm/lib/Target/X86/X86ISelLowering.cpp; 22s local ... (12 actions, 11 running)
[5,761 / 6,908] Compiling mlir/lib/Dialect/Linalg/IR/LinalgDialect.cpp; 33s local ... (12 actions, 11 running)
[6,253 / 6,908] Compiling xla/mlir_hlo/mhlo/IR/hlo_ops.cc; 67s local ... (12 actions, 11 running)
[6,854 / 6,909] Compiling xla/service/gpu/runtime/fused_attention.cc; 64s local ... (12 actions, 11 running)
ERROR: /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/xla/service/gpu/BUILD:257:11: Compiling xla/service/gpu/ir_emitter_unnested.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer ... (remaining 356 arguments skipped)
/home/awlex/.cache/bazel/_bazel_awlex/74b6e6c2abb213e1ba59aee5534c65a2/execroot/xla/external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc:23: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
  import pipes
In file included from ./xla/shape_util.h:36,
                 from ./xla/index_util.h:25,
                 from ./xla/literal.h:41,
                 from ./xla/hlo/ir/dfs_hlo_visitor.h:26,
                 from ./xla/hlo/ir/hlo_computation.h:32,
                 from ./xla/service/gpu/ir_emitter_unnested.h:31,
                 from xla/service/gpu/ir_emitter_unnested.cc:16:
external/com_google_absl/absl/log/check.h:57: warning: "CHECK" redefined
   57 | #define CHECK(condition) ABSL_LOG_INTERNAL_CHECK_IMPL((condition), #condition)
      |
In file included from external/tsl/tsl/platform/logging.h:26,
                 from external/tsl/tsl/platform/status.h:34,
                 from ./xla/status.h:19,
                 from ./xla/statusor.h:18,
                 from ./xla/hlo/ir/hlo_opcode.h:24,
                 from ./xla/hlo/ir/dfs_hlo_visitor.h:25:
external/tsl/tsl/platform/default/logging.h:308: note: this is the location of the previous definition
  308 | #define CHECK(condition)              \
      |
external/com_google_absl/absl/log/check.h:65: warning: "QCHECK" redefined
   65 | #define QCHECK(condition) ABSL_LOG_INTERNAL_QCHECK_IMPL((condition), #condition)
      |
external/tsl/tsl/platform/default/logging.h:542: note: this is the location of the previous definition
  542 | #define QCHECK(condition) CHECK(condition)
      |
external/com_google_absl/absl/log/check.h:88: warning: "DCHECK" redefined
   88 | #define DCHECK(condition) ABSL_LOG_INTERNAL_DCHECK_IMPL((condition), #condition)
      |
external/tsl/tsl/platform/default/logging.h:521: note: this is the location of the previous definition
  521 | #define DCHECK(condition) \
      |
external/com_google_absl/absl/log/check.h:116: warning: "CHECK_EQ" redefined
  116 | #define CHECK_EQ(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:499: note: this is the location of the previous definition
  499 | #define CHECK_EQ(val1, val2) CHECK_OP(Check_EQ, ==, val1, val2)
      |
external/com_google_absl/absl/log/check.h:118: warning: "CHECK_NE" redefined
  118 | #define CHECK_NE(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:500: note: this is the location of the previous definition
  500 | #define CHECK_NE(val1, val2) CHECK_OP(Check_NE, !=, val1, val2)
      |
external/com_google_absl/absl/log/check.h:120: warning: "CHECK_LE" redefined
  120 | #define CHECK_LE(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:501: note: this is the location of the previous definition
  501 | #define CHECK_LE(val1, val2) CHECK_OP(Check_LE, <=, val1, val2)
      |
external/com_google_absl/absl/log/check.h:122: warning: "CHECK_LT" redefined
  122 | #define CHECK_LT(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:502: note: this is the location of the previous definition
  502 | #define CHECK_LT(val1, val2) CHECK_OP(Check_LT, <, val1, val2)
      |
external/com_google_absl/absl/log/check.h:124: warning: "CHECK_GE" redefined
  124 | #define CHECK_GE(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:503: note: this is the location of the previous definition
  503 | #define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)
      |
external/com_google_absl/absl/log/check.h:126: warning: "CHECK_GT" redefined
  126 | #define CHECK_GT(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:504: note: this is the location of the previous definition
  504 | #define CHECK_GT(val1, val2) CHECK_OP(Check_GT, >, val1, val2)
      |
external/com_google_absl/absl/log/check.h:128: warning: "QCHECK_EQ" redefined
  128 | #define QCHECK_EQ(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:543: note: this is the location of the previous definition
  543 | #define QCHECK_EQ(x, y) CHECK_EQ(x, y)
      |
external/com_google_absl/absl/log/check.h:130: warning: "QCHECK_NE" redefined
  130 | #define QCHECK_NE(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:544: note: this is the location of the previous definition
  544 | #define QCHECK_NE(x, y) CHECK_NE(x, y)
      |
external/com_google_absl/absl/log/check.h:132: warning: "QCHECK_LE" redefined
  132 | #define QCHECK_LE(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:545: note: this is the location of the previous definition
  545 | #define QCHECK_LE(x, y) CHECK_LE(x, y)
      |
external/com_google_absl/absl/log/check.h:134: warning: "QCHECK_LT" redefined
  134 | #define QCHECK_LT(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:546: note: this is the location of the previous definition
  546 | #define QCHECK_LT(x, y) CHECK_LT(x, y)
      |
external/com_google_absl/absl/log/check.h:136: warning: "QCHECK_GE" redefined
  136 | #define QCHECK_GE(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:547: note: this is the location of the previous definition
  547 | #define QCHECK_GE(x, y) CHECK_GE(x, y)
      |
external/com_google_absl/absl/log/check.h:138: warning: "QCHECK_GT" redefined
  138 | #define QCHECK_GT(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:548: note: this is the location of the previous definition
  548 | #define QCHECK_GT(x, y) CHECK_GT(x, y)
      |
external/com_google_absl/absl/log/check.h:140: warning: "DCHECK_EQ" redefined
  140 | #define DCHECK_EQ(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:531: note: this is the location of the previous definition
  531 | #define DCHECK_EQ(x, y) _TF_DCHECK_NOP(x, y)
      |
external/com_google_absl/absl/log/check.h:142: warning: "DCHECK_NE" redefined
  142 | #define DCHECK_NE(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:532: note: this is the location of the previous definition
  532 | #define DCHECK_NE(x, y) _TF_DCHECK_NOP(x, y)
      |
external/com_google_absl/absl/log/check.h:144: warning: "DCHECK_LE" redefined
  144 | #define DCHECK_LE(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:533: note: this is the location of the previous definition
  533 | #define DCHECK_LE(x, y) _TF_DCHECK_NOP(x, y)
      |
external/com_google_absl/absl/log/check.h:146: warning: "DCHECK_LT" redefined
  146 | #define DCHECK_LT(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:534: note: this is the location of the previous definition
  534 | #define DCHECK_LT(x, y) _TF_DCHECK_NOP(x, y)
      |
external/com_google_absl/absl/log/check.h:148: warning: "DCHECK_GE" redefined
  148 | #define DCHECK_GE(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:535: note: this is the location of the previous definition
  535 | #define DCHECK_GE(x, y) _TF_DCHECK_NOP(x, y)
      |
external/com_google_absl/absl/log/check.h:150: warning: "DCHECK_GT" redefined
  150 | #define DCHECK_GT(val1, val2) \
      |
external/tsl/tsl/platform/default/logging.h:536: note: this is the location of the previous definition
  536 | #define DCHECK_GT(x, y) _TF_DCHECK_NOP(x, y)
      |
xla/service/gpu/ir_emitter_unnested.cc: In member function ‘tsl::Status xla::gpu::IrEmitterUnnested::EmitCubDeviceRadixSort(mlir::Operation*)’:
xla/service/gpu/ir_emitter_unnested.cc:1472:33: error: ‘CubSortThunk’ was not declared in this scope
 1472 |   auto thunk = std::make_unique<CubSortThunk>(
      |                                 ^~~~~~~~~~~~
xla/service/gpu/ir_emitter_unnested.cc:1472:46: error: no matching function for call to ‘make_unique< >(xla::gpu::Thunk::ThunkInfo, xla::PrimitiveType, std::optional, std::vector&, std::vector&, xla::BufferAllocation::Slice&, bool)’
 1472 |   auto thunk = std::make_unique<CubSortThunk>(
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 1473 |       Thunk::ThunkInfo::WithProfileAnnotation(op),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1474 |       GetShape(op->getOperand(0)).element_type(),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1475 |       radix_sort_op.getInputs().size() == 2
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1476 |           ? std::optional(GetShape(op->getOperand(1)).element_type())
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1477 |           : std::nullopt,
      |           ~~~~~~~~~~~~~~~
 1478 |       operands, results, scratch, radix_sort_op.getDescending());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/memory:78,
                 from ./xla/service/gpu/ir_emitter_unnested.h:21:
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1069:5: note: candidate: ‘template std::__detail::__unique_ptr_t<_Tp> std::make_unique(_Args&& ...)’
 1069 |     make_unique(_Args&&... __args)
      |     ^~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1069:5: note:   template argument deduction/substitution failed:
xla/service/gpu/ir_emitter_unnested.cc:1472:46: error: template argument 1 is invalid
 1472 |   auto thunk = std::make_unique<CubSortThunk>(
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 1473 |       Thunk::ThunkInfo::WithProfileAnnotation(op),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1474 |       GetShape(op->getOperand(0)).element_type(),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1475 |       radix_sort_op.getInputs().size() == 2
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1476 |           ? std::optional(GetShape(op->getOperand(1)).element_type())
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1477 |           : std::nullopt,
      |           ~~~~~~~~~~~~~~~
 1478 |       operands, results, scratch, radix_sort_op.getDescending());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1084:5: note: candidate: ‘template std::__detail::__unique_ptr_array_t<_Tp> std::make_unique(size_t)’
 1084 |     make_unique(size_t __num)
      |     ^~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1084:5: note:   candidate expects 1 argument, 7 provided
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1094:5: note: candidate: ‘template std::__detail::__invalid_make_unique_t<_Tp> std::make_unique(_Args&& ...)’ (deleted)
 1094 |     make_unique(_Args&&...) = delete;
      |     ^~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1094:5: note:   template argument deduction/substitution failed:
xla/service/gpu/ir_emitter_unnested.cc:1472:46: error: template argument 1 is invalid
 1472 |   auto thunk = std::make_unique<CubSortThunk>(
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 1473 |       Thunk::ThunkInfo::WithProfileAnnotation(op),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1474 |       GetShape(op->getOperand(0)).element_type(),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1475 |       radix_sort_op.getInputs().size() == 2
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1476 |           ? std::optional(GetShape(op->getOperand(1)).element_type())
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1477 |           : std::nullopt,
      |           ~~~~~~~~~~~~~~~
 1478 |       operands, results, scratch, radix_sort_op.getDescending());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/unistd.h:226,
                 from external/com_google_absl/absl/base/internal/thread_identity.h:27,
                 from external/com_google_absl/absl/synchronization/mutex.h:70,
                 from external/com_google_absl/absl/strings/internal/cordz_info.h:31,
                 from external/com_google_absl/absl/strings/cord.h:91,
                 from external/com_google_absl/absl/container/internal/hash_function_defaults.h:56,
                 from external/com_google_absl/absl/container/flat_hash_map.h:41,
                 from ./xla/hlo/ir/hlo_computation.h:26:
external/tsl/tsl/concurrency/async_value.h: In instantiation of ‘static void tsl::internal::ConcreteAsyncValue::VerifyOffsets() [with T = tsl::DummyValueForErrorAsyncValue]’:
external/tsl/tsl/concurrency/async_value.h:536:18:   required from ‘tsl::internal::ConcreteAsyncValue::ConcreteAsyncValue(absl::lts_20230802::Status) [with T = tsl::DummyValueForErrorAsyncValue]’
external/tsl/tsl/concurrency/async_value.h:727:30:   required from here
external/tsl/tsl/concurrency/async_value.h:702:28: warning: ‘offsetof’ within non-standard-layout type ‘tsl::internal::ConcreteAsyncValue’ is conditionally-supported [-Winvalid-offsetof]
  702 |     static_assert(offsetof(ConcreteAsyncValue, data_store_.data_) ==
      |                            ^
external/tsl/tsl/concurrency/async_value.h:706:28: warning: ‘offsetof’ within non-standard-layout type ‘tsl::internal::ConcreteAsyncValue’ is conditionally-supported [-Winvalid-offsetof]
  706 |     static_assert(offsetof(ConcreteAsyncValue, data_store_.error_) ==
      |                            ^
Target //xla/extension:xla_extension failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2466.368s, Critical Path: 158.87s
INFO: 6899 processes: 469 internal, 6430 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
make: *** [Makefile:26: /home/awlex/.cache/xla/0.6.0/cache/build/xla_extension-x86_64-linux-gnu-rocm.tar.gz] Error 1
could not compile dependency :xla, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile xla --force", update it with "mix deps.update xla" or clean it with "mix deps.clean xla"
==> fusemega
** (Mix) Could not compile with "make" (exit status: 2).
You need to have gcc and make installed. If you are using
Ubuntu or any other Debian-based system, install the packages
"build-essential". Also install "erlang-dev" package if not
included in your Erlang/OTP version. If you're on Fedora, run
"dnf group install 'Development Tools'".

jonatanklosko commented 6 months ago

Hey @Awlexus, this could be an issue with the build environment. To rule that out, you can use the Docker scripts instead (./build.sh rocm), then set XLA_ARCHIVE_URL=file:///path/to/build.tar.gz accordingly.
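
XLA_ARCHIVE_URL takes a URL, so a local archive needs the file:// scheme in front of an absolute path, which is where the three slashes in file:///path/to/... come from. A minimal sketch of that conversion follows; the helper name and the archive path in the usage line are hypothetical placeholders.

```shell
#!/bin/sh
# Turn an absolute path to a locally built xla_extension archive into a file:// URL.
# xla_archive_url is a hypothetical helper, not part of the xla or exla packages.
xla_archive_url() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;
    *)
      echo "expected an absolute path, got: $1" >&2
      return 1
      ;;
  esac
}

# Usage (hypothetical archive path), run from the Elixir project:
# XLA_ARCHIVE_URL="$(xla_archive_url /path/to/build.tar.gz)" mix compile
```

Rejecting relative paths keeps the helper from producing a URL like file://build.tar.gz, where the first path segment would be read as a host name.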

In case your GPU uses gfx1100 (7900 XTX), you may need to use a more recent XLA revision as per https://github.com/elixir-nx/xla/issues/63#issuecomment-1844195261 (either by setting OPENXLA_GIT_REV with mix compile or changing the Makefile directly in case of the Docker build).
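
One quick sanity check is whether the GPU's ISA name appears in the AMDGPU targets the build passes to Bazel (the TF_ROCM_AMDGPU_TARGETS list in the build log above). The sketch below is a plain membership test; reusing the target list from this particular log is an assumption, since other XLA revisions or Makefiles may pass a different list.

```shell
#!/bin/sh
# Succeed if a GPU ISA name (e.g. "gfx1030") is in a comma-separated target list.
# Wrapping both sides in commas avoids partial matches such as "gfx90" vs "gfx90a".
isa_in_targets() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Target list copied from the bazel command in the build log of this issue.
TARGETS="gfx900,gfx906,gfx908,gfx90a,gfx1030"

# Usage: test every ISA that rocminfo reports.
# for isa in $(rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u); do
#   if isa_in_targets "$isa" "$TARGETS"; then echo "$isa: in target list"
#   else echo "$isa: NOT in target list"; fi
# done
```

With this list, gfx1030 (RX 6900 XT) matches while gfx1100 (7900 XTX) does not.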

Awlexus commented 6 months ago

Thanks @jonatanklosko, I was able to compile it by using a more recent XLA git ref, but I could not get it to run on the GPU. I then built it with the Docker script (which took a long time) and ran into the same error. It was able to allocate the memory, but soon afterwards the program was stopped by the operating system. I'm not sure where exactly this error comes from.

Error log


2023-12-28 23:43:05.394087: E xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: DNN
[info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[info] XLA service 0x7fa4c018dc30 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
[info]   StreamExecutor device (0): AMD Radeon RX 6900 XT, AMDGPU ISA version: gfx1030
[info] Using BFC allocator.
[info] XLA backend allocating 15446782771 bytes on device 0 for BFCAllocator.
fish: Job 1, 'iex -S mix phx.server $argv' terminated by signal SIGSEGV (Address boundary error)

jonatanklosko commented 6 months ago

Hmm, do you do any Nx stuff on boot? Does the error happen every time? I assume it doesn't happen if you use CPU only? You can also try ELIXIR_ERL_OPTIONS="+sssdio 128 +sssdcpu 128", though it rather helps with segfaults.
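
For reference, +sssdcpu and +sssdio are Erlang emulator flags that set the suggested stack size, in kilowords, for dirty CPU and dirty I/O scheduler threads. A minimal sketch that appends them without dropping options that are already set; the helper name is made up for illustration.

```shell
#!/bin/sh
# Print ELIXIR_ERL_OPTIONS with larger dirty-scheduler stack sizes appended.
# +sssdcpu / +sssdio: suggested stack size (kilowords) for dirty CPU / dirty I/O threads.
# 128 is the value suggested in this thread; with_dirty_stacks is a hypothetical name.
with_dirty_stacks() {
  extra="+sssdio 128 +sssdcpu 128"
  if [ -n "${ELIXIR_ERL_OPTIONS:-}" ]; then
    printf '%s %s\n' "$ELIXIR_ERL_OPTIONS" "$extra"
  else
    printf '%s\n' "$extra"
  fi
}

# Usage:
# ELIXIR_ERL_OPTIONS="$(with_dirty_stacks)" iex -S mix phx.server
```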

Awlexus commented 6 months ago

Sorry for the late reply, I was away for a bit.

I'm not sure what changed since then, but now I'm getting a different error message. I had already written a reply before I noticed the change, so I've added it at the end in case it's helpful.

First I ran into the error (RuntimeError) bitcode module not found at ./opencl.bc, which I resolved by setting ROCM_PATH=/opt/rocm (mentioning this in case someone else runs into it).
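
A quick pre-flight check for that error is to confirm the bitcode file exists under the chosen ROCM_PATH before starting the app. The amdgcn/bitcode subdirectory is an assumption based on the usual ROCm 5.x layout and may differ between distributions and ROCm versions; the helper names are hypothetical.

```shell
#!/bin/sh
# Print the directory where ROCm device bitcode is expected under a given root.
# Assumption: ROCm 5.x layout, i.e. "$root/amdgcn/bitcode"; the root defaults to /opt/rocm.
rocm_bitcode_dir() {
  printf '%s/amdgcn/bitcode\n' "${1:-/opt/rocm}"
}

# Succeed (and say so) if opencl.bc exists under the given root, fail otherwise.
check_rocm_bitcode() {
  bc="$(rocm_bitcode_dir "$1")/opencl.bc"
  if [ -f "$bc" ]; then
    echo "found $bc"
  else
    echo "missing $bc" >&2
    return 1
  fi
}

# Usage:
# export ROCM_PATH=/opt/rocm
# check_rocm_bitcode "$ROCM_PATH" && iex -S mix phx.server
```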

Now I'm running into the following error, and shortly afterwards the process aborts with SIGABRT:

2023-12-31 18:56:44.607676: E xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: DNN
[info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[info] XLA service 0x7fe7ac1707a0 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
[info]   StreamExecutor device (0): AMD Radeon RX 6900 XT, AMDGPU ISA version: gfx1030
[info] Using BFC allocator.
[info] XLA backend allocating 15446782771 bytes on device 0 for BFCAllocator.

...

beam.smp: /usr/src/debug/hip-runtime-amd/clr-rocm-5.7.1/hipamd/src/hip_code_object.cpp:762: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.

Old Reply

> do you do any Nx stuff on boot?

I've added a serving of openai/whisper to my application's supervision tree, but that should be all:

```elixir
# @whisper_model is a module attribute holding the Hugging Face repo id
{:ok, model_info} = Bumblebee.load_model({:hf, @whisper_model})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, @whisper_model})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, @whisper_model})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, @whisper_model})
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 100)

serving =
  Bumblebee.Audio.speech_to_text_whisper(
    model_info,
    featurizer,
    tokenizer,
    generation_config,
    compile: [batch_size: 4],
    chunk_num_seconds: 30,
    stream: true,
    # Compile the serving's numerical functions with EXLA (the ROCm build here)
    defn_options: [compiler: EXLA]
  )
```

> Does the error happen every time? I assume it doesn't happen if you use CPU only?

Yes, it happens every time, before the serving is able to complete a single run.

jonatanklosko commented 6 months ago

Hmm, this looks like /opt/rocm is likely a symlink to a more specific version like /opt/rocm-5.7.1; let's set ROCM_PATH to that just to be sure. Otherwise, maybe a ROCm HIP package is missing from the environment?
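
If /opt/rocm is indeed a symlink, the versioned directory it points to can be resolved before exporting ROCM_PATH. A minimal sketch, assuming readlink -f is available (GNU coreutils, as on Arch); the helper name is hypothetical.

```shell
#!/bin/sh
# Print the real directory behind a ROCm root: follow it if it is a symlink
# (e.g. /opt/rocm -> /opt/rocm-5.7.1), otherwise print it unchanged.
# Assumes readlink -f is available (GNU coreutils).
resolve_rocm_path() {
  root="${1:-/opt/rocm}"
  if [ -L "$root" ]; then
    readlink -f "$root"
  else
    printf '%s\n' "$root"
  fi
}

# Usage:
# export ROCM_PATH="$(resolve_rocm_path /opt/rocm)"
# echo "$ROCM_PATH"
```

If /opt/rocm is a plain directory rather than a symlink, the path comes back unchanged, so setting ROCM_PATH this way is harmless either way.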

Awlexus commented 5 months ago

I'm running Arch Linux and rely on the packages provided there, so I'm not sure what I could be missing. I have installed every package that pops up when I search for rocm, but just to be sure I've provided a list of the installed packages below.

> Hmm, this looks like /opt/rocm is likely a symlink to a more specific version

/opt/rocm really just links to the packages installed on my system.

$ ls -lah /opt
drwxr-xr-x 34 root root 4.0K Dec 31 18:53  rocm/

Installed rocm packages

* comgr 5.7.1-1 Compiler support library for ROCm LLVM
* hip-runtime-amd 5.7.1-1 Heterogeneous Interface for Portability ROCm
* hipblas 5.7.1-1 ROCm BLAS marshalling library
* hsa-rocr 5.7.1-1 HSA Runtime API and runtime for ROCm
* magma-hip 2.7.2-2 Matrix Algebra on GPU and Multicore Architectures (with ROCm/HIP)
* python-pytorch-rocm 2.1.2-1 Tensors and Dynamic neural networks in Python with strong GPU acceleration (with ROCm)
* python-torchvision-rocm 0.16.2-1 Datasets, transforms, and models specific to computer vision (with ROCM support)
* rccl 5.7.1-1 ROCm Communication Collectives Library
* rocalution 5.7.1-1 Next generation library for iterative sparse solvers for ROCm platform
* rocblas 5.7.1-1 Next generation BLAS implementation for ROCm platform
* rocfft 5.7.1-1 Next generation FFT implementation for ROCm
* rocm-clang-ocl 5.7.1-1 OpenCL compilation with clang compiler
* rocm-cmake 5.7.1-1 CMake modules for common build tasks needed for the ROCm software stack
* rocm-core 5.7.1-1 AMD ROCm core package (version files)
* rocm-device-libs 5.7.1-1 ROCm Device Libraries
* rocm-hip-libraries 5.7.1-2 Develop certain applications using HIP and libraries for AMD platforms
* rocm-hip-runtime 5.7.1-2 Packages to run HIP applications on the AMD platform
* rocm-hip-sdk 5.7.1-2 Develop applications using HIP and libraries for AMD platforms
* rocm-language-runtime 5.7.1-2 ROCm runtime
* rocm-llvm 5.7.1-1 Radeon Open Compute - LLVM toolchain (llvm, clang, lld)
* rocm-ml-libraries 5.7.1-2 Packages for key Machine Learning libraries
* rocm-ml-sdk 5.7.1-2 develop and run Machine Learning applications optimized for AMD platforms
* rocm-opencl-runtime 5.7.1-1 OpenCL implementation for AMD
* rocm-opencl-sdk 5.7.1-2 Develop OpenCL-based applications for AMD platforms
* rocm-smi-lib 5.7.1-1 ROCm System Management Interface Library
* rocminfo 5.7.1-1 ROCm Application for Reporting System Info
* rocrand 5.7.1-1 Pseudo-random and quasi-random number generator on ROCm
* rocsolver 5.7.1-1 Subset of LAPACK functionality on the ROCm platform
* rocsparse 5.7.1-1 BLAS for sparse computation on top of ROCm
* rocthrust 5.7.1-1 Port of the Thrust parallel algorithm library atop HIP/ROCm
* roctracer 5.7.1-1 ROCm tracer library for performance tracing

jonatanklosko commented 5 months ago

I see. It must be something environment-related, given that others managed to run it with that revision, but I don't have any more guesses right now.

One alternative would be running stuff inside Docker, though that's not exactly convenient. Alternatively, you could try building with the latest OpenXLA revision to see if it's been fixed upstream, but note that this usually requires some adjustments in the build file and/or in EXLA (depending on how much the XLA APIs have changed).

jonatanklosko commented 1 month ago

We just had a new release, see https://github.com/elixir-nx/xla/issues/82#issuecomment-2124230058. You can try it with ROCm 6.0, and if there are issues, leave a comment on #82 :)

elixir-nx / xla

XLA_TARGET=rocm compilation failed with "crosstool_wrapper_driver_is_not_gcc failed" #68