ROCm / hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond that of a traditional BLAS library.
https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/index.html
MIT License

Any plans for adding gfx10+ support? #648

Open anonymux1 opened 7 months ago

anonymux1 commented 7 months ago

I can no longer use RX 6xxx cards for LLM fine-tuning because of the new hipBLASLt requirement. Are there any plans to add support in the future?

TheTrustedComputer commented 1 month ago

I second this feature request: I use an 8 GB RX 5500 XT for machine learning. Starting with version 2.3, PyTorch made hipBLASLt a hard build requirement when ROCm 5.7 or newer is installed.

Unless the PyTorch developers make it optional (issue currently ongoing), I and other users will be forced to downgrade to 2.2.2, the latest release that doesn't have this prerequisite.

TheTrustedComputer commented 1 month ago

Here's my attempt to force the compilation of hipBLASLt for my 5500 XT, which uses the gfx1012 architecture. It seemed to hit a brick wall when creating the ExtOp libraries. Unless someone in the community is knowledgeable about how AMD GPUs work at the hardware level and provides unofficial patches for the gfx101x/gfx103x arches, I doubt they'll include them for the foreseeable future.

-- The CXX compiler identification is Clang 18.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.46.0")
-- Using hip-clang to build for amdgpu backend

*******************************************************************************
*------------------------------- ROCMChecks WARNING --------------------------*
  Options and properties should be set on a cmake target where possible. The
  variable 'CMAKE_CXX_FLAGS' may be set by the cmake toolchain, either by
  calling 'cmake -DCMAKE_CXX_FLAGS="-march=native -s -Wno-error=unused-command-line-argument -D__HIP_HCC_COMPAT_MODE__=1"'
  or set in a toolchain file and added with
  'cmake -DCMAKE_TOOLCHAIN_FILE=<toolchain-file>'. ROCMChecks now calling:
CMake Warning at /opt/rocm/share/rocmcmakebuildtools/cmake/ROCMChecks.cmake:46 (message):
  'CMAKE_CXX_FLAGS' is set at /root/rocm-6.2/hipBLASLt/CMakeLists.txt:<line#>
  shown below:
Call Stack (most recent call first):
  CMakeLists.txt:9223372036854775807 (rocm_check_toolchain_var)
  CMakeLists.txt:112 (set)

*-----------------------------------------------------------------------------*
*******************************************************************************

-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_on
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_on - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_off
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_off - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101 - Success
-- AMDGPU_TARGETS: gfx1012
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
/usr/bin/python3 -m venv /root/rocm-build/hipblaslt/virtualenv --system-site-packages --clear
Requirement already satisfied: pip in /root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages (24.0)
Collecting pip
  Using cached pip-24.2-py3-none-any.whl.metadata (3.6 kB)
Using cached pip-24.2-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.0
    Uninstalling pip-24.0:
      Successfully uninstalled pip-24.0
Successfully installed pip-24.2
/root/rocm-build/hipblaslt/virtualenv/bin/python3 -m pip install /root/rocm-6.2/hipBLASLt/tensilelite
Processing /root/rocm-6.2/hipBLASLt/tensilelite
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: pyyaml in /usr/lib/python3.12/site-packages (from Tensile==4.33.0) (6.0.1)
Collecting msgpack (from Tensile==4.33.0)
  Using cached msgpack-1.0.8-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.1 kB)
Requirement already satisfied: joblib in /usr/lib/python3.12/site-packages (from Tensile==4.33.0) (1.3.2)
Collecting simplejson (from Tensile==4.33.0)
  Using cached simplejson-3.19.2-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.1 kB)
Collecting ujson (from Tensile==4.33.0)
  Using cached ujson-5.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.3 kB)
Collecting orjson (from Tensile==4.33.0)
  Using cached orjson-3.10.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (50 kB)
Using cached msgpack-1.0.8-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (408 kB)
Using cached orjson-3.10.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)
Using cached simplejson-3.19.2-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (152 kB)
Using cached ujson-5.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53 kB)
Building wheels for collected packages: Tensile
  Building wheel for Tensile (pyproject.toml): started
  Building wheel for Tensile (pyproject.toml): finished with status 'done'
  Created wheel for Tensile: filename=Tensile-4.33.0-py3-none-any.whl size=15863101 sha256=8f108ed7010eef9acb8b2c0c352127128ec76a75ae9b9ca73d40c26d51ffd9ab
  Stored in directory: /tmp/pip-ephem-wheel-cache-9pspn3st/wheels/27/9a/dc/c4a9a335da35cc2e3af28bf8e85378949f5c5049a442f177e6
Successfully built Tensile
Installing collected packages: ujson, simplejson, orjson, msgpack, Tensile
Successfully installed Tensile-4.33.0 msgpack-1.0.8 orjson-3.10.7 simplejson-3.19.2 ujson-5.10.0
-- using local Tensile from /root/rocm-6.2/hipBLASLt/tensilelite, copied to 
-- Adding /root/rocm-build/hipblaslt/virtualenv to CMAKE_PREFIX_PATH
-- The C compiler identification is Clang 18.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
CMake Warning (dev) at /usr/share/cmake/Modules/CMakeFindDependencyMacro.cmake:76 (find_package):
  Policy CMP0167 is not set: The FindBoost module is removed.  Run "cmake
  --help-policy CMP0167" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

Call Stack (most recent call first):
  /usr/lib/cmake/msgpack-cxx/msgpack-cxx-config.cmake:40 (find_dependency)
  /root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages/Tensile/Source/lib/CMakeLists.txt:105 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found Boost: /usr/lib/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0")
-- Tensile script: /root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages/Tensile/bin/TensileCreateLibrary
-- Tensile_CREATE_COMMAND: /root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages/Tensile/bin/TensileCreateLibrary;--merge-files;--separate-architectures;--lazy-library-loading;--no-short-file-names;--no-library-print-debug;--code-object-version=default;--cxx-compiler=hipcc;--library-format=msgpack;--architecture=gfx1012;--build-id=sha1;/root/rocm-6.2/hipBLASLt/library/src/amd_detail/rocblaslt/src/Tensile/Logic/asm_full;/root/rocm-build/hipblaslt/Tensile;HIP
-- Tensile_MANIFEST_FILE_PATH: /root/rocm-build/hipblaslt/Tensile/library/TensileManifest.txt
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Configuring done (57.6s)
-- Generating done (0.0s)
-- Build files have been written to: /root/rocm-build/hipblaslt
[1/34] Creating ExtOp Libraries
FAILED: Tensile/library/hipblasltExtOpLibrary.dat /root/rocm-build/hipblaslt/Tensile/library/hipblasltExtOpLibrary.dat 
cd /root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages/Tensile/Ops && /usr/bin/cmake -E rm -rf /root/rocm-build/hipblaslt/library/build_tmp/ops && /usr/bin/cmake -E make_directory /root/rocm-build/hipblaslt/library/build_tmp/ops && /usr/bin/cmake -E make_directory /root/rocm-build/hipblaslt/Tensile/library && bash /root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages/Tensile/Source/..//Ops/gen_assembly.sh "gfx1012" /root/rocm-build/hipblaslt/library/build_tmp/ops /root/rocm-build/hipblaslt/virtualenv sha1 && /usr/bin/cmake -E copy /root/rocm-build/hipblaslt/library/build_tmp/ops/hipblasltExtOpLibrary.dat /root/rocm-build/hipblaslt/library/build_tmp/ops/extop_*.co /root/rocm-build/hipblaslt/Tensile/library
Creating code object for arch gfx1012
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:11:3: error: directive requires gfx90a+
  .amdhsa_accum_offset 40 // accvgpr offset
  ^~~~~~~~~~~~~~~~~~~~
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:12:3: error: unknown directive
  .amdhsa_next_free_vgpr 40 // vgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:13:3: error: unknown directive
  .amdhsa_next_free_sgpr 40 // sgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:14:3: error: unknown directive
  .amdhsa_group_segment_fixed_size 32 // lds bytes
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:15:3: error: unknown directive
  .amdhsa_private_segment_fixed_size 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:16:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_x 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:17:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_y 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:18:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_z 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:19:3: error: unknown directive
  .amdhsa_system_vgpr_workitem_id 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:20:3: error: unknown directive
  .amdhsa_float_denorm_mode_32 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:21:3: error: unknown directive
  .amdhsa_float_denorm_mode_16_64 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.s:22:1: error: unknown directive
.end_amdhsa_kernel
^
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.o'
clang++: error: no input files
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:11:3: error: directive requires gfx90a+
  .amdhsa_accum_offset 80 // accvgpr offset
  ^~~~~~~~~~~~~~~~~~~~
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:12:3: error: unknown directive
  .amdhsa_next_free_vgpr 80 // vgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:13:3: error: unknown directive
  .amdhsa_next_free_sgpr 48 // sgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:14:3: error: unknown directive
  .amdhsa_group_segment_fixed_size 32 // lds bytes
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:15:3: error: unknown directive
  .amdhsa_private_segment_fixed_size 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:16:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_x 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:17:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_y 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:18:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_z 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:19:3: error: unknown directive
  .amdhsa_system_vgpr_workitem_id 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:20:3: error: unknown directive
  .amdhsa_float_denorm_mode_32 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:21:3: error: unknown directive
  .amdhsa_float_denorm_mode_16_64 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.s:22:1: error: unknown directive
.end_amdhsa_kernel
^
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.o'
clang++: error: no input files
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:11:3: error: directive requires gfx90a+
  .amdhsa_accum_offset 40 // accvgpr offset
  ^~~~~~~~~~~~~~~~~~~~
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:12:3: error: unknown directive
  .amdhsa_next_free_vgpr 40 // vgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:13:3: error: unknown directive
  .amdhsa_next_free_sgpr 40 // sgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:14:3: error: unknown directive
  .amdhsa_group_segment_fixed_size 32 // lds bytes
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:15:3: error: unknown directive
  .amdhsa_private_segment_fixed_size 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:16:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_x 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:17:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_y 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:18:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_z 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:19:3: error: unknown directive
  .amdhsa_system_vgpr_workitem_id 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:20:3: error: unknown directive
  .amdhsa_float_denorm_mode_32 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:21:3: error: unknown directive
  .amdhsa_float_denorm_mode_16_64 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.s:22:1: error: unknown directive
.end_amdhsa_kernel
^
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.o'
clang++: error: no input files
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:11:3: error: directive requires gfx90a+
  .amdhsa_accum_offset 40 // accvgpr offset
  ^~~~~~~~~~~~~~~~~~~~
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:12:3: error: unknown directive
  .amdhsa_next_free_vgpr 40 // vgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:13:3: error: unknown directive
  .amdhsa_next_free_sgpr 40 // sgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:14:3: error: unknown directive
  .amdhsa_group_segment_fixed_size 32 // lds bytes
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:15:3: error: unknown directive
  .amdhsa_private_segment_fixed_size 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:16:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_x 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:17:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_y 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:18:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_z 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:19:3: error: unknown directive
  .amdhsa_system_vgpr_workitem_id 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:20:3: error: unknown directive
  .amdhsa_float_denorm_mode_32 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:21:3: error: unknown directive
  .amdhsa_float_denorm_mode_16_64 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.s:22:1: error: unknown directive
.end_amdhsa_kernel
^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:11:3: error: directive requires gfx90a+
  .amdhsa_accum_offset 80 // accvgpr offset
  ^~~~~~~~~~~~~~~~~~~~
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:12:3: error: unknown directive
  .amdhsa_next_free_vgpr 80 // vgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:13:3: error: unknown directive
  .amdhsa_next_free_sgpr 48 // sgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:14:3: error: unknown directive
  .amdhsa_group_segment_fixed_size 32 // lds bytes
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:15:3: error: unknown directive
  .amdhsa_private_segment_fixed_size 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:16:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_x 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:17:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_y 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:18:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_z 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:19:3: error: unknown directive
  .amdhsa_system_vgpr_workitem_id 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:20:3: error: unknown directive
  .amdhsa_float_denorm_mode_32 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:21:3: error: unknown directive
  .amdhsa_float_denorm_mode_16_64 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.s:22:1: error: unknown directive
.end_amdhsa_kernel
^
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.o'
clang++: error: no input files
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.o'
clang++: error: no input files
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:11:3: error: directive requires gfx90a+
  .amdhsa_accum_offset 40 // accvgpr offset
  ^~~~~~~~~~~~~~~~~~~~
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:12:3: error: unknown directive
  .amdhsa_next_free_vgpr 40 // vgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:13:3: error: unknown directive
  .amdhsa_next_free_sgpr 40 // sgprs
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:14:3: error: unknown directive
  .amdhsa_group_segment_fixed_size 32 // lds bytes
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:15:3: error: unknown directive
  .amdhsa_private_segment_fixed_size 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:16:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_x 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:17:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_y 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:18:3: error: unknown directive
  .amdhsa_system_sgpr_workgroup_id_z 1
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:19:3: error: unknown directive
  .amdhsa_system_vgpr_workitem_id 0
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:20:3: error: unknown directive
  .amdhsa_float_denorm_mode_32 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:21:3: error: unknown directive
  .amdhsa_float_denorm_mode_16_64 3
  ^
/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.s:22:1: error: unknown directive
.end_amdhsa_kernel
^
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.o'
clang++: error: no input files
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_1_gfx1012.o'
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/L_256_4_0_gfx1012.o'
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_S_256_4_gfx1012.o'
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_H_256_4_gfx1012.o'
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/A_H_S_256_4_gfx1012.o'
clang++: error: no such file or directory: '/root/rocm-build/hipblaslt/library/build_tmp/ops/A_S_H_256_4_gfx1012.o'
Error copying file "/root/rocm-build/hipblaslt/library/build_tmp/ops/extop_*.co" to "/root/rocm-build/hipblaslt/Tensile/library".
[2/34] Generating Tensile Libraries
FAILED: Tensile/library/TensileManifest.txt /root/rocm-build/hipblaslt/Tensile/library/TensileManifest.txt 
cd /root/rocm-build/hipblaslt/library && /root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages/Tensile/bin/TensileCreateLibrary --merge-files --separate-architectures --lazy-library-loading --no-short-file-names --no-library-print-debug --code-object-version=default --cxx-compiler=hipcc --library-format=msgpack --architecture=gfx1012 --build-id=sha1 /root/rocm-6.2/hipBLASLt/library/src/amd_detail/rocblaslt/src/Tensile/Logic/asm_full /root/rocm-build/hipblaslt/Tensile HIP

################################################################################
# Tensile Create Library
# Detected local GPU with ISA: gfx1012
# Detected local GPU with ISA: gfx1012
               cap gfx000 gfx803 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942 gfx1010 gfx1011 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102 
   HasMFMA_bf16_1k      0      0      0      0      0      1      1      1      1       0       0       0       0       0       0       0 
        HasAddLshl      0      0      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
      HasAtomicAdd      0      0      0      0      1      1      1      1      1       0       0       0       0       1       1       1 
    HasDirectToLds      0      0      0      0      0      0      0      0      0       0       0       0       0       0       0       0 
     HasExplicitCO      0      0      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
     HasExplicitNC      0      0      0      0      0      0      0      0      0       1       1       1       1       1       1       1 
    HasGLCModifier      0      1      1      1      1      1      0      0      0       1       1       1       1       1       1       1 
         HasLshlOr      0      0      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
           HasMFMA      0      0      0      0      1      1      1      1      1       0       0       0       0       0       0       0 
     HasNTModifier      0      0      0      0      0      0      1      1      1       0       0       0       0       0       0       0 
          HasSMFMA      0      0      0      0      0      0      1      1      1       0       0       0       0       0       0       0 
         HasSMulHi      0      0      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
           HasWMMA      0      0      0      0      0      0      0      0      0       0       0       0       0       1       1       1 
        MaxLgkmcnt      1      1      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
          MaxVmcnt      0      1      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
      SupportedISA      0      1      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
   SupportedSource      1      1      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
        HasMFMA_b8      0      0      0      0      0      0      1      1      1       0       0       0       0       0       0       0 
  HasMFMA_explictB      0      0      0      0      0      0      1      1      1       0       0       0       0       0       0       0 
    v_dot2_f32_f16      0      0      0      1      1      1      1      1      1       0       1       1       1       1       1       1 
   v_dot2c_f32_f16      0      0      0      0      1      1      1      1      1       0       1       1       1       1       1       1 
         v_fma_f16      0      0      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
        v_fmac_f16      0      0      0      0      0      0      0      0      0       0       0       0       0       0       0       0 
         v_mac_f16      0      1      1      1      1      1      1      1      1       0       0       0       0       0       0       0 
      v_pk_fma_f16      0      0      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
     v_pk_fmac_f16      0      0      0      0      0      0      0      0      0       0       0       0       0       0       0       0 
         v_fma_f32      0      1      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
     v_fma_mix_f32      0      0      0      1      1      1      1      1      1       1       1       1       1       1       1       1 
        v_fmac_f32      0      0      0      1      1      1      1      1      1       1       1       1       1       1       1       1 
         v_mac_f32      0      1      1      1      1      1      0      0      0       1       1       1       0       0       0       0 
     v_mad_mix_f32      0      0      1      0      0      0      0      0      0       0       0       0       0       0       0       0 
      v_pk_add_f32      0      0      0      0      0      1      1      1      1       0       0       0       0       0       0       0 
       HasMFMA_f64      0      0      0      0      0      1      1      1      1       0       0       0       0       0       0       0 
         v_fma_f64      0      1      1      1      1      1      1      1      1       1       1       1       1       1       1       1 
        HasMFMA_f8      0      0      0      0      0      0      1      1      1       0       0       0       0       0       0       0 
 VOP3v_dot4_i32_i8      0      0      0      1      1      1      1      1      1       0       1       1       1       0       0       0 
     v_dot4_i32_i8      0      0      0      0      0      0      0      0      0       0       0       0       0       0       0       0 
    v_dot4c_i32_i8      0      0      0      0      1      1      1      1      1       0       1       1       1       0       0       0 
      HasMFMA_xf32      0      0      0      0      0      0      1      1      1       0       0       0       0       0       0       0 
ArchAccUnifiedRegs      0      0      0      0      0      1      1      1      1       0       0       0       0       0       0       0 
    CMPXWritesSGPR      1      1      1      1      1      1      1      1      1       0       0       0       0       0       0       0 
     ForceStoreSC1      0      0      0      0      0      0      1      1      0       0       0       0       0       0       0       0 
          HasAccCD      0      0      0      0      0      1      1      1      1       0       0       0       0       0       0       0 
        HasEccHalf      0      0      0      1      1      1      1      1      1       0       0       0       0       0       0       0 
         HasWave32      0      0      0      0      0      0      0      0      0       1       1       1       1       1       1       1 
          SDWAWait      0      0      0      0      0      0      1      1      1       0       0       0       0       0       0       0 
     SeparateVscnt      0      0      0      0      0      0      0      0      0       1       1       1       1       1       1       1 
       TransOpWait      0      0      0      0      0      0      1      1      1       0       0       0       0       0       0       0 
          VgprBank      0      0      0      0      0      0      0      0      0       1       1       1       1       1       1       1 
  Waitcnt0Disabled      0      0      0      0      1      1      1      1      1       0       0       0       0       0       0       0 
# Found  hipcc version 6.2.41133-dd7f95766
Tensile::WARNING: Global parameter AsmDebug = False unrecognised.
# CodeObjectVersion from TensileCreateLibrary: default
# CxxCompiler       from TensileCreateLibrary: hipcc
# Architecture      from TensileCreateLibrary: gfx1012
# LibraryFormat     from TensileCreateLibrary: msgpack
# LibraryLogicFiles:
Reading logic files: Launching 16 threads...
Reading logic files: Done. (0.0 secs elapsed)
# Writing Custom CMake
# Writing Kernels...
Generating kernels: Launching 16 threads...
Generating kernels: Done. (0.0 secs elapsed)
*
Compiling source kernels: Launching 16 threads...
Compiling source kernels: Done. (1.5 secs elapsed)
Traceback (most recent call last):
  File "/root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages/Tensile/bin/TensileCreateLibrary", line 43, in <module>
    TensileCreateLibrary()
  File "/root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages/Tensile/TensileCreateLibrary.py", line 60, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/root/rocm-build/hipblaslt/virtualenv/lib/python3.12/site-packages/Tensile/TensileCreateLibrary.py", line 1471, in TensileCreateLibrary
    theMasterLibrary = list(masterLibraries.values())[0]
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
ninja: build stopped: subcommand failed.

AngryLoki commented 1 month ago

@TheTrustedComputer, if I read the code correctly, hipBLASLt and rocWMMA are tied to either the MFMA (gfx9) or the WMMA (gfx11) instruction set.
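The capability table in the build log above is consistent with this. As a rough sketch, the following parses a two-row excerpt of that table (the HasMFMA and HasWMMA rows, copied from the log) and reports which architectures have either instruction set. The helper names `parse_caps` and `has_matrix_instructions` are made up for illustration and are not part of Tensile.

```python
# Excerpt copied from the "Tensile Create Library" capability table in the log.
CAP_TABLE = """\
cap gfx000 gfx803 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942 gfx1010 gfx1011 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102
HasMFMA 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0
HasWMMA 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
"""


def parse_caps(table):
    """Turn the whitespace-separated table into {capability: {arch: bool}}."""
    rows = [line.split() for line in table.strip().splitlines()]
    archs = rows[0][1:]  # skip the leading "cap" column label
    caps = {}
    for name, *flags in rows[1:]:
        caps[name] = {arch: flag == "1" for arch, flag in zip(archs, flags)}
    return caps


def has_matrix_instructions(caps, arch):
    """True if the architecture reports MFMA or WMMA support in the table."""
    return caps["HasMFMA"].get(arch, False) or caps["HasWMMA"].get(arch, False)


caps = parse_caps(CAP_TABLE)
for arch in ("gfx1012", "gfx1030", "gfx942", "gfx1100"):
    print(arch, has_matrix_instructions(caps, arch))
```

In this excerpt, gfx1012 and gfx1030 report neither capability, while gfx942 reports MFMA and gfx1100 reports WMMA.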

You can either build hipBLASLt with -DAMDGPU_TARGETS="gfx940" (any random GPU architecture; slow), or apply https://github.com/gentoo/gentoo/blob/326a55e315aadb737f143269ef7491e1240f3d2a/sci-libs/hipBLASLt/files/hipBLASLt-6.1.1-no-arch.patch and build with -DBUILD_WITH_TENSILE=OFF -DAMDGPU_TARGETS="" (quick). PyTorch has a hard link-time dependency on hipBLASLt, but if the GPU is not supported, it uses the old hipBLAS code.
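For reference, here is a small sketch that assembles the two configure command lines described above. Only the -DAMDGPU_TARGETS and -DBUILD_WITH_TENSILE flags come from this thread; the source and build directory names are placeholders, the helper `cmake_configure_args` is hypothetical, and the second variant assumes the Gentoo patch has already been applied to the source tree.

```python
import shlex


def cmake_configure_args(source_dir, build_dir, targets, with_tensile=True):
    """Build a cmake configure command as an argument list (nothing is executed)."""
    args = ["cmake", "-S", source_dir, "-B", build_dir,
            f"-DAMDGPU_TARGETS={targets}"]
    if not with_tensile:
        # Only meaningful once the no-arch patch is applied (assumption).
        args.append("-DBUILD_WITH_TENSILE=OFF")
    return args


# Option 1: build for a supported architecture such as gfx940 (slow).
slow = cmake_configure_args("hipBLASLt", "build", "gfx940")

# Option 2: patched source, no Tensile, empty target list (quick).
quick = cmake_configure_args("hipBLASLt", "build", "", with_tensile=False)

print(shlex.join(slow))
print(shlex.join(quick))
```

The argument lists can be passed to `subprocess.run` as-is; printing them first makes it easy to check the flags before starting a long build.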

TheTrustedComputer commented 1 month ago

@AngryLoki Do you mean any supported GPU architecture? I built it for mine, -DAMDGPU_TARGETS=gfx1012, which resulted in the error I posted above.

I also appreciate your clarification regarding PyTorch's hipBLASLt requirement. PyTorch has an environment variable, TORCH_BLAS_PREFER_HIPBLASLT, that selects hipBLASLt (1) or hipBLAS (0). I've set it to 0 in my case to enforce this.
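A minimal sketch of setting that variable from Python rather than from the shell. The variable name and its 0/1 values are from this thread; the helper `prefer_hipblas` is hypothetical, and setting the variable before importing torch is a cautious assumption about when PyTorch reads it.

```python
import os


def prefer_hipblas(environ=os.environ):
    """Ask PyTorch to prefer hipBLAS over hipBLASLt via the environment."""
    environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"
    return environ


prefer_hipblas()

# Import torch only after the variable is set (assumed to be the safe order):
# import torch
```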

Gentoo's patch of hipBLASLt as a dummy library is an interesting workaround; I'll probably give that a try. Thanks!

AngryLoki commented 1 month ago

@TheTrustedComputer, build for a random supported architecture (e.g. gfx940). PyTorch will attempt to load hipBLASLt, discover that it was not compiled for your current GPU (and it is technically impossible to compile it for that GPU), and automatically fall back to the old hipBLAS code (used in PyTorch 2.2.2). There is no need to set TORCH_BLAS_PREFER_HIPBLASLT=0: at some point, in nightly builds before 2.4.0, it was required due to a bug (https://github.com/pytorch/pytorch/issues/119081#issuecomment-2166504992), but that bug was fixed in 2.4.0 and the fallback now works automatically.
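The fallback described here can be modeled in a few lines. This is only an illustration of the behavior as stated in this comment, not PyTorch's actual code; the function `choose_blas_backend` and its parameters are invented for the sketch.

```python
def choose_blas_backend(gpu_arch, hipblaslt_archs, prefer_hipblaslt=True):
    """Model of the described behavior: use hipBLASLt only if it was built
    for the current GPU architecture, otherwise fall back to hipBLAS."""
    if prefer_hipblaslt and gpu_arch in hipblaslt_archs:
        return "hipBLASLt"
    return "hipBLAS"


# hipBLASLt built only for gfx940, running on a gfx1012 card: falls back.
print(choose_blas_backend("gfx1012", {"gfx940"}))

# Same build on a gfx940 device: hipBLASLt is used.
print(choose_blas_backend("gfx940", {"gfx940"}))

# Preference disabled (the TORCH_BLAS_PREFER_HIPBLASLT=0 case): hipBLAS.
print(choose_blas_backend("gfx940", {"gfx940"}, prefer_hipblaslt=False))
```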