ROCm / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

Fails to build for W7900 (RDNA3, navi31, gfx1100) #9

Open lhl opened 2 months ago

lhl commented 2 months ago

I am running a standard Ubuntu 22.04 LTS ROCm 6.0.0 build with the latest packages in a new mamba venv.

Trying to install the package from source:

pip wheel -v --no-build-isolation git+https://github.com/ROCm/xformers.git@main#egg=xformers

It looks like there is a problem with a number of the kernels...

18:

  [18/156] /opt/rocm-6.0.0/bin/hipcc  -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/example/91_tile_program/xformers_fmha -I/
tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inclu
de/TH -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THC -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.0.0/include -I/home/lhl/miniforge3/envs/axolotl/include/python3.11 -c -c /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instance
s/ck_tiled_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_256.hip -o /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_256.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP
16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 --offload-arch=native -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -DCK_FMHA_FWD_FAST_
EXP2=1 -fgpu-flush-denormals-to-zero -Werror -Woverloaded-virtual -DBUILD_PYTHON_PACKAGE -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdc
pp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc                                                                            
  FAILED: /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forwar
d_bp16_no_causalmask_no_attnbias_maxk_256.o                                                                                                                                            
  /opt/rocm-6.0.0/bin/hipcc  -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xfor
mers/csrc/attention/hip_fmha -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/example/91_tile_program/xformers_fmha -I/tmp/pip-w
heel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include -I/h
ome/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/TH -I/
home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THC -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.0.0/in
clude -I/home/lhl/miniforge3/envs/axolotl/include/python3.11 -c -c /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instances/ck_tile
d_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_256.hip -o /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/cs
rc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_256.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__
HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 --offload-arch=native -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -DCK_FMHA_FWD_FAST_EXP2=1 -f
gpu-flush-denormals-to-zero -Werror -Woverloaded-virtual -DBUILD_PYTHON_PACKAGE -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DP
YBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_no_causalmask_n
o_attnbias_maxk_256.hip:10:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/ck_tiled_fmha_batched_forward_hip.h:23:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_pi
peline_qr_ks_vs_hip.hpp:19:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_pi
peline_qr_ks_vs_default_policy_hip.hpp:7:
  /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_pipeline_qx_ks_vs_custom
_policy_hip.hpp:734:13: error: static assertion failed due to requirement 'kKPack % K3 == 0'
              static_assert(kKPack % K3 == 0);
              ^             ~~~~~~~~~~~~~~~~
  /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_pipeline_qr_ks_vs_hip.hp
p:204:47: note: in instantiation of function template specialization 'ck::tile_program::block::BlockFmhaPipelineQXKSVSCustomPolicy<true, false, false, 1, 1>::MakeVDramTileDistribution
<ck::tile_program::block::BlockFmhaPipelineProblem<unsigned short, unsigned short, unsigned short, float, float, unsigned short, float, unsigned short, float, unsigned short, FmhaFwdS
hape<256>, false, ck::tile_program::block::GenericAttentionMask<true, true>, ck::tile_program::TileFmhaTraits<true, true, true, true, false, true, 1>>>' requested here
                               Policy::template MakeVDramTileDistribution<Problem>());
                                                ^
...
  fatal error: too many errors emitted, stopping now [-ferror-limit=]
  20 errors generated when compiling for gfx1100

19:

 [19/156] /opt/rocm-6.0.0/bin/hipcc  -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad
559c/xformers/csrc/attention/hip_fmha -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/example/91_tile_program/xformers_fmha -I/
tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inc
lude -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inclu
de/TH -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THC -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm
-6.0.0/include -I/home/lhl/miniforge3/envs/axolotl/include/python3.11 -c -c /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instance
s/ck_tiled_fmha_batched_forward_bp16_with_causalmask_no_attnbias_maxk_256.hip -o /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/
xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_with_causalmask_no_attnbias_maxk_256.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HA
S_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 --offload-arch=native -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -DCK_FMHA_FWD_F
AST_EXP2=1 -fgpu-flush-denormals-to-zero -Werror -Woverloaded-virtual -DBUILD_PYTHON_PACKAGE -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_lib
stdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
  FAILED: /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forwar
d_bp16_with_causalmask_no_attnbias_maxk_256.o 
  /opt/rocm-6.0.0/bin/hipcc  -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xfor
mers/csrc/attention/hip_fmha -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/example/91_tile_program/xformers_fmha -I/tmp/pip-w
heel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include -I/h
ome/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/TH -I/
home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THC -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.0.0/in
clude -I/home/lhl/miniforge3/envs/axolotl/include/python3.11 -c -c /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instances/ck_tile
d_fmha_batched_forward_bp16_with_causalmask_no_attnbias_maxk_256.hip -o /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/
csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_with_causalmask_no_attnbias_maxk_256.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 
-D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 --offload-arch=native -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -DCK_FMHA_FWD_FAST_EXP2=
1 -fgpu-flush-denormals-to-zero -Werror -Woverloaded-virtual -DBUILD_PYTHON_PACKAGE -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' 
'-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_with_causalmask
_no_attnbias_maxk_256.hip:10:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/ck_tiled_fmha_batched_forward_hip.h:23:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_pi
peline_qr_ks_vs_hip.hpp:19:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_pi
peline_qr_ks_vs_default_policy_hip.hpp:7:
  /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_pipeline_qx_ks_vs_custom
_policy_hip.hpp:734:13: error: static assertion failed due to requirement 'kKPack % K3 == 0'
              static_assert(kKPack % K3 == 0);
              ^             ~~~~~~~~~~~~~~~~
  /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_pipeline_qr_ks_vs_hip.hp
p:204:47: note: in instantiation of function template specialization 'ck::tile_program::block::BlockFmhaPipelineQXKSVSCustomPolicy<true, false, false, 1, 1>::MakeVDramTileDistribution
<ck::tile_program::block::BlockFmhaPipelineProblem<unsigned short, unsigned short, unsigned short, float, float, unsigned short, float, unsigned short, float, unsigned short, FmhaFwdS
hape<256>, false, ck::tile_program::block::GenericAttentionMask<true, true>, ck::tile_program::TileFmhaTraits<true, true, true, true, false, true, 1>>>' requested here
                               Policy::template MakeVDramTileDistribution<Problem>());
                                                ^
...
  fatal error: too many errors emitted, stopping now [-ferror-limit=]
  20 errors generated when compiling for gfx1100.

20:

  [20/156] /opt/rocm-6.0.0/bin/hipcc  -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad
559c/xformers/csrc/attention/hip_fmha -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/example/91_tile_program/xformers_fmha -I/
tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inc
lude -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inclu
de/TH -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THC -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm
-6.0.0/include -I/home/lhl/miniforge3/envs/axolotl/include/python3.11 -c -c /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instance
s/ck_tiled_fmha_batched_forward_bp16_no_causalmask_with_attnbias_maxk_256.hip -o /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/
xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_no_causalmask_with_attnbias_maxk_256.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HA
S_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 --offload-arch=native -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -DCK_FMHA_FWD_F
AST_EXP2=1 -fgpu-flush-denormals-to-zero -Werror -Woverloaded-virtual -DBUILD_PYTHON_PACKAGE -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_lib
stdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
  FAILED: /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forwar
d_bp16_no_causalmask_with_attnbias_maxk_256.o 
...

22:

  [22/156] /opt/rocm-6.0.0/bin/hipcc  -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad
559c/xformers/csrc/attention/hip_fmha -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/example/91_tile_program/xformers_fmha -I/
tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inc
lude -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inclu
de/TH -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THC -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm
-6.0.0/include -I/home/lhl/miniforge3/envs/axolotl/include/python3.11 -c -c /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instance
s/ck_tiled_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_64.hip -o /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xfo
rmers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_64.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16
=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 --offload-arch=native -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -DCK_FMHA_FWD_FAST_EX
P2=1 -fgpu-flush-denormals-to-zero -Werror -Woverloaded-virtual -DBUILD_PYTHON_PACKAGE -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp
"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
  FAILED: /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forwar
d_bp16_no_causalmask_no_attnbias_maxk_64.o
  /opt/rocm-6.0.0/bin/hipcc  -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xfor
mers/csrc/attention/hip_fmha -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/example/91_tile_program/xformers_fmha -I/tmp/pip-w
heel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include -I/h
ome/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/TH -I/
home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THC -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.0.0/in
clude -I/home/lhl/miniforge3/envs/axolotl/include/python3.11 -c -c /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instances/ck_tile
d_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_64.hip -o /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/csr
c/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_64.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HI
P_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 --offload-arch=native -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -DCK_FMHA_FWD_FAST_EXP2=1 -fgp
u-flush-denormals-to-zero -Werror -Woverloaded-virtual -DBUILD_PYTHON_PACKAGE -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYB
IND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_no_causalmask_n
o_attnbias_maxk_64.hip:10:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/ck_tiled_fmha_batched_forward_hip.h:23:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_pi
peline_qr_ks_vs_hip.hpp:18:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/warp_tile/warp_gemm_hip.hpp:10:
  In file included from /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/warp_tile/warp_gemm_attribute_mfm
a_hip.hpp:13:
  /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include/ck/tile_program/warp_tile/warp_gemm_attribute_mfma_impl_hip.hpp:116:17:
 error: '__builtin_amdgcn_mfma_f32_32x32x8bf16_1k' needs target feature mai-insts
          c_vec = __builtin_amdgcn_mfma_f32_32x32x8bf16_1k(a_vec, b_vec, c_vec, 0, 0, 0);
                  ^
  1 error generated when compiling for gfx1100.

23:

  [23/156] /opt/rocm-6.0.0/bin/hipcc  -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad
559c/xformers/csrc/attention/hip_fmha -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/example/91_tile_program/xformers_fmha -I/
tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inc
lude -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inclu
de/TH -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THC -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm
-6.0.0/include -I/home/lhl/miniforge3/envs/axolotl/include/python3.11 -c -c /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instance
s/ck_tiled_fmha_batched_forward_bp16_no_causalmask_with_attnbias_maxk_64.hip -o /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/x
formers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_no_causalmask_with_attnbias_maxk_64.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_
FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 --offload-arch=native -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -DCK_FMHA_FWD_FAS
T_EXP2=1 -fgpu-flush-denormals-to-zero -Werror -Woverloaded-virtual -DBUILD_PYTHON_PACKAGE -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libst
dcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
  FAILED: /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forwar
d_bp16_no_causalmask_with_attnbias_maxk_64.o

24:

  [24/156] /opt/rocm-6.0.0/bin/hipcc  -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad
559c/xformers/csrc/attention/hip_fmha -I/tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/example/91_tile_program/xformers_fmha -I/
tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/third_party/composable_kernel_tiled/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inc
lude -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/inclu
de/TH -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THC -I/home/lhl/miniforge3/envs/axolotl/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm
-6.0.0/include -I/home/lhl/miniforge3/envs/axolotl/include/python3.11 -c -c /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/xformers/csrc/attention/hip_fmha/instance
s/ck_tiled_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_32.hip -o /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xfo
rmers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forward_bp16_no_causalmask_no_attnbias_maxk_32.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16
=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 --offload-arch=native -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -DCK_FMHA_FWD_FAST_EX
P2=1 -fgpu-flush-denormals-to-zero -Werror -Woverloaded-virtual -DBUILD_PYTHON_PACKAGE -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp
"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
  FAILED: /tmp/pip-wheel-wte6d_sf/xformers_c82a31b9e7ad4404bed98720a3ad559c/build/temp.linux-x86_64-cpython-311/xformers/csrc/attention/hip_fmha/instances/ck_tiled_fmha_batched_forwar
d_bp16_no_causalmask_no_attnbias_maxk_32.o

Actually it keeps going for 25, 26, 27, 28, 29, 30, 31, but you get the idea.

qianfengz commented 2 months ago

Try the develop branch

qianfengz commented 2 months ago

Also, composable_kernel_tiled is not supported on Navi31 at present

lhl commented 2 months ago

Try the develop branch

I've confirmed that the develop branch doesn't work either, which isn't surprising if the CK code it depends on doesn't support Navi31.

I'll leave this issue open to track any future progress. Many libraries and frameworks depend on xformers, so a version that works on Navi31 would be of wide interest.

lukedupin commented 1 month ago

I tried on Arch with a 7900 XT and am also getting lots of errors. Hundreds of them, but most look like the following:

                                             Tuple<Sequence<N0, N1>, Sequence<K0, K1, K2, K3>>,
                                                                                  ^~
third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_bwd_pipeline_default_policy_hip.hpp:1016:81: note: initializer of 'K1' is not a constant expression
third_party/composable_kernel_tiled/include/ck/tile_program/block_tile_pipeline/block_fmha_bwd_pipeline_default_policy_hip.hpp:1011:27: note: declared here
          constexpr index_t K1 = get_warp_size() / (K2 * N0);
  4 warnings and 14 errors generated when compiling for gfx1100.

Here's my full run:

python -m venv venv
source venv/bin/activate
pip install --upgrade pip

pip3 install --ignore-installed --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1
pip install setuptools
pip install wheel
pip wheel -v --no-build-isolation "git+https://github.com/ROCm/xformers.git@develop#egg=xformers"

lukedupin commented 1 month ago

I tried building this lib with the Docker option, and everything worked fine. Is this an indication that I have a library issue in my setup, and that it would otherwise work with gfx1100?