Closed Zakhrov closed 1 month ago
This seems to be an #include-related problem. Can you give https://github.com/ROCm/aotriton/commit/0873896ab690d5767975f2bb9ab850b1a103b26e a try?
Another problem is that Navi (aka RDNA) GPUs are not supported by this project yet. The only supported architectures are MI200/MI300 (gfx90a/gfx942), aka CDNA 2/3 GPUs. See the answers in #16.
We are going to add Navi support once the Triton compiler supports it.
From what I have found out, only gfx1100 supports the WMMA intrinsics. Can we make aotriton respect the PYTORCH_ROCM_ARCH variable to skip compilation altogether?
Removing the if guard for AOTRITON_USE_ZSTD worked, but it takes a really long time to build the HIP kernels. I think a more elegant target-handling solution (like the one I mentioned above) would help reduce build times, particularly when debugging. Also, the HIP kernels were built with my native offload-arch (gfx1010) instead of with offload-arch=gfx90a or offload-arch=gfx942.
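To make the target-handling request concrete, here is a minimal sketch of the filtering it describes. It is not aotriton's actual build logic: select_aotriton_targets is a hypothetical helper, the supported set is taken from the maintainer's answer above (gfx90a/gfx942), and it assumes PYTORCH_ROCM_ARCH is a ';'-separated list and that an unset variable means "build every supported target".

```python
import os

# Architectures named in this thread as supported by aotriton (MI200/MI300).
SUPPORTED_ARCHS = {"gfx90a", "gfx942"}


def select_aotriton_targets(env=None):
    """Return the requested GPU architectures that aotriton could build for.

    Hypothetical helper, not part of aotriton. Assumes PYTORCH_ROCM_ARCH is
    a ';'-separated list such as "gfx90a;gfx1100".
    """
    env = os.environ if env is None else env
    raw = env.get("PYTORCH_ROCM_ARCH", "")
    requested = [arch.strip() for arch in raw.split(";") if arch.strip()]
    if not requested:
        # Assumption: nothing requested means build all supported targets.
        return sorted(SUPPORTED_ARCHS)
    # Keep only the requested targets that are supported, in request order.
    return [arch for arch in requested if arch in SUPPORTED_ARCHS]


if __name__ == "__main__":
    targets = select_aotriton_targets()
    if targets:
        print("Building aotriton kernels for:", ", ".join(targets))
    else:
        print("No supported architecture requested; skipping aotriton kernels")
```

With PYTORCH_ROCM_ARCH=gfx1010, as in this report, the list comes back empty, so a build script following this idea could skip kernel compilation entirely.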
With ROCm 6.2, it fails to build with:
FAILED: v2src/libaotriton_v2.so
: && hipcc -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libaotriton_v2.so -o v2src/libaotriton_v2.so @CMakeFiles/aotriton_v2.rsp && :
ld.lld: error: /lib/libgcc_s.so.1 is incompatible with elf64-x86-64
ld.lld: error: /lib/libgcc_s.so.1 is incompatible with elf64-x86-64
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
failed to execute:/opt/rocm-6.2.0/lib/llvm/bin/clang++ --driver-mode=g++ --hip-link -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libaotriton_v2.so -o "v2src/libaotriton_v2.so" \@CMakeFiles/aotriton_v2.rsp
ninja: build stopped: subcommand failed.
You probably want to double-check your compiler or system environment. /lib/libgcc_s.so.1 is a 32-bit library and should not be present on any modern system.
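One way to confirm whether a given library is 32-bit or 64-bit is to read the EI_CLASS byte of its ELF header (byte 4: 1 means 32-bit, 2 means 64-bit). The sketch below does that; the function names are illustrative and not part of any ROCm tooling. A file that is not ELF at all, such as a GNU ld linker script, yields None.

```python
ELF_MAGIC = b"\x7fELF"


def elf_class_from_header(header):
    """Return 32 or 64 from the first bytes of a file, or None if not ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: 32, 2: 64}.get(header[4])


def elf_class(path):
    """Read the start of the file at `path` and report its ELF class."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(5))


if __name__ == "__main__":
    import sys

    # Usage: python elf_class.py /lib/libgcc_s.so.1 [more paths...]
    for path in sys.argv[1:]:
        print(path, elf_class(path))
```

Running it against both /lib/libgcc_s.so.1 and the lib64 copy shows which one the linker is rejecting and which one it should be picking up.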
I removed the 32-bit version of libgcc_s and I now get this error:
FAILED: v2src/libaotriton_v2.so
: && hipcc -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libaotriton_v2.so -o v2src/libaotriton_v2.so @CMakeFiles/aotriton_v2.rsp && :
ld.lld: error: /usr/lib64/gcc/x86_64-suse-linux/13/libgcc_s.so:4: unable to find libgcc_s.so.1
>>> GROUP ( libgcc_s.so.1 -lgcc )
>>> ^
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
failed to execute:/opt/rocm-6.2.0/lib/llvm/bin/clang++ --driver-mode=g++ --hip-link -fPIC -O3 -DNDEBUG -shared -Wl,-soname,libaotriton_v2.so -o "v2src/libaotriton_v2.so" \@CMakeFiles/aotriton_v2.rsp
ninja: build stopped: subcommand failed.
Also, overriding the compiler by using CC=clang CXX=clang++ doesn't work because clang complains about variable-length arrays.
It looks like a problem with ROCm's LLVM linker, which seems to not respect LD_LIBRARY_PATH and to not skip incompatible libraries.
> which seems to not respect LD_LIBRARY_PATH
This only has top priority as a runtime env var. For ld, its precedence is after the -rpath-link or -rpath options. See the -rpath-link= section of https://man7.org/linux/man-pages/man1/ld.1.html for more details.
Also, you may want to use a container for your build system. The closest publicly available image is docker pull rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0
Problem Description
PyTorch fails to compile locally with aotriton and throws the following error:
This happens even when setting the USE_FLASH_ATTENTION option to OFF.
Operating System
openSUSE Leap 15.5
CPU
AMD Ryzen 5 4600H
GPU
AMD Radeon Pro VII
ROCm Version
ROCm 6.0.0
ROCm Component
No response
Steps to Reproduce
Try to build PyTorch with:
PYTORCH_ROCM_ARCH=gfx1010 USE_FLASH_ATTENTION=OFF USE_ROCM=ON ROCM_PATH=/opt/rocm python3 setup.py develop
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
Additional Information
No response