ROCm / hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library
https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/index.html
MIT License

Build Failure During Tensile Libraries Generation #115

Open hovertank3d opened 1 year ago

hovertank3d commented 1 year ago

Local ROCm version: 5.2.5.1
hipBLASLt version used in build: release/rocm-rel-5.5
Python version: 3.10
CPU: POWER9
GPU: gfx906

We need hipBLASLt because of bitsandbytes-rocm/ops.cu:400, which is required for 8-bit loading of HuggingFace language models. Unfortunately, the current implementation seems to rely on hipBLASLt for 8-bit matmul and lacks a 4-bit implementation. Would you say that, for gfx906/gfx908, hipBLASLt provides an advantage in 8-bit or 4-bit inference over hipBLAS code?

During the build process, the following commands were used:

CMake command: `cmake -DAMDGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_C_COMPILER=hipcc -G "Unix Makefiles" ..`
Make command: `make -j16`

CMake did not report any errors. However, the build failed at the "Generating Tensile Libraries" target, immediately after displaying the message "Reading logic files: Launching 32 threads...". The build failure persists even when configuring using install.sh with AMDGPU_TARGETS hardcoded to gfx906.

Traceback: cmake.log, make.log

rocminfo: rocminfo.txt

Update: The same error appears to occur when compiling with ROCm 5.5.

jichangjichang commented 1 year ago

@hovertank3d hipBLASLt only supports gfx90a so far. You can find the supported data types and hardware requirements in the README.
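
A pre-build check along these lines might catch the mismatch before the Tensile step is reached. This is only a minimal sketch: it assumes gfx90a is the sole supported target (as stated in the reply above for this release) and that `rocminfo` prints the architecture names as `gfx…` strings. The helper names `extract_gfx_targets` and `is_supported_target` and the `SUPPORTED_TARGETS` variable are hypothetical and are not part of hipBLASLt or ROCm.

```shell
#!/bin/sh
# Assumption: list of architectures this hipBLASLt release can build for.
# Only gfx90a is named in the thread; adjust if the README lists more.
SUPPORTED_TARGETS="gfx90a"

# Read rocminfo-style text on stdin and print the unique gfx names found.
extract_gfx_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Return 0 if the given architecture is in SUPPORTED_TARGETS, 1 otherwise.
is_supported_target() {
    for t in $SUPPORTED_TARGETS; do
        [ "$1" = "$t" ] && return 0
    done
    return 1
}

# Example use before configuring the build (left commented out so the
# helpers can be sourced without requiring rocminfo):
#
#   for arch in $(rocminfo | extract_gfx_targets); do
#       if is_supported_target "$arch"; then
#           echo "$arch: supported"
#       else
#           echo "$arch: not supported by this hipBLASLt release" >&2
#       fi
#   done
```

On the system described in this issue, such a check would flag gfx906 as unsupported before `cmake` or `make` is invoked.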