hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond that of a traditional BLAS library.
Local ROCm version: 5.2.5.1
hipBLASLt version used in build: release/rocm-rel-5.5
Python version: 3.10
CPU: POWER9
GPU: gfx906
We need hipBLASLt because of bitsandbytes-rocm/ops.cu:400, which is required for 8-bit loading of HuggingFace language models. Unfortunately, the current implementation seems to rely on hipBLASLt for 8-bit matmul and lacks a 4-bit implementation. Would you say that, for gfx906/gfx908, hipBLASLt provides an advantage in 8-bit or 4-bit inference over hipBLAS?
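For readers unfamiliar with what "8-bit matmul" refers to here, the following is a minimal pure-Python sketch of the general idea: quantize each row to int8 by its absolute maximum, multiply with integer accumulation, then rescale. It is not the bitsandbytes-rocm or hipBLASLt implementation; the function names and the row-wise absmax scheme are assumptions chosen for illustration only.

```python
def quantize_rowwise(matrix):
    """Scale each row into the int8 range [-127, 127] by its absolute maximum."""
    quantized, scales = [], []
    for row in matrix:
        absmax = max(abs(v) for v in row) or 1.0  # guard against an all-zero row
        scale = absmax / 127.0
        quantized.append([round(v / scale) for v in row])
        scales.append(scale)
    return quantized, scales


def int8_matmul(a_q, b_q_t):
    """Multiply quantized A by quantized B, where b_q_t holds B transposed.

    Products are summed as plain integers; a GPU kernel would typically
    accumulate in int32 (assumption, not taken from the issue).
    """
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b_q_t]
            for row_a in a_q]


def dequantize(acc, a_scales, b_scales):
    """Undo both scalings to recover an approximate floating-point result."""
    return [[acc[i][j] * a_scales[i] * b_scales[j]
             for j in range(len(b_scales))]
            for i in range(len(a_scales))]


def matmul_8bit(a, b_t):
    """Approximate A @ B given A and B transposed, via int8 quantization."""
    a_q, a_scales = quantize_rowwise(a)
    b_q, b_scales = quantize_rowwise(b_t)
    return dequantize(int8_matmul(a_q, b_q), a_scales, b_scales)
```

The result only approximates the floating-point product; the error depends on how well each row fits the int8 range.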
During the build process, the following commands were used:
CMake did not report any errors. However, the build failed at the "Generating Tensile Libraries" target, immediately after displaying the message "Reading logic files: Launching 32 threads...". The build failure persists even when configuring using install.sh with AMDGPU_TARGETS hardcoded to gfx906.
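Since the failure follows the "Reading logic files" message, a small helper for pulling the relevant lines out of a build log may help when triaging. This is only a sketch: the helper names and the list of error keywords are hypothetical, and nothing here reflects the actual contents of the attached logs.

```python
def lines_after_marker(log_lines, marker="Reading logic files", context=20):
    """Return up to `context` lines that follow the first line containing `marker`."""
    for index, line in enumerate(log_lines):
        if marker in line:
            return log_lines[index + 1:index + 1 + context]
    return []


def first_error_line(log_lines, keywords=("Traceback", "Error", "error:")):
    """Return the first line containing any of the (assumed) error keywords, or None."""
    for line in log_lines:
        if any(keyword in line for keyword in keywords):
            return line
    return None


def read_log(path):
    """Read a log file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle]
```

For example, `first_error_line(lines_after_marker(read_log("make.log")))` would return the first error-like line after the marker, assuming the log is saved as make.log in the current directory.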
Traceback:
cmake.log make.log
rocminfo: rocminfo.txt
Update: The same error seems to appear when compiling with ROCm 5.5.