Build Failure During Tensile Libraries Generation

Local ROCm version: 5.2.5.1
hipBLASLt version used in build: release/rocm-rel-5.5
Python version: 3.10
CPU: POWER9
GPU: gfx906

We need hipBLASLt because of bitsandbytes-rocm/ops.cu:400, which is required for 8-bit loading of HuggingFace language models. Unfortunately, the current implementation seems to rely on hipBLASLt for 8-bit matmul and lacks a 4-bit implementation. Would you say that for gfx906/gfx908, hipBLASLt provides an advantage in 8-bit or 4-bit inference compared to hipBLAS?

During the build process, the following commands were used:

CMake command: `cmake -DAMDGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_C_COMPILER=hipcc -G "Unix Makefiles" ..`
Make command: `make -j16`

CMake did not report any errors. However, the build failed at the "Generating Tensile Libraries" target, immediately after displaying the message "Reading logic files: Launching 32 threads...". The build failure persists even when configuring using install.sh with AMDGPU_TARGETS hardcoded to gfx906.
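For anyone trying to reproduce this, here is a rough sketch of how the two logs could be captured and the first failing line located. The function names `build_logged` and `first_error` are made up for illustration, and rerunning with `-j1 VERBOSE=1` is only an assumption about what makes the failing step easier to read; it is not something the hipBLASLt build requires.

```shell
#!/bin/sh
# Sketch only: function names and the serial VERBOSE=1 rebuild are illustrative assumptions.

# Configure and build as in the report, saving all output to cmake.log and make.log.
# Run from an empty build directory inside the hipBLASLt checkout.
build_logged() {
    cmake -DAMDGPU_TARGETS=gfx906 -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_C_COMPILER=hipcc \
          -G "Unix Makefiles" .. 2>&1 | tee cmake.log
    # -j1 keeps the output of the failing step from interleaving with other jobs;
    # VERBOSE=1 makes the generated Makefiles print each command they run.
    make -j1 VERBOSE=1 2>&1 | tee make.log
}

# Print the first log line (prefixed with its line number) that mentions
# an error or a traceback, case-insensitively.
first_error() {
    grep -n -i -E -m 1 'error|traceback' "$1"
}
```

Usage would be `build_logged` followed by `first_error make.log`. The pattern is deliberately broad, so with verbose output it may also match harmless lines such as compiler flags containing "error"; treat the hit as a starting point for reading the log, not a diagnosis.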

traceback:

cmake.log make.log

rocminfo: rocminfo.txt

Update: The same error seems to appear when compiling with ROCm 5.5.
